One Missing Index Caused Our Database to Timeout Under Load

The error started appearing at 2:14 PM. Not gradually. Not as a warning. Just sudden, complete failure.

SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

Within minutes, our entire application was effectively down. Users couldn’t log in. Existing sessions were timing out. Pages that loaded in 200ms were now taking 30+ seconds before erroring out.

Our monitoring dashboard looked like a heart attack:

  • Database connections: 150/150 (maxed out)
  • Average query time: 18.7 seconds
  • Connection pool wait time: 45 seconds
  • Active transactions: 147 (all hanging)

I checked our traffic. We’d hit 500 concurrent users. Not even close to our peak capacity of 2,000. This shouldn’t be happening.

I pulled up the slow query log and found one query running over and over again. The same query. Hundreds of instances. Each taking 15-20 seconds, holding open a database connection. Each blocking other requests.

SELECT * FROM orders 
WHERE user_id = 12847 
AND status IN ('pending', 'processing') 
ORDER BY created_at DESC 
LIMIT 10;

This query looked innocent. We ran it on every user’s dashboard. It should be fast. We’d tested it in development—it was fast.

But in production, with 2.3 million orders in the table, it was scanning the entire table. Every. Single. Time.

The fix was a single CREATE INDEX statement:

CREATE INDEX idx_orders_user_status_created 
ON orders(user_id, status, created_at);

Query time went from 18 seconds to 0.004 seconds. Crisis over.

But the real question wasn’t “how do we fix this?” It was “how the hell did we ship this to production?”

The Setup: A Perfectly Normal Query

Our orders dashboard was straightforward. Show the user their active orders, most recent first:

class DashboardController extends Controller
{
    public function index()
    {
        $user = auth()->user();
        
        $orders = Order::where('user_id', $user->id)
                      ->whereIn('status', ['pending', 'processing'])
                      ->orderBy('created_at', 'desc')
                      ->limit(10)
                      ->get();
        
        return view('dashboard', compact('orders'));
    }
}

Clean Laravel code. Clear business logic. Efficient limit. This should be fast.

And it was—in development. With 500 test orders in the database, this query ran in 2ms.

Here’s what our database looked like in development:

mysql> SELECT COUNT(*) FROM orders;
+----------+
| COUNT(*) |
+----------+
|      487 |
+----------+

mysql> EXPLAIN SELECT * FROM orders 
    -> WHERE user_id = 1 
    -> AND status IN ('pending', 'processing') 
    -> ORDER BY created_at DESC 
    -> LIMIT 10;

+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL |  487 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+

The EXPLAIN showed:

  • type: ALL – Table scan (reading every row)
  • rows: 487 – Examining 487 rows
  • key: NULL – Not using any index

But scanning 487 rows is instant. We saw 2ms query times. Everything worked. We shipped it.

Three months later, here’s what production looked like:

mysql> SELECT COUNT(*) FROM orders;
+----------+
| COUNT(*) |
+----------+
|  2347921 |
+----------+

mysql> EXPLAIN SELECT * FROM orders 
    -> WHERE user_id = 12847 
    -> AND status IN ('pending', 'processing') 
    -> ORDER BY created_at DESC 
    -> LIMIT 10;

+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 2347921 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+

Same query. Same type: ALL. But now examining 2.3 MILLION rows to find 10 results.

At 500 concurrent users, that’s 500 full table scans happening simultaneously. Each taking 15-20 seconds, holding a connection. Each blocking other queries.

The database didn’t stand a chance.

The Investigation: Following the Slow Query Log

I enabled MySQL’s slow query log (it wasn’t on—mistake #1):

-- At runtime (SET GLOBAL doesn't survive a restart; put these in my.cnf too)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- Log queries > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-queries.log';

Within seconds, the log was filling up:

# Time: 2024-01-15T14:15:32.847291Z
# User@Host: app_user[app_user] @ localhost []
# Query_time: 18.473922  Lock_time: 0.000143  Rows_sent: 8  Rows_examined: 2347921
SELECT * FROM orders WHERE user_id = 12847 AND status IN ('pending', 'processing') 
ORDER BY created_at DESC LIMIT 10;

# Time: 2024-01-15T14:15:33.193847Z
# User@Host: app_user[app_user] @ localhost []
# Query_time: 17.829381  Lock_time: 0.000091  Rows_sent: 3  Rows_examined: 2347921
SELECT * FROM orders WHERE user_id = 8392 AND status IN ('pending', 'processing') 
ORDER BY created_at DESC LIMIT 10;

# Time: 2024-01-15T14:15:33.584920Z
# User@Host: app_user[app_user] @ localhost []
# Query_time: 19.102847  Lock_time: 0.000182  Rows_sent: 5  Rows_examined: 2347921
SELECT * FROM orders WHERE user_id = 15673 AND status IN ('pending', 'processing') 
ORDER BY created_at DESC LIMIT 10;

The same query. Over and over. Different users, but the same pattern:

  • Rows_sent: 3-10 (what we wanted)
  • Rows_examined: 2,347,921 (scanning the entire table)
  • Query_time: 15-20 seconds each

I ran SHOW PROCESSLIST to see what was happening in real-time:

mysql> SHOW FULL PROCESSLIST;

+-----+----------+-----------+--------+---------+------+----------+------------------+
| Id  | User     | Host      | db     | Command | Time | State    | Info             |
+-----+----------+-----------+--------+---------+------+----------+------------------+
| 347 | app_user | localhost | app_db | Query   | 17   | Sending  | SELECT * FROM... |
| 348 | app_user | localhost | app_db | Query   | 16   | Sending  | SELECT * FROM... |
| 349 | app_user | localhost | app_db | Query   | 15   | Sending  | SELECT * FROM... |
| 350 | app_user | localhost | app_db | Query   | 14   | Sending  | SELECT * FROM... |
... (145 more rows)

147 connections. All running the same query. All in “Sending data” state (which actually means “reading and processing rows”).

Our connection pool was exhausted. New requests couldn’t get a connection. Existing requests were timing out.

The Root Cause: No Composite Index

Let me show you what indexes we actually had:

mysql> SHOW INDEXES FROM orders;

+--------+------------+-------------+--------------+-------------+
| Table  | Non_unique | Key_name    | Seq_in_index | Column_name |
+--------+------------+-------------+--------------+-------------+
| orders |          0 | PRIMARY     |            1 | id          |
| orders |          1 | idx_user_id |            1 | user_id     |
+--------+------------+-------------+--------------+-------------+

We had:

  • Primary key on id
  • Index on user_id

But we were querying:

  • WHERE user_id = X ✓ (covered by index)
  • AND status IN (...) ✗ (not covered)
  • ORDER BY created_at DESC ✗ (not covered)

MySQL could use the user_id index to narrow down to ~800 orders per user. But then it had to:

  1. Load all 800 orders into memory
  2. Filter by status (scanning all 800)
  3. Sort by created_at (sorting all matching rows)
  4. Take the first 10

With enough concurrent users, this overwhelmed the database.
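
You can see that extra work for yourself by forcing the single-column index and reading the Extra column. A quick check (FORCE INDEX is standard MySQL; the exact Extra text will vary by version):

EXPLAIN SELECT * FROM orders FORCE INDEX (idx_user_id)
WHERE user_id = 12847 
AND status IN ('pending', 'processing') 
ORDER BY created_at DESC 
LIMIT 10;

-- With only idx_user_id available, Extra will typically include "Using filesort":
-- MySQL filters and sorts the matching rows itself instead of reading them
-- pre-sorted from an index.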

Why Development Didn’t Catch This

This is where it gets interesting. We had several safety mechanisms that all failed:

Failure #1: Test Data Didn’t Reflect Production

Development database:

  • 3 users
  • 487 total orders
  • Average 162 orders per user

Production database:

  • 45,000 users
  • 2,347,921 orders
  • Average 52 orders per user
  • But some power users had 15,000+ orders

Our test data didn’t capture the distribution. The “whale” users—the ones with thousands of orders—broke the query.

Failure #2: We Didn’t EXPLAIN In Development

We never ran EXPLAIN on our queries during development. We just tested them and said “works fine, ship it.”

If we’d run EXPLAIN, even on our tiny test database, we would have seen:

  • type: ALL (table scan)
  • key: NULL (no index used)

These are red flags that scream “this won’t scale.”
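
A cheap guardrail is a test that runs EXPLAIN against the hot queries and fails the build on a table scan. A minimal sketch (test name and seeding are illustrative; on a nearly empty table the optimizer may still prefer a scan, so seed a realistic number of rows first):

// tests/Feature/QueryPlanTest.php
public function test_dashboard_order_query_uses_an_index(): void
{
    Order::factory()->count(5000)->create();

    $plan = DB::select(
        "EXPLAIN SELECT * FROM orders
         WHERE user_id = ?
         AND status IN ('pending', 'processing')
         ORDER BY created_at DESC
         LIMIT 10",
        [1]
    );

    // A full table scan shows up as type = ALL and key = NULL.
    $this->assertNotEquals('ALL', $plan[0]->type);
    $this->assertNotNull($plan[0]->key);

    // And an expensive in-memory sort shows up as "Using filesort".
    $this->assertStringNotContainsString('filesort', (string) $plan[0]->Extra);
}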

Failure #3: No Load Testing

We tested the feature manually. Clicked around. Looked good. Shipped.

We never tested with:

  • Realistic data volumes
  • Concurrent users
  • Production-like traffic patterns

Failure #4: Slow Query Log Wasn’t Enabled

In production, slow query logging was off. We never saw the warnings that MySQL was trying to give us.

Had we enabled it from day one, we would have noticed queries slowing down as data grew.

The Fix: The Right Composite Index

The solution was a composite index that covered all our query conditions:

CREATE INDEX idx_orders_user_status_created 
ON orders(user_id, status, created_at);

Let me break down why this specific order matters:

Column Order in Composite Indexes

Indexes are like a phone book. If you want to look up “John Smith,” you need:

  1. Last name (Smith) first
  2. First name (John) second

A phone book sorted by first name would be useless for finding “Smith.”

Same with database indexes. Our query does:

  1. Filter by user_id (most selective)
  2. Filter by status (somewhat selective)
  3. Order by created_at (sorting)

So our index needs columns in that exact order:

-- GOOD: Matches query order
CREATE INDEX idx_orders_user_status_created 
ON orders(user_id, status, created_at);

-- BAD: Wrong order, won't be used efficiently
CREATE INDEX idx_orders_created_status_user 
ON orders(created_at, status, user_id);
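
This is MySQL's leftmost-prefix rule in action. Roughly, with (user_id, status, created_at) in place (queries abbreviated):

-- Can use the composite index (leftmost prefix is present):
SELECT ... WHERE user_id = 12847;
SELECT ... WHERE user_id = 12847 AND status = 'pending';
SELECT ... WHERE user_id = 12847 AND status = 'pending' ORDER BY created_at DESC;

-- Cannot use it effectively (the leading column is missing):
SELECT ... WHERE status = 'pending';
SELECT ... WHERE created_at >= '2024-01-01';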

Let me show you the EXPLAIN after adding the index:

mysql> EXPLAIN SELECT * FROM orders 
    -> WHERE user_id = 12847 
    -> AND status IN ('pending', 'processing') 
    -> ORDER BY created_at DESC 
    -> LIMIT 10;

+----+-------------+--------+-------+--------------------------------+--------------------------------+
| id | select_type | table  | type  | possible_keys                  | key                            |
+----+-------------+--------+-------+--------------------------------+--------------------------------+
|  1 | SIMPLE      | orders | range | idx_orders_user_status_created | idx_orders_user_status_created |
+----+-------------+--------+-------+--------------------------------+--------------------------------+

+---------+------+------+------------------------------------+
| key_len | ref  | rows | Extra                              |
+---------+------+------+------------------------------------+
| 14      | NULL | 8    | Using where; Using index condition |
+---------+------+------+------------------------------------+

Now:

  • type: range (using index for range scan) ✓
  • key: idx_orders_user_status_created (using our index) ✓
  • rows: 8 (examining only 8 rows instead of 2.3M) ✓
  • Extra: Using index condition (optimal index usage) ✓

Query time went from 18 seconds to 4 milliseconds.

The Deploy: 47 Seconds to Production

Creating an index on a large table in production is scary. It can:

  • Lock the table during creation
  • Take hours on large tables
  • Block writes
  • Cause downtime

But MySQL 5.6+ and PostgreSQL support online index creation:

-- MySQL
CREATE INDEX idx_orders_user_status_created 
ON orders(user_id, status, created_at) 
ALGORITHM=INPLACE LOCK=NONE;

-- PostgreSQL
CREATE INDEX CONCURRENTLY idx_orders_user_status_created 
ON orders(user_id, status, created_at);

These commands:

  • Don’t lock the table
  • Allow reads and writes during creation
  • Take longer but don’t cause downtime

On our 2.3M row table, index creation took 47 seconds. Zero downtime.

The moment it completed:

  • Query time: 18s → 0.004s
  • Database connections: 150/150 → 12/150
  • Connection pool wait: 45s → 0s
  • Application response time: 30s → 0.2s

Within 60 seconds, everything was back to normal.
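
If you manage your schema through Laravel migrations, the same index belongs in one so it's tracked with the code. A minimal sketch (file name abbreviated; MySQL 5.6+ builds a secondary index like this in place, online, by default on InnoDB):

// database/migrations/xxxx_add_orders_user_status_created_index.php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->index(['user_id', 'status', 'created_at'], 'idx_orders_user_status_created');
        });
    }

    public function down(): void
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->dropIndex('idx_orders_user_status_created');
        });
    }
};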

The Other Missing Indexes We Found

After this incident, I audited our entire schema. Found 14 more queries with the same problem.

Example 1: User Activity Feed

// Query
$activities = Activity::where('user_id', $user->id)
                      ->where('created_at', '>=', now()->subDays(30))
                      ->orderBy('created_at', 'desc')
                      ->get();

-- Missing index
CREATE INDEX idx_activity_user_created 
ON activities(user_id, created_at);

Before: 8.2s, After: 0.003s

Example 2: Invoice Search

// Query
$invoices = Invoice::where('company_id', $company->id)
                   ->where('status', 'unpaid')
                   ->where('due_date', '<', now())
                   ->orderBy('due_date', 'asc')
                   ->get();

-- Missing index
CREATE INDEX idx_invoices_company_status_due 
ON invoices(company_id, status, due_date);

Before: 12.4s, After: 0.005s

Example 3: Product Search with Category

// Query
$products = Product::where('category_id', $category->id)
                   ->where('status', 'active')
                   ->where('stock', '>', 0)
                   ->orderBy('name')
                   ->paginate(20);

-- Missing index
CREATE INDEX idx_products_category_status_stock_name 
ON products(category_id, status, stock, name);

Before: 5.7s, After: 0.002s

The pattern was the same everywhere:

  • Multiple WHERE conditions
  • ORDER BY clause
  • No composite index covering all columns
  • Fast in development, slow in production

The Tools We Should Have Been Using

1. Laravel Debugbar

First thing I did after this incident was install Debugbar in development:

composer require barryvdh/laravel-debugbar --dev

Now every query shows:

  • Execution time
  • Whether indexes were used
  • Number of rows examined
  • Duplicate queries
  • N+1 query detection

If we’d had this from day one, we would have seen the warnings.

2. EXPLAIN in Development

I created a helper function for development:

// In app/Helpers/DatabaseHelper.php

use Illuminate\Support\Facades\DB;

function explain($query)
{
    if (!app()->environment('local')) {
        return;
    }
    
    $sql = $query->toSql();
    $bindings = $query->getBindings();
    
    // Substitute ? placeholders with literal values so EXPLAIN gets runnable SQL.
    // Crude, but good enough for a local-only debugging helper.
    foreach ($bindings as $binding) {
        $value = is_numeric($binding) ? $binding : "'" . addslashes($binding) . "'";
        $pos = strpos($sql, '?');
        if ($pos !== false) {
            $sql = substr_replace($sql, $value, $pos, 1);
        }
    }
    
    $explain = DB::select("EXPLAIN $sql");
    
    dump([
        'sql' => $sql,
        'explain' => $explain
    ]);
}

// Usage
$query = Order::where('user_id', $user->id)
              ->whereIn('status', ['pending', 'processing'])
              ->orderBy('created_at', 'desc');

explain($query);  // Dumps EXPLAIN output

$orders = $query->limit(10)->get();

This makes it trivial to check query plans during development.
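
Recent Laravel versions (8.12 and later, if I remember right) also ship an explain() method directly on the query builder, which covers the simple cases without a helper:

// Returns a collection of the EXPLAIN rows for the built query.
Order::where('user_id', $user->id)
     ->whereIn('status', ['pending', 'processing'])
     ->orderBy('created_at', 'desc')
     ->explain()
     ->dump();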

3. Slow Query Log Analysis

I wrote a script to parse slow query logs and find patterns:

#!/bin/bash
# analyze-slow-queries.sh

SLOW_LOG="/var/log/mysql/slow-queries.log"
TEMP_FILE="/tmp/slow-queries-analyzed.txt"

echo "Analyzing slow queries..."

# Extract unique query patterns (removing specific values)
grep "SELECT" $SLOW_LOG | \
    sed -e 's/[0-9]\{1,\}/N/g' \
        -e "s/'[^']*'/'X'/g" | \
    sort | uniq -c | sort -rn > $TEMP_FILE

echo "Top 10 slow query patterns:"
head -20 $TEMP_FILE

echo ""
echo "Full analysis saved to: $TEMP_FILE"

This shows which query patterns are slow most often.

4. Index Suggestion Tool

Created an artisan command to suggest missing indexes:

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class SuggestIndexes extends Command
{
    protected $signature = 'db:suggest-indexes';
    protected $description = 'Analyze queries and suggest missing indexes';

    public function handle()
    {
        $this->info('Analyzing recent queries...');
        
        // Get slow queries from MySQL (performance_schema timers are in picoseconds)
        $slowQueries = DB::select("
            SELECT digest_text as sql_text,
                   count_star as executions,
                   avg_timer_wait/1000000000 as avg_time_ms,
                   sum_rows_examined as total_rows_examined
            FROM performance_schema.events_statements_summary_by_digest
            WHERE avg_timer_wait > 1000000000000  -- > 1 second
            ORDER BY sum_timer_wait DESC
            LIMIT 20
        ");
        
        foreach ($slowQueries as $query) {
            $this->warn("Query: " . substr($query->sql_text, 0, 100) . "...");
            $this->line("Executions: {$query->executions}");
            $this->line("Avg time: " . round($query->avg_time_ms, 2) . "ms");
            $this->line("Rows examined: {$query->total_rows_examined}");
            
            // Suggest potential indexes
            $this->suggestIndexes($query->sql_text);
            $this->line('');
        }
    }
    
    protected function suggestIndexes($sql)
    {
        // Extract table name
        preg_match('/FROM\s+`?(\w+)`?/i', $sql, $table);
        
        // Extract columns used in WHERE comparisons (rough heuristic)
        preg_match_all('/`?(\w+)`?\s*(?:=|<|>|IN\s*\()/i', $sql, $whereColumns);
        
        // Extract ORDER BY
        preg_match('/ORDER\s+BY\s+`?(\w+)`?/i', $sql, $orderColumn);
        
        if (!empty($table[1])) {
            $columns = array_unique($whereColumns[1] ?? []);
            if (!empty($orderColumn[1])) {
                $columns[] = $orderColumn[1];
            }
            
            if (count($columns) > 1) {
                $this->comment("Suggested index: CREATE INDEX idx_{$table[1]}_" . 
                             implode('_', $columns) . 
                             " ON {$table[1]}(" . implode(', ', $columns) . ");");
            }
        }
    }
}

Run this weekly to catch new slow queries:

php artisan db:suggest-indexes

The Prevention Strategy

We implemented several safeguards to prevent this from happening again:

1. Load Testing in CI/CD

Added a load testing stage to our deployment pipeline:

# .github/workflows/deploy.yml

- name: Load Test
  run: |
    # Seed database with production-like data
    php artisan db:seed --class=LoadTestSeeder
    
    # Run load test with k6
    k6 run tests/load/api-endpoints.js
    
    # Check for slow queries
    php artisan db:suggest-indexes
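
The LoadTestSeeder referenced above is the piece that matters most: it has to reproduce the skew in production data, not just the averages. A rough sketch, assuming standard model factories exist for User and Order:

// database/seeders/LoadTestSeeder.php

namespace Database\Seeders;

use App\Models\Order;
use App\Models\User;
use Illuminate\Database\Seeder;

class LoadTestSeeder extends Seeder
{
    public function run(): void
    {
        // The bulk of users have a modest number of orders...
        User::factory()
            ->count(1000)
            ->has(Order::factory()->count(50))
            ->create();

        // ...but a handful of "whales" are what actually stress the query planner.
        User::factory()
            ->count(10)
            ->has(Order::factory()->count(15000))
            ->create();
    }
}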

2. Pre-deployment Query Analysis

Before any database migration:

# Required checklist:
□ Run EXPLAIN on all new queries
□ Check if composite indexes are needed
□ Test with production-like data volume
□ Review slow query log after testing
□ Document expected query performance

3. Automated Index Monitoring

Created a monitoring job that runs daily:

$schedule->command('db:check-query-performance')
         ->dailyAt('03:00')
         ->onFailure(function () {
             $this->notifySlack('Slow queries detected!');
         });

This command:

  1. Checks slow query log
  2. Identifies queries >1s
  3. Suggests indexes
  4. Alerts team if issues found

4. Production Data Sampling for Dev

Set up a weekly job to sample production data (anonymized) into staging:

# Copies 10% of production data to staging
# Maintains data distribution (including "whale" users)
# Anonymizes PII
php artisan db:sample-production --percentage=10

Now our staging environment reflects real-world data distribution.
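
db:sample-production is our own command, but the selection logic behind "keep the whales" is simple enough to sketch. Roughly (thresholds and relation names are illustrative):

// Always keep the power users, then take a random ~10% slice of everyone else.
$whaleIds = User::has('orders', '>=', 5000)->pluck('id');

$sampleIds = User::whereNotIn('id', $whaleIds)
    ->inRandomOrder()
    ->limit((int) round(User::count() * 0.10))
    ->pluck('id');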

The Performance Impact

After adding the 15 missing indexes we found:

Overall application performance:

  • Average response time: 847ms → 203ms
  • 95th percentile: 3,200ms → 450ms
  • Database CPU: 78% → 23%
  • Connection pool usage: Constant saturation → healthy
  • Timeout errors: 45/hour → 0

Specific improvements:

  • Dashboard load: 18s → 0.2s
  • Search queries: 12s → 0.3s
  • Report generation: 45s → 2s
  • API endpoints: 50% faster across the board

Infrastructure costs:

  • Downgraded from db.r5.2xlarge to db.r5.large
  • Saved $650/month in RDS costs
  • Reduced read replicas from 3 to 1

The missing indexes weren’t just causing slow queries—they were costing us real money.

The Hard Lessons

1. Test Data Must Match Production Distribution

Don’t just test with average cases. Test with:

  • Power users (10x normal usage)
  • Edge cases (empty states, maximum records)
  • Realistic data volumes

2. EXPLAIN Every Query Before Production

If you can’t explain (literally) how a query works, don’t ship it.

-- If EXPLAIN shows any of these, fix before deploying:
type: ALL              -- Full table scan
rows: [large number]   -- Scanning too many rows
key: NULL              -- Not using any index
Extra: Using filesort  -- Expensive sorting operation

3. Slow Query Log Is Not Optional

Enable it in all environments:

# MySQL (my.cnf)
slow_query_log = 1
long_query_time = 1
log_queries_not_using_indexes = 1

# PostgreSQL (postgresql.conf)
log_min_duration_statement = 1000

4. Composite Indexes Are Not Intuitive

Single-column indexes seem “cleaner” but multi-column queries need multi-column indexes.

Rule: If your WHERE clause has multiple conditions, you probably need a composite index.

5. Production Will Surprise You

What works with 500 rows will break with 500,000 rows. What works with 1 concurrent user will break with 100.

Always load test. Always.

The Index Design Checklist

When creating an index, verify:

□ Column order matches query filter order
□ Most selective column is first
□ Covers all WHERE conditions when possible
□ Covers ORDER BY column when possible
□ Tested with production-like data volume
□ Verified with EXPLAIN
□ Considered index size (disk space)
□ Documented why index exists

The Bottom Line

One missing index brought our entire application down. Not a complex architectural problem. Not a scaling issue. Just a missing index.

The fix took 47 seconds to deploy. The investigation took 2 hours. The months of poor performance we inflicted on our users? Inexcusable.

We got lucky. This happened during moderate traffic, and we caught it quickly. If it had hit during peak hours with 2,000+ concurrent users, we wouldn't have recovered without taking the database offline.

Check your indexes. Right now. Run EXPLAIN on your most common queries. I bet you’ll find at least one that’s doing a full table scan.

And when you do, remember—it’s not a matter of if it will cause problems in production. It’s when.

