A 10-Line Code Change That Eliminated Random Production Failures

Laravel Job Deadlocks

It was 2:47 AM when my phone buzzed with the fourth Sentry alert of the night. Same error, same mysterious stack trace. Same “I have no idea what’s causing this” feeling.

Error: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found
when trying to get lock; try restarting transaction

This had been happening for three weeks. Not constantly—just random enough to be infuriating. Sometimes twice a day. Sometimes not for 72 hours. Always during high traffic periods. Always unpredictable.

Our users were seeing failed orders, incomplete profile updates, and stuck payment processing. Our support team was drowning in tickets. And I was getting very familiar with 3 AM coffee.

The fix, when I finally found it, was 10 lines of code. Not a fancy algorithm. Not a complex refactor. Just 10 lines that I should have written six months ago when I first implemented our job queue system.

Let me show you how one innocent-looking Laravel job brought our entire production system to its knees.

The Setup: A Normal Laravel Job

Our application processed orders through a background job. Standard Laravel queue stuff—nothing fancy:

<?php

namespace App\Jobs;

use App\Models\Order;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ProcessOrder implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $order;

    public function __construct(Order $order)
    {
        $this->order = $order;
    }

    public function handle()
    {
        // Update inventory
        foreach ($this->order->items as $item) {
            $item->product->decrement('stock', $item->quantity);
        }

        // Create invoice
        $invoice = $this->order->invoice()->create([
            'total' => $this->order->total,
            'status' => 'pending'
        ]);

        // Process payment
        $payment = PaymentGateway::charge([
            'amount' => $this->order->total,
            'customer' => $this->order->user_id
        ]);

        if ($payment->successful()) {
            $this->order->update(['status' => 'completed']);
            $invoice->update(['status' => 'paid']);
        } else {
            $this->order->update(['status' => 'failed']);
        }
    }
}

Clean code. Easy to understand. Followed all the Laravel conventions. Worked perfectly in development.

And it was a ticking time bomb.

The Symptoms: Random Is the Worst Kind of Bug

The failures started small. A few Sentry alerts a week about deadlocks. We’d see errors like:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found 
when trying to get lock; try restarting transaction

in ProcessOrder.php:32

But here’s the thing—when we looked at the database, the order would be completed. The inventory was decremented. The invoice was created. Everything looked fine.

So we’d mark the Sentry issue as resolved and move on.

Then users started complaining:

  • “I was charged twice for the same order”
  • “My cart says the item is out of stock but I just saw it in stock”
  • “My order shows as pending but my card was charged”

We checked Horizon (Laravel’s queue monitoring dashboard). Jobs were failing and retrying. Sometimes 3 times. Sometimes 8 times. The pattern made no sense.

The numbers from the dashboard:

Failed Jobs: 47
Retried Jobs: 231
Average Retry Time: 4.2 seconds

The retries were happening automatically because of Laravel’s default queue behavior. Which seemed fine. Retries are good, right?

Wrong.
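For context, the retry knobs live in two places: flags on the queue worker and properties on the job class. A sketch of the worker side (these flags exist on `queue:work`; the specific values here are illustrative, not what we were running):

```shell
# Retry a failing job up to 3 times, waiting 90 seconds between attempts.
# Job-level $tries and $backoff properties override these when set.
php artisan queue:work --tries=3 --backoff=90
```

Nothing in either place asks whether the job is actually safe to run again.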

The Investigation: Following the Breadcrumbs

I spent a weekend diving into our logs. I enabled query logging in Laravel:

// In AppServiceProvider::boot()
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

DB::listen(function ($query) {
    Log::info('Query executed', [
        'sql' => $query->sql,
        'bindings' => $query->bindings,
        'time' => $query->time // duration in milliseconds
    ]);
});

Then I triggered a job manually and watched the logs. Here’s what I saw:

[2024-01-15 14:23:01] Job started: ProcessOrder (Order #1234)
[2024-01-15 14:23:01] UPDATE products SET stock = stock - 1 WHERE id = 42
[2024-01-15 14:23:02] INSERT INTO invoices (order_id, total, status) VALUES (1234, 99.99, 'pending')
[2024-01-15 14:23:03] Payment Gateway API call (1.2s)
[2024-01-15 14:23:04] UPDATE orders SET status = 'completed' WHERE id = 1234
[2024-01-15 14:23:04] UPDATE invoices SET status = 'paid' WHERE order_id = 1234
[2024-01-15 14:23:04] Job completed successfully

Looks fine. But then I simulated a payment gateway timeout:

[2024-01-15 14:25:01] Job started: ProcessOrder (Order #1234)
[2024-01-15 14:25:01] UPDATE products SET stock = stock - 1 WHERE id = 42
[2024-01-15 14:25:02] INSERT INTO invoices (order_id, total, status) VALUES (1234, 99.99, 'pending')
[2024-01-15 14:25:03] Payment Gateway API call...
[2024-01-15 14:25:33] GuzzleHttp\Exception\ConnectException: cURL error 28: Timeout
[2024-01-15 14:25:33] Job failed - will retry in 90 seconds

[2024-01-15 14:27:03] Job started: ProcessOrder (Order #1234) [Retry #1]
[2024-01-15 14:27:03] UPDATE products SET stock = stock - 1 WHERE id = 42
[2024-01-15 14:27:04] SQLSTATE[23000]: Integrity constraint violation: Duplicate entry
[2024-01-15 14:27:04] INSERT INTO invoices... FAILED
[2024-01-15 14:27:05] Payment Gateway API call (1.1s)
[2024-01-15 14:27:06] UPDATE orders SET status = 'completed' WHERE id = 1234
[2024-01-15 14:27:06] Job completed successfully (but inconsistent state!)

There it was. The smoking gun.

The job was retrying the entire process, including parts that had already succeeded.

On retry #1:

  • Inventory was decremented AGAIN (stock written off that was never actually sold)
  • Invoice creation failed (already exists)
  • Payment was charged AGAIN (double charging)
  • Order status updated (looks successful)

The job would complete successfully on the retry, but the data was a mess. Inventory was wrong. Charges were duplicated. And occasionally, multiple retries would try to update the same records simultaneously, causing deadlocks.

The Root Cause: Jobs Aren’t Idempotent

The problem was that our job wasn’t idempotent.

Idempotent means you can run it multiple times and get the same result. Like pressing an elevator button repeatedly—it doesn’t call multiple elevators, just one.

Our job was more like ordering coffee. Every time you say “I’ll have a latte,” you get another latte. Run the job 5 times? Five lattes. Or in our case, five inventory decrements and five charges.

Laravel’s default queue retry behavior assumes your jobs are idempotent. But nothing forces you to make them that way. It’s a footgun waiting to go off.
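To make the coffee analogy concrete, here’s a minimal plain-PHP sketch (no Laravel involved) of the difference a tracking flag makes when the same operation runs twice:

```php
<?php

// Non-idempotent: every call mutates state again.
function decrementStock(array &$product, int $qty): void
{
    $product['stock'] -= $qty;
}

// Idempotent: a recorded flag makes repeat calls a no-op.
function decrementStockOnce(array &$product, array &$item): void
{
    if ($item['inventory_decremented']) {
        return; // already applied, skip
    }
    $product['stock'] -= $item['quantity'];
    $item['inventory_decremented'] = true;
}

// Simulate a retry: run each version twice.
$product = ['stock' => 10];
decrementStock($product, 1);
decrementStock($product, 1);
// stock is now 8 — the retry double-decremented

$product2 = ['stock' => 10];
$item = ['quantity' => 1, 'inventory_decremented' => false];
decrementStockOnce($product2, $item);
decrementStockOnce($product2, $item);
// stock is 9 — the second run was a no-op
```

Five retries of the first version mean five decrements; five retries of the second still mean one.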

The Wrong Solutions I Almost Implemented

Before I found the real fix, I tried several wrong approaches:

Attempt 1: Disable Retries

class ProcessOrder implements ShouldQueue
{
    public $tries = 1; // Only try once
}

This stopped the double-processing but meant legitimate failures (network hiccups, temporary database issues) would just fail permanently. Not great.

Attempt 2: Add a Lock

public function handle()
{
    $lockKey = "process_order_{$this->order->id}";
    
    if (Cache::has($lockKey)) {
        return; // Already processing
    }
    
    Cache::put($lockKey, true, 600);
    
    // Process order...
    
    Cache::forget($lockKey);
}

This worked but had race conditions. If two jobs started at the exact same millisecond (which happened under load), both would get through. Plus, if a job crashed, the lock never got released.
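For what it’s worth, Laravel’s atomic locks (Cache::lock, on drivers like Redis that support them) close exactly that check-then-set race, and the closure form releases the lock even when the callback throws. A hedged sketch of what Attempt 2 could have looked like:

```php
public function handle()
{
    // Acquire-or-skip is a single atomic operation, so two jobs
    // can't both slip through; the lock auto-expires after 600s
    // even if the process crashes mid-job.
    Cache::lock("process_order_{$this->order->id}", 600)
        ->get(function () {
            // Process order...
        });
}
```

This would have fixed the race and the stuck-lock problem, but still not the real one: a retried job would re-run every step. A lock is not idempotency.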

Attempt 3: Database Transaction with Locks

public function handle()
{
    DB::transaction(function () {
        $order = Order::lockForUpdate()->find($this->order->id);
        
        if ($order->status !== 'pending') {
            return; // Already processed
        }
        
        // Process order...
    });
}

Getting closer, but still had issues with the payment gateway calls inside transactions. Long-running transactions are a bad idea, and they still deadlocked under high concurrency.

The Actual Solution: Make It Idempotent

The real fix was to track job state properly and make operations truly idempotent. Here’s the 10-line change that fixed everything:

<?php

namespace App\Jobs;

use App\Models\Order;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ProcessOrder implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $order;
    
    // Add retry configuration
    public $tries = 3;
    public $backoff = [10, 30, 60]; // Backoff delays grow with each retry

    public function __construct(Order $order)
    {
        $this->order = $order;
    }

    public function handle()
    {
        // THE FIX: Check if already processed (Line 1)
        if ($this->order->status !== 'pending') {
            return; // Already completed, skip
        }

        // THE FIX: Mark as processing (Lines 2-3)
        $this->order->update(['status' => 'processing']);
        
        try {
            // THE FIX: Make inventory decrement idempotent (Lines 4-6)
            foreach ($this->order->items as $item) {
                if (!$item->inventory_decremented) {
                    $item->product->decrement('stock', $item->quantity);
                    $item->update(['inventory_decremented' => true]);
                }
            }

            // THE FIX: Make invoice creation idempotent (Line 7)
            $invoice = $this->order->invoice ?? $this->order->invoice()->create([
                'total' => $this->order->total,
                'status' => 'pending'
            ]);

            // THE FIX: Make payment idempotent (Lines 8-10)
            if (!$this->order->payment_attempted) {
                $payment = PaymentGateway::charge([
                    'amount' => $this->order->total,
                    'customer' => $this->order->user_id
                ]);

                $this->order->update(['payment_attempted' => true]);

                if ($payment->successful()) {
                    $this->order->update([
                        'status' => 'completed',
                        'payment_id' => $payment->id
                    ]);
                    $invoice->update(['status' => 'paid']);
                } else {
                    $this->order->update(['status' => 'failed']);
                }
            }
        } catch (\Exception $e) {
            // Mark as pending so it can retry properly
            $this->order->update(['status' => 'pending']);
            throw $e; // Re-throw to trigger retry
        }
    }
}

Let me break down what changed:

Lines 1-3: Status Check & Processing State

if ($this->order->status !== 'pending') {
    return; // Already completed, skip
}
$this->order->update(['status' => 'processing']);

If the job retries and the order is already completed, we bail out immediately. The processing state prevents multiple jobs from proceeding simultaneously.
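One refinement worth noting (my own suggestion, not part of the original 10 lines): the separate read and update still leave a tiny window where two workers both see pending. A single conditional UPDATE closes it, because the query builder’s update() returns the number of rows it actually changed:

```php
// Atomically claim the order: only one worker's UPDATE will
// match the WHERE clause, so only one sees $claimed === 1.
$claimed = Order::whereKey($this->order->id)
    ->where('status', 'pending')
    ->update(['status' => 'processing']);

if (!$claimed) {
    return; // another worker claimed it, or it's already done
}
```

In practice the status check alone eliminated our failures, but the atomic version costs nothing extra.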

Lines 4-6: Idempotent Inventory

if (!$item->inventory_decremented) {
    $item->product->decrement('stock', $item->quantity);
    $item->update(['inventory_decremented' => true]);
}

We track whether inventory was already decremented. On retry, we skip this step.

Line 7: Idempotent Invoice

$invoice = $this->order->invoice ?? $this->order->invoice()->create([...]);

Check if invoice exists first. Only create if it doesn’t.

Lines 8-10: Idempotent Payment

if (!$this->order->payment_attempted) {
    $payment = PaymentGateway::charge([...]);
    $this->order->update(['payment_attempted' => true]);
}

Only attempt payment once. Track that we tried.
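A flag alone still leaves a small crash window between the charge succeeding and payment_attempted being saved. Many payment APIs (Stripe, for example) accept an idempotency key so the gateway itself deduplicates retried charges. A sketch, assuming our PaymentGateway::charge could pass such a key through:

```php
if (!$this->order->payment_attempted) {
    $payment = PaymentGateway::charge([
        'amount' => $this->order->total,
        'customer' => $this->order->user_id,
        // Same key on every retry → the gateway charges at most once,
        // even if our flag was never written. (Parameter name is
        // hypothetical; check your gateway's API.)
        'idempotency_key' => "order-{$this->order->id}",
    ]);

    $this->order->update(['payment_attempted' => true]);
}
```

Belt and suspenders: the flag prevents most duplicate attempts, the key makes the ones that slip through harmless.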

The Database Changes

Of course, this required a small migration to add tracking columns:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class AddIdempotencyFieldsToOrders extends Migration
{
    public function up()
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->boolean('payment_attempted')->default(false)->after('status');
        });

        Schema::table('order_items', function (Blueprint $table) {
            $table->boolean('inventory_decremented')->default(false)->after('quantity');
        });
    }

    public function down()
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->dropColumn('payment_attempted');
        });

        Schema::table('order_items', function (Blueprint $table) {
            $table->dropColumn('inventory_decremented');
        });
    }
}

Simple boolean flags that cost us nothing in storage but saved us thousands in support time.

The Horizon Configuration

We also tuned our Horizon configuration to handle failures better:

// config/horizon.php

'defaults' => [
    'supervisor-1' => [
        'connection' => 'redis',
        'queue' => ['default'],
        'balance' => 'auto',
        'minProcesses' => 1,
        'maxProcesses' => 10,
        'balanceMaxShift' => 1,
        'balanceCooldown' => 3,
        'tries' => 3, // Match job configuration
        'timeout' => 60,
    ],
],

// Separate queue for order processing
'environments' => [
    'production' => [
        'orders-supervisor' => [
            'connection' => 'redis',
            'queue' => ['orders'],
            'balance' => 'auto',
            'minProcesses' => 2,
            'maxProcesses' => 5,
            'balanceMaxShift' => 1,
            'balanceCooldown' => 3,
            'tries' => 3,
            'timeout' => 90, // Longer timeout for payment API
        ],
    ],
],

And separated critical jobs into their own queue:

// When dispatching
ProcessOrder::dispatch($order)->onQueue('orders');

This ensured order processing jobs couldn’t get stuck behind other less important background tasks.

The Results: From Chaos to Stability

After deploying these changes:

Week 1:

  • Deadlock errors: 47 → 0
  • Failed jobs: 231 → 12 (all legitimate network issues)
  • Duplicate charges: 18 → 0
  • Inventory discrepancies: 34 → 0
  • Support tickets: 89 → 23

Month 1:

  • Average job retry count: 4.2 → 0.8
  • Jobs completing on first try: 62% → 94%
  • 3 AM pages: 12 → 0 (this was the real win)

Quarter 1:

  • Zero deadlocks
  • Zero duplicate charges
  • Zero inventory issues from job retries
  • Customer satisfaction up 24%
  • Support team stopped hating me

What I Learned About Job Queues

1. Idempotency Isn’t Optional

If your job can be retried (and Laravel jobs can), it MUST be idempotent. This isn’t a nice-to-have—it’s a requirement.

Every time you write a job that modifies state, ask yourself: “What happens if this runs twice?”

2. Default Behaviors Are Dangerous

Laravel’s queue system is great, but its defaults assume you know what you’re doing. Automatic retries are helpful until they destroy your data.

3. State Tracking Is Cheap

Adding a few boolean columns to track job progress costs almost nothing. Not adding them cost us weeks of debugging and hundreds of support tickets.

4. Status Transitions Matter

The order of status changes matters:

pending → processing → completed ✓
pending → completed ✗ (skips the processing guard)

That intermediate state prevents race conditions.
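A minimal plain-PHP sketch of that guard — an allowed-transitions map where anything not listed is rejected:

```php
<?php

// Which statuses each status may move to.
// 'processing' → 'pending' is the retry reset from the catch block.
const ALLOWED_TRANSITIONS = [
    'pending'    => ['processing'],
    'processing' => ['completed', 'failed', 'pending'],
];

function transition(string $from, string $to): string
{
    $allowed = ALLOWED_TRANSITIONS[$from] ?? [];
    if (!in_array($to, $allowed, true)) {
        throw new LogicException("Illegal transition: {$from} → {$to}");
    }
    return $to;
}

$status = 'pending';
$status = transition($status, 'processing'); // ok
$status = transition($status, 'completed'); // ok

// Jumping straight from pending to completed is rejected:
try {
    transition('pending', 'completed');
    $skipped = true;
} catch (LogicException $e) {
    $skipped = false;
}
```

In the real job the map lives implicitly in the status checks, but writing it out makes the illegal shortcut obvious.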

5. Monitoring Saves Lives

Without Horizon showing us retry patterns, we would have been blind. Install monitoring on day one, not after things break.

The Checklist for Every Job

Now, before any job goes to production, it must pass this test:

// Job Idempotency Checklist
// □ Can this job be safely retried?
// □ Does it check existing state before modifying data?
// □ Does it track what steps have been completed?
// □ Does it handle partial completion gracefully?
// □ Does it have appropriate timeout values?
// □ Does it have exponential backoff configured?
// □ Does it log enough info to debug failures?
// □ Have I tested it with simulated failures?

If you can’t check all these boxes, your job isn’t ready.
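The last checkbox is the one most often skipped. A hedged sketch of how a feature test can exercise it — run the handler twice, as a retry would, and assert nothing happened twice (assumes the factories and relations from the earlier examples):

```php
public function test_process_order_is_safe_to_run_twice(): void
{
    $order = Order::factory()->create(['status' => 'pending']);
    $initialStock = $order->items->first()->product->stock;

    // First run, then a simulated retry against fresh state.
    (new ProcessOrder($order))->handle();
    (new ProcessOrder($order->fresh()))->handle();

    $product = $order->items->first()->product->fresh();

    // Stock decremented exactly once; exactly one invoice exists.
    $this->assertSame($initialStock - $order->items->first()->quantity, $product->stock);
    $this->assertCount(1, $order->invoice()->get());
}
```

With the payment gateway faked, the same test can assert the charge happened once too.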

Common Job Patterns That Need This Fix

After fixing our order processing, I audited all our jobs. These patterns are everywhere:

Invoice Generation

// Bad: Always creates new invoice
$invoice = Invoice::create([...]);

// Good: Idempotent
$invoice = Invoice::firstOrCreate(
    ['order_id' => $order->id],
    ['total' => $order->total, 'status' => 'pending']
);

Email Sending

// Bad: Sends on every retry
Mail::to($user)->send(new OrderConfirmation($order));

// Good: Tracks if sent
if (!$order->confirmation_email_sent) {
    Mail::to($user)->send(new OrderConfirmation($order));
    $order->update(['confirmation_email_sent' => true]);
}

Webhook Delivery

// Bad: Posts on every retry
Http::post($webhook_url, $data);

// Good: Tracks attempts
if ($order->webhook_deliveries()->where('url', $webhook_url)->doesntExist()) {
    $response = Http::post($webhook_url, $data);
    $order->webhook_deliveries()->create([
        'url' => $webhook_url,
        'status' => $response->status(),
        'response' => $response->body()
    ]);
}

File Processing

// Bad: Processes every time
$image = Image::make($file)->resize(800, 600)->save($path);

// Good: Checks if already processed
if (!Storage::exists($path)) {
    $image = Image::make($file)->resize(800, 600)->save($path);
}

The Debugging Tools That Helped

1. Horizon Dashboard

Laravel Horizon’s UI was invaluable:

php artisan horizon
# Then visit: http://localhost/horizon

Shows:

  • Failed jobs with full stack traces
  • Retry counts per job
  • Job throughput
  • Queue wait times
  • Recent job history

2. Custom Job Middleware

We built middleware to log job execution:

<?php

namespace App\Jobs\Middleware;

use Illuminate\Support\Facades\Log;

class LogJobExecution
{
    public function handle($job, $next)
    {
        $jobName = get_class($job);
        $jobId = $job->job->getJobId();
        
        Log::info("Job starting", [
            'job' => $jobName,
            'id' => $jobId,
            'attempt' => $job->attempts()
        ]);

        $start = microtime(true);

        try {
            $next($job);
            
            Log::info("Job completed", [
                'job' => $jobName,
                'id' => $jobId,
                'duration' => microtime(true) - $start
            ]);
        } catch (\Exception $e) {
            Log::error("Job failed", [
                'job' => $jobName,
                'id' => $jobId,
                'error' => $e->getMessage(),
                'duration' => microtime(true) - $start
            ]);
            
            throw $e;
        }
    }
}

Applied to jobs:

class ProcessOrder implements ShouldQueue
{
    public function middleware()
    {
        return [new LogJobExecution];
    }
}

3. Sentry Context

Enhanced Sentry errors with job context:

public function handle()
{
    \Sentry\configureScope(function ($scope) {
        $scope->setContext('job', [
            'order_id' => $this->order->id,
            'user_id' => $this->order->user_id,
            'attempt' => $this->attempts(),
            'status' => $this->order->status
        ]);
    });

    // Job logic...
}

This made Sentry errors actually useful instead of just showing line numbers.

What About Unique Jobs?

Laravel 8+ has a ShouldBeUnique interface, but it’s not quite the same thing:

class ProcessOrder implements ShouldQueue, ShouldBeUnique
{
    public function uniqueId()
    {
        return $this->order->id;
    }
}

This prevents the same job from being queued twice simultaneously. But it doesn’t make the job idempotent—it just prevents duplicates in the queue.

We still need idempotency because:

  1. Jobs can fail and retry
  2. Jobs can timeout and restart
  3. Race conditions can still occur

Think of ShouldBeUnique as preventing duplicate queue entries, and idempotency as handling retries safely.

The Takeaway

Ten lines of code. That’s all it took to fix three weeks of production chaos.

Not a framework change. Not an architectural rewrite. Not a migration to a new queue system.

Just proper state management and idempotency checks.

The embarrassing part? This is a known problem. Laravel’s documentation mentions it. Queue systems have been around for decades. This isn’t new knowledge.

But it’s easy to skip when you’re moving fast. “It works in dev, ship it.” Until it doesn’t work in production.

Now I treat every queue job like it will be retried—because it will be. And I make damn sure it can handle that gracefully.

Have you been bitten by non-idempotent jobs? What was your debugging nightmare? I’d love to hear I’m not the only one who learned this lesson the hard way.
