Overcoming Laravel Queue Challenges: Insightful Solutions from Pionect [Daniel’s Tech Thursday]

Today we’re diving deep into the Laravel queue system, focusing on a few unique challenges we encountered at Pionect and the solutions we implemented. The Laravel queue system is a robust tool and an essential component of enterprise projects: it takes long-running work out of the request cycle and processes it reliably in the background in production.

If you have worked with queues before, you know there’s quite a bit to configure. For a detailed guide, refer to the official Laravel documentation: https://laravel.com/docs/master/queues

We’ll tackle several challenges: handling slow jobs, managing database connection exceptions, periodically restarting workers, and managing out-of-memory errors in jobs. Let’s jump right in!

Some of the parameters or middleware discussed below are defined within a parent job class.

Slow Jobs

To prevent slow jobs from being retried, we set the $failOnTimeout property on the job class:

public bool $failOnTimeout = true;

A job that exceeds its timeout is then marked as failed right away instead of being retried, which signals that the job itself needs to be fixed.

Managing Database Connection Exceptions

When a database connection is unexpectedly lost, we want the job to be retried automatically. For any other exception, we prefer to let the job fail after a single attempt. We accomplish this by setting specific properties on the job class and adding a job middleware. It’s crucial that the values set on the worker are equal to or higher than those in the job class.
The properties are as follows:

public int $timeout = 600; // 10 minutes
public int $tries = 24; // 24 attempts, 10 minutes apart: about 4 hours of retries if the database connection is lost.
public int $maxExceptions = 1; // fail immediately after the first unhandled exception.
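The "4 hours" in the comment follows from simple arithmetic, assuming every attempt ends in a release with the 10-minute delay used by the middleware shown below. A small sketch of that calculation (the function name is made up for this example, and the time each attempt itself takes is ignored):

```php
<?php

/**
 * Approximate time, in hours, during which a job keeps being retried
 * when every attempt is released again with a fixed delay.
 * The processing time of the attempts themselves is not counted.
 */
function retryWindowHours(int $tries, int $releaseDelaySeconds): float
{
    return ($tries * $releaseDelaySeconds) / 3600;
}

// 24 tries with a 10-minute release delay.
$hours = retryWindowHours(24, 10 * 60);

printf("Retry window: about %.1f hours\n", $hours);
```

If you change the release delay in the middleware, adjust $tries accordingly to keep the same retry window.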

We’ve also incorporated a middleware that releases the job back into the queue in the event of a lost connection. Any other exceptions will lead to a job failure, which aligns with our desired outcome.

namespace App\Jobs\Middleware;

use Exception;
use Illuminate\Database\DetectsLostConnections;
use Illuminate\Database\QueryException;

class ReleaseOnConnectionExceptions
{
    use DetectsLostConnections;

    public function handle($job, $next)
    {
        try {
            $next($job);
        } catch (QueryException $e) {
            // Release the job back onto the queue if a database connection exception occurs...
            if ($this->causedByLostConnection($e)) {
                $job->release(10 * 60); // 10 minutes
            } else {
                // Don't release the job
                throw $e;
            }
        } catch (Exception $e) {
            // Don't release the job
            throw $e;
        }
    }
}
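To take effect, the middleware has to be returned from the job’s middleware method. Below is a minimal sketch of how a parent job class might combine it with the properties above, assuming a standard Laravel application. The class layout is an assumption for illustration (the App\Jobs\Job name is taken from the import in the out-of-memory middleware further down), not our exact code:

```php
<?php

namespace App\Jobs;

use App\Jobs\Middleware\ReleaseOnConnectionExceptions;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// Illustrative parent class: concrete jobs extend it and only implement handle().
abstract class Job implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $timeout = 600;        // 10 minutes
    public int $tries = 24;           // retries are only used up by lost connections
    public int $maxExceptions = 1;    // any unhandled exception fails the job
    public bool $failOnTimeout = true;

    /**
     * Middleware the job passes through before handle() is called.
     */
    public function middleware(): array
    {
        return [new ReleaseOnConnectionExceptions()];
    }
}
```

Because the middleware catches the lost-connection exception and releases the job itself, that exception does not count towards $maxExceptions; only the number of attempts goes up.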

Periodically Restart Workers

Our workers are tasked with processing multiple jobs. To ensure they continue to function smoothly, we’ve implemented a strategy to restart them after they’ve processed 1,000 jobs or have been running for an hour.
In Horizon, this can be configured by adding the following parameters to the worker configurations:

[
    'maxTime' => 3600, // 1 hour
    'maxJobs' => 1000,
]
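For context, here is roughly where these parameters sit in config/horizon.php. This is a sketch only: the supervisor name, connection, queue and memory value are placeholders, and the tries and timeout values simply mirror the job properties from the previous section so that the worker values are not lower than the job values.

```php
<?php

// config/horizon.php (excerpt; values other than maxTime and maxJobs are assumptions)
return [
    'environments' => [
        'production' => [
            'supervisor-1' => [
                'connection' => 'redis',
                'queue' => ['default'],
                'maxTime' => 3600, // restart the worker after 1 hour
                'maxJobs' => 1000, // or after 1,000 processed jobs
                'memory' => 128,   // soft memory limit in MB
                'tries' => 24,     // equal to or higher than the job's $tries
                'timeout' => 600,  // equal to or higher than the job's $timeout
            ],
        ],
    ],
];
```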

Moreover, for jobs that consume a significant amount of memory (one of our jobs, for example, raises the memory limit using ini_set), we force the worker to restart once it finishes processing the current job. This is done by setting the worker’s shouldQuit property at the end of the job’s handle method.

public function handle()
{
    // ... the job's actual work goes here ...

    // Ask the worker to quit once this job has finished.
    $this->restartWorker();
}

protected function restartWorker()
{
    app('queue.worker')->shouldQuit = true;
}

If you increase the memory limit for a specific job, it’s critical that the job implements the ShouldBeUnique interface. This ensures only one instance of the job is queued or running at a time, which prevents multiple workers from consuming that extra memory simultaneously.
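Putting the pieces of this section together, a memory-heavy job might look like the sketch below. The class name, the 1G limit and the uniqueFor value are placeholders chosen for illustration, and the parent App\Jobs\Job class is assumed to implement ShouldQueue:

```php
<?php

namespace App\Jobs;

use Illuminate\Contracts\Queue\ShouldBeUnique;

// Hypothetical job; the name and the values are illustrative only.
class BuildLargeExport extends Job implements ShouldBeUnique
{
    // Safety net: the unique lock expires after an hour, even if the worker dies mid-job.
    public int $uniqueFor = 3600;

    public function handle(): void
    {
        // Raise the limit for this job; the setting stays active in the worker process.
        ini_set('memory_limit', '1G');

        // ... memory-heavy work ...

        // Ask the worker to quit after this job, so its replacement
        // starts again with the default memory limit.
        app('queue.worker')->shouldQuit = true;
    }
}
```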

Managing Out Of Memory Errors in Jobs

It is crucial to avoid memory issues in all scenarios. Horizon provides a memory limit parameter that can be adjusted in the worker configuration. However, this is a soft limit: the worker compares its memory usage against the limit after each job and exits gracefully once the limit has been exceeded, but it does not interrupt a job that is still running. A worker can therefore still exceed this limit while processing a job. With a large number of workers, this can slow the system down and, in the worst case, make it unresponsive.

When an out-of-memory error occurs within a job, the job isn’t moved to the failed jobs table, and the job’s failed method isn’t invoked to handle it gracefully. We believe the job should fail in that case, so the issue can be found and resolved.
Laravel’s exception handling includes a $reservedMemory variable (on the HandleExceptions bootstrap class), which sets aside memory that is released when a fatal error such as memory exhaustion occurs, so the error can still be handled. By default, this is 32KB. In our system, using the nunomaduro/collision package to render an out-of-memory error required at least 340KB of reserved memory. This can be adjusted in the register method of App\Exceptions\Handler:

// Requires: use Illuminate\Foundation\Bootstrap\HandleExceptions;
// Increase the reserved memory for error handling, set to 400KB.
// This is needed to display errors properly on memory exhaustion.
HandleExceptions::$reservedMemory = str_repeat('x', 400000);

After this adjustment, we have enough memory to manage the exception and mark the job as failed.
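The idea behind reserved memory is easiest to see outside the framework. The plain PHP sketch below illustrates the general technique, not Laravel’s actual code: a block of memory is allocated up front and freed as the first step of the shutdown handler, so there is room left to deal with the fatal error. The describeFatalError function is a name invented for this example.

```php
<?php

// Memory set aside at startup; 400 KB matches the value used above.
$reservedMemory = str_repeat('x', 400000);

/**
 * Frees the reserve, then describes the fatal error, if there was one.
 * Returns null when the script ended without a fatal error.
 */
function describeFatalError(?string &$reserve, ?array $lastError): ?string
{
    // Release the reserve first so the rest of the handler has room to run.
    $reserve = null;

    if ($lastError === null || $lastError['type'] !== E_ERROR) {
        return null;
    }

    return sprintf('%s in %s:%d', $lastError['message'], $lastError['file'], $lastError['line']);
}

register_shutdown_function(function () use (&$reservedMemory) {
    $message = describeFatalError($reservedMemory, error_get_last());

    if ($message !== null) {
        // A framework would report the error or mark the job as failed here.
        error_log($message);
    }
});
```

If the reserve is too small for what the handler needs to do (in our case, rendering the error with collision), the handler itself runs out of memory, which is why we had to raise the value.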

We’ve developed a job middleware specifically designed to manage out of memory errors within jobs. This middleware “stores” the UUID of the job currently being executed in a static variable within this class. In the event of a FatalError, the App\Exceptions\Handler will invoke fatalErrorHandler with the exception to verify whether we need to mark the job as failed.

namespace App\Jobs\Middleware;

use App\Exceptions\Handler;
use App\Jobs\Job;
use App\Services\Horizon\Horizon;
use Exception;
use Symfony\Component\ErrorHandler\Error\FatalError;
use Throwable;

/**
 * This class is used to handle out of memory errors in jobs.
 * The job UUID is set before the job is executed.
 * fatalErrorHandler is registered in the @see Handler::register method; it marks the job as failed in Horizon.
 */
class HandleOutOfMemoryError
{
    private static ?string $jobUuid = null;

    /**
     * @param  Job  $job
     *
     * @throws Exception
     */
    public function handle($job, $next)
    {
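        // Remember which job is running so a fatal error can be attributed to it.
        self::$jobUuid = $job->job->getJobId();

        try {
            $next($job);
        } finally {
            // Runs on success and on any thrown exception or error, but not on a
            // fatal error, so the ID is still set when fatalErrorHandler is called.
            self::$jobUuid = null;
        }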
    }

    public static function fatalErrorHandler(): \Closure
    {
        return function (Throwable $exception) {
            if (self::$jobUuid && $exception instanceof FatalError) {
                app(Horizon::class)->markJobAsFailed(self::$jobUuid, $exception);
                self::$jobUuid = null;
            }
        };
    }
}

We registered fatalErrorHandler within the register method of App\Exceptions\Handler:

$this->reportable(
    HandleOutOfMemoryError::fatalErrorHandler()
);

For completeness, this is the part of our Horizon class that marks the job as failed. It calls Horizon’s JobRepository directly.

namespace App\Services\Horizon;

use Illuminate\Support\Collection;
use Laravel\Horizon\Contracts\JobRepository;
use Laravel\Horizon\JobPayload;
use Throwable;

class Horizon
{
    public function __construct(private JobRepository $jobs)
    {

    }

    public function getJob(string $id)
    {
        return $this->getJobs([$id])->first();
    }

    public function getJobs(array $ids): Collection
    {
        return $this->jobs->getJobs($ids);
    }

    public function markJobAsFailed(string $id, Throwable $exception)
    {
        $job = $this->getJob($id);

        if (! $job) {
            return;
        }

        $this->jobs->failed(
            $exception, $job->connection, $job->queue, new JobPayload($job->payload)
        );
    }
}

In conclusion, managing Laravel queues effectively can present certain challenges, but with the right strategies, these can be successfully overcome. At Pionect, we’ve developed solutions for handling slow jobs, managing database connection exceptions, periodically restarting workers, and managing out of memory errors. Understanding and configuring the Laravel queue system to fit your project’s needs is crucial. We hope our insights and solutions will assist you in your Laravel queue management. Keep learning and happy coding!

Daniel Ducro

CTO Pionect
19 October 2023