Last year I inherited a codebase with an Artisan command that was supposed to send monthly reports to all users. It looked something like User::all()->each(...). The app had grown to 1.2 million users. The command was eating 3 GB of memory, crashing after 40 minutes, and nobody had noticed because it ran on a cron job at 3 AM and the error notifications were going to a Slack channel that someone had muted.
Here's what was wrong and how to fix it.
The Naive Way (Don't Do This)
Here's what the command looked like, simplified:
```php
public function handle(): void
{
    $users = User::all();

    foreach ($users as $user) {
        $this->generateReport($user);
        $this->info("Processed user {$user->id}");
    }

    $this->comment("Done. Processed {$users->count()} users.");
}
```
User::all() loads every single row into memory as an Eloquent model. With a million users, that's a million model objects sitting in RAM simultaneously. Each model consumes a few KB of memory (more if it has casts, accessors, or loaded relationships), so you'll likely exhaust the memory limit before you've processed a single user. Even if your server has enough RAM, it's a waste.
chunk()
chunk() is the first tool most people reach for, and for good reason. It loads records in batches and releases them from memory between batches:
```php
public function handle(): void
{
    $processed = 0;

    User::query()
        ->where('is_active', true)
        ->chunk(1000, function (Collection $users) use (&$processed) {
            foreach ($users as $user) {
                $this->info("Processing user {$user->id}...");
                $this->generateReport($user);
                $processed++;
            }
        });

    $this->comment("Done. Processed {$processed} users.");
}
```
This runs a query like SELECT * FROM users WHERE is_active = 1 ORDER BY id ASC LIMIT 1000 OFFSET 0, processes those 1000 records, frees the memory, then grabs the next 1000. Your memory usage stays roughly constant instead of growing linearly with your dataset.
The batch size matters. Too small (like 10) and you're making way too many database queries. Too large (like 100,000) and you're back to the memory problem. I usually start at 1000 and adjust from there. For heavy models with lots of relationships, I might drop to 500. For skinny models, 2000 works fine.
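A quick way to tune the size is to log memory after each chunk during a test run (the 1000 here is just a starting point, and generateReport stands in for your own processing):

```php
User::query()
    ->where('is_active', true)
    ->chunk(1000, function (Collection $users) {
        foreach ($users as $user) {
            $this->generateReport($user);
        }

        // Real (OS-level) memory in MB after each chunk. If this climbs
        // steadily across chunks, something is holding references; if it's
        // just high from the start, shrink the chunk size.
        $this->line(sprintf('Memory: %.1f MB', memory_get_usage(true) / 1048576));
    });
```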
One gotcha with chunk(): don't update the column you're sorting by (the primary key by default) inside the chunk callback. If you change IDs or delete records while chunking, you'll skip rows. Use chunkById() if you need to modify records:
```php
User::query()
    ->where('last_login_at', '<', now()->subYear())
    ->chunkById(1000, function (Collection $users) {
        foreach ($users as $user) {
            $user->update(['is_active' => false]);
        }
    });
```
chunkById uses WHERE id > ? instead of OFFSET, so it's immune to the shifting-rows problem and is generally faster too (offsets get slower as they grow).
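The keyset approach is simple enough to sketch by hand, which makes the difference clear. This is a simplified version of the idea, not Laravel's actual implementation (which handles more edge cases):

```php
$lastId = 0;

do {
    // Seek past the last id we saw instead of using OFFSET — the primary
    // key index takes the database straight to the next page, so every
    // page costs roughly the same no matter how deep we are.
    $users = User::query()
        ->where('is_active', true)
        ->where('id', '>', $lastId)
        ->orderBy('id')
        ->limit(1000)
        ->get();

    foreach ($users as $user) {
        $user->update(['is_active' => false]);
    }

    $lastId = $users->last()?->id;
} while ($users->count() === 1000);
```

Because each page is anchored to the last seen id rather than a row offset, deleting or updating rows mid-iteration can't shift the window under you.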
lazy()
Laravel's lazy() method gives you a LazyCollection that loads records in chunks behind the scenes but presents them as a flat iterable. It's chunk() with better ergonomics:
```php
public function handle(): void
{
    $count = 0;

    User::query()
        ->where('is_active', true)
        ->lazy(1000)
        ->each(function (User $user) use (&$count) {
            $this->info("Processing user {$user->id}...");
            $this->generateReport($user);

            // Count inside the loop: calling ->count() on the lazy
            // collection afterwards would enumerate it again and
            // re-run every chunk query.
            $count++;
        });

    $this->comment("Processed {$count} users.");
}
```
Internally, lazy() pages through results with the same offset queries as chunk() and feeds them through a PHP generator (lazyById() is the keyset variant, mirroring chunkById()). The difference is you don't have to deal with the nested closure. Your code reads like a normal collection pipeline. I've switched most of my commands from chunk() to lazy() because the code is just easier to read.
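The flattening trick is just a generator that yields models out of each page. Here's a stripped-down sketch of the idea — not Laravel's actual implementation, just the shape of it:

```php
function lazyUsers(int $chunkSize = 1000): \Generator
{
    $page = 1;

    do {
        // Fetch one page at a time, exactly like chunk() does...
        $users = User::query()
            ->orderBy('id')
            ->forPage($page++, $chunkSize)
            ->get();

        // ...but yield the models one by one, so callers see a
        // flat stream instead of nested batches.
        foreach ($users as $user) {
            yield $user;
        }
    } while ($users->count() === $chunkSize);
}

foreach (lazyUsers() as $user) {
    // Only one page of models is in memory at a time.
}
```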
cursor()
cursor() uses a PHP generator to yield one model at a time from a single database query. It only keeps one model in memory at any time:
foreach (User::cursor() as $user) {
$this->generateReport($user);
}
Sounds perfect, right? Here's the catch: it holds a database connection open for the entire iteration. For a million records, that could be minutes, and if the connection times out mid-loop, you're in trouble. It also ignores eager loading — models are hydrated one at a time, so with() has no effect and relationship access inside the loop means N+1 queries. And there's a buffering caveat on MySQL: with PDO's default buffered queries, the driver still pulls the entire result set into memory up front — cursor() only saves you the cost of hydrating every Eloquent model at once. Switch the connection to unbuffered queries and memory truly stays flat, but then you can't run any other query on that connection until the cursor is exhausted.
I use cursor() for read-only operations on moderately sized datasets (under 100K rows) where I want minimal memory usage and the processing is fast. For anything larger or longer-running, lazy() is safer.
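If you do want truly flat memory from cursor() on MySQL, you can disable buffered queries at the connection level in config/database.php — with the trade-off above in mind, since that connection can't serve other queries while a cursor is open:

```php
// config/database.php
'mysql' => [
    // ...
    'options' => [
        // Stream rows from the server as they're consumed instead of
        // buffering the whole result set in the PDO driver's memory.
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,
    ],
],
```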
Progress Bars: Show What's Happening
Long-running commands should always have progress bars. The difference between "is this thing stuck?" and "okay, 45% done, another 20 minutes" is huge.
```php
public function handle(): void
{
    $total = User::where('is_active', true)->count();

    $bar = $this->output->createProgressBar($total);
    $bar->start();

    User::query()
        ->where('is_active', true)
        ->lazy(1000)
        ->each(function (User $user) use ($bar) {
            $this->generateReport($user);
            $bar->advance();
        });

    $bar->finish();
    $this->newLine();
    $this->comment("Processed {$total} users.");
}
```
That initial count query is worth it. Without a known total, you can only show a spinner, not a progress bar. The count query itself is fast; it's the processing that takes time.
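Artisan commands also ship a withProgressBar() helper that wraps the start/advance/finish dance for you:

```php
$users = User::query()->where('is_active', true)->lazy(1000);

// withProgressBar() accepts any iterable: it counts it for the total,
// starts the bar, advances it per item, and finishes it when done.
$this->withProgressBar($users, function (User $user) {
    $this->generateReport($user);
});

$this->newLine();
```

One thing to be aware of: counting a lazy collection enumerates it, so the chunk queries run an extra time up front just to get the total. For a huge dataset, the explicit count() + createProgressBar() version above avoids that.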
Preventing Memory Leaks
Even with chunking, long-running commands can leak memory. Here are the things that have burned me:
Event listeners accumulate. If your model fires events on save and those listeners hold references, you'll leak. Disable events if you don't need them:
```php
User::withoutEvents(function () {
    // Process users without firing model events
});
```
Query logging eats memory. If query logging is enabled — via DB::enableQueryLog(), or by tools like Debugbar and Telescope in local environments — every query's SQL and bindings are kept in an in-memory array. If you're running a million queries, that's a lot of logged strings. Make sure it's off:

```php
DB::disableQueryLog();
```
Lingering relationship references. This one's subtle. If you're loading relationships inside your loop, each parent model holds references to its related models, and anything still referencing the parent keeps the whole graph alive. Calling $user->unsetRelations() after processing can help, but honestly, if you're seeing memory creep, the best debugging tool is memory_get_usage(true) logged every 1000 iterations.
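That periodic memory check can be as simple as this sketch (the subscriptions relationship is a made-up example):

```php
$i = 0;

User::query()
    ->lazyById(1000)
    ->each(function (User $user) use (&$i) {
        $user->load('subscriptions'); // hypothetical relationship
        $this->generateReport($user);

        // Drop relationship references so the related models
        // can be garbage collected with the parent.
        $user->unsetRelations();

        // Log real (OS-level) memory every 1000 records; a steady
        // climb here means something is still holding references.
        if (++$i % 1000 === 0) {
            logger()->debug("Memory after {$i} users: " . memory_get_usage(true));
        }
    });
```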
Quick Reference
- chunk() / chunkById(): The safe default. Works everywhere, handles mutations well. Use chunkById() when you're updating records.
- lazy() / lazyById(): Same behavior as chunk, but with a nicer API via lazy collections. My go-to for new code.
- cursor(): Lowest memory usage, but holds a database connection open. Best for read-only, fast processing of moderate datasets.
- User::all(): Only if you have fewer than a few thousand records. Seriously.
The pattern I've settled on for most production commands: lazyById(1000) with a progress bar, query logging disabled, and a summary line at the end. It handles millions of records with flat memory usage, and when something goes wrong at 3 AM, the output actually tells you what happened.
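Put together, that pattern looks something like this. The command name, SendMonthlyReports, and generateReport() are stand-ins for your own command and logic:

```php
use App\Models\User;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class SendMonthlyReports extends Command
{
    protected $signature = 'reports:send-monthly';

    public function handle(): void
    {
        // No point keeping a million query strings in memory.
        DB::disableQueryLog();

        $total = User::where('is_active', true)->count();

        $bar = $this->output->createProgressBar($total);
        $bar->start();

        $processed = 0;

        User::query()
            ->where('is_active', true)
            ->lazyById(1000) // keyset pagination: flat memory, safe under mutation
            ->each(function (User $user) use ($bar, &$processed) {
                $this->generateReport($user);
                $bar->advance();
                $processed++;
            });

        $bar->finish();
        $this->newLine();

        // The summary line that tells you what happened at 3 AM.
        $this->comment("Done. Processed {$processed} of {$total} users.");
    }
}
```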