I love Laravel’s data seeding. I use it to create test and sometimes initial production data on every Laravel project that I do. But recently I was presented with a (very) large lump of pre-existing test data.

A bit of background … I was extracting microservices from a monolithic legacy system, and had decided that to minimise risk, I’d maintain data compatibility between the old and new systems. This was really easy with Laravel, as its models allow us to define table names and relationships even when they don’t conform to the framework’s default expectations.
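Mapping a model onto a legacy table is just a matter of a few property overrides. Here's a minimal sketch of the kind of thing I mean (the table, key, and column names here are hypothetical, and Order would be another legacy-mapped model):

use Illuminate\Database\Eloquent\Model;

class Customer extends Model
{
    // Hypothetical legacy table name that doesn't follow Laravel's
    // plural snake_case convention
    protected $table = 'tblCustomers';

    // Legacy primary key column instead of the default 'id'
    protected $primaryKey = 'customer_id';

    // The legacy schema has no created_at/updated_at columns
    public $timestamps = false;

    public function orders()
    {
        // Spell out the foreign and local keys the legacy schema uses
        return $this->hasMany(Order::class, 'cust_id', 'customer_id');
    }
}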

Having an existing, large-scale integration test database looked like a huge benefit, and it was. It would have taken ages to construct this much data from scratch, but the thought of translating it into seeding classes filled me with dread. Happily, there’s a much better way of handling this.

So here’s what I did instead.

Firstly, I dumped the test database. It was MySQL, so this worked for me:

mysqldump -u USERNAME -pPASSWORD DATABASENAME > testdata.sql

(note that there’s no space between the -p and your database password – that’s important)

If you’re not on the same server as the database you’ll need to include the database host details too:

mysqldump -h HOST -u USERNAME -pPASSWORD DATABASENAME > testdata.sql
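As an aside, if your table structures are created by Laravel migrations rather than by the dump itself, you may prefer to capture the data without the CREATE TABLE statements; mysqldump's --no-create-info flag should do that, though check it against your own setup:

mysqldump --no-create-info -u USERNAME -pPASSWORD DATABASENAME > testdata.sql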

Then I took the testdata.sql file, placed it in my project’s database/seeds folder, and replaced the default run() function in the DatabaseSeeder class with this:


public function run()
{
    // The dump we placed in database/seeds
    $script = getcwd().'/database/seeds/testdata.sql';

    // Re-use the credentials from Laravel's database config
    $username = Config::get('database.connections.mysql.username');
    $password = Config::get('database.connections.mysql.password');
    $database = Config::get('database.connections.mysql.database');

    // Pipe the dump straight into mysql (again, no space after -p);
    // wrap the values in escapeshellarg() if they contain shell metacharacters
    $command = "mysql -u $username -p$password $database < $script";

    exec($command);
}

Finally, assuming that you've already created the database and set up Laravel's database config, a simple "php artisan db:seed" will populate it.

"OK", you may be saying at this point, "but why not just dump and restore directly? Why bothering involving Laravel?"

Well, that takes us back to the core benefits of seeding, and there's a whole heap of them. I can reset the state of the database any time I need to (a single artisan command, shown below), and I don't have to worry about remembering where I stored my dump. Nor do I have to remember the restore syntax or the database credentials; they're all baked in. More significantly, through the wonders of version control, it's a whole lot easier to share the test data with other members of the project. Those were the expected benefits, but it turned out that there were unexpected ones too.
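That reset, assuming your schema is managed by migrations, is just:

php artisan migrate:refresh --seed

which rolls the migrations back, re-runs them, and then re-seeds.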

The data included users, and over time they changed. With this approach it was easy to remove just the user table population from the testdata.sql file and replace it with a normal seed class. The run() function became:


public function run()
{
    // The bulk of the test data still comes from the SQL dump ...
    $script = getcwd().'/database/seeds/testdata.sql';

    $username = Config::get('database.connections.mysql.username');
    $password = Config::get('database.connections.mysql.password');
    $database = Config::get('database.connections.mysql.database');

    $command = "mysql -u $username -p$password $database < $script";

    exec($command);

    // ... but the users are now created by a conventional seed class
    Model::unguard();
    $this->call(UserTableSeeder::class);
    Model::reguard();
}
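The UserTableSeeder itself was just an ordinary seed class. Something along these lines would do it (a minimal sketch; the column values are hypothetical, and you should adjust the User model namespace to suit your app):

use App\User;
use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class UserTableSeeder extends Seeder
{
    public function run()
    {
        // Clear out any users left over from a previous seed run
        DB::table('users')->delete();

        User::create([
            'name'     => 'Test Admin',
            'email'    => 'admin@example.com',
            'password' => bcrypt('secret'),
        ]);
    }
}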

As this hybrid approach became better understood, other developers started using it too, adding data for new test conditions: sometimes replacing a table's data in the original testdata dump file, and other times just adding extra data via a seed class.

As the project developed, changing the table structures got easier too: we could add a migration and then choose whether to edit the evolving testdata.sql file or add a new seeder class. Any changes made were immediately available to the whole team, who needed only to pull from the project repository and re-seed.

In part II of this series, I'll look at a very different use case: the need to produce very large files of realistic data to validate, and load test, uploading and processing at scale.