Peter Coles

Category: Big Data

Big Data Seeding Part I – Importing Large Datasets into Laravel

I love Laravel’s data seeding. I use it to create test and sometimes initial production data on every Laravel project that I do. But recently I was presented with a (very) large lump of pre-existing test data.

A bit of background … I was extracting microservices from a monolithic legacy system, and had decided that to minimise risk, I’d maintain data compatibility between old and new systems. With Laravel, this was really easy as Laravel models allow us to define table names and relationships even when they don’t conform to Laravel’s default expectations.

Having an existing, large-scale integration test database looked like a huge benefit, and it was. It would have taken ages to construct this much data from scratch, but the thought of translating it into seeding classes filled me with dread. Happily, there's a much better way of handling this.


Transferring large files

Sometimes I have to move large files between servers. Most often they're database backups, and since we have single database tables that can extend to many hundreds of gigabytes, whole database dumps can easily generate files of one or more terabytes.

Even more challenging was a recent need to re-construct historical data from backup drives on a development workstation and then upload the results.

But here’s the wrinkle. Once file sizes move beyond 3-4GB, transfers seem to become a smidgen unreliable, whatever protocol I use.

The answer … a very useful *nix utility called split.
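As a sketch of the idea (the filenames and chunk size here are illustrative, not taken from the post), the round trip with split looks something like this:

```shell
# Stand-in for a large dump file: 10 MB of random data.
dd if=/dev/urandom of=dump.sql bs=1M count=10 2>/dev/null

# Split into 4 MB pieces: chunk_aa, chunk_ab, chunk_ac.
# Smaller pieces mean a failed transfer only costs one chunk, not the lot.
split -b 4M dump.sql chunk_

# Transfer the pieces individually, then reassemble on the far side.
# The alphabetical suffixes guarantee cat concatenates them in the right order.
cat chunk_* > dump_restored.sql

# Verify the reassembled file matches the original before deleting anything.
cksum dump.sql dump_restored.sql
```

Checking the checksums before removing the pieces is the important step: it confirms the reassembled file is byte-for-byte identical to the original.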


Copyright © 2017 Peter Coles
