• I used to favour production data for development and testing, but with the introduction of the Data Protection Act in the UK, doing so became something of a headache.

    It is still worth testing on a database of representative size*, but these days I would advocate using synthetic data. Production data can be of rather variable quality, and may not be available at all during the development phase. Synthetic data, if you know the algorithms used to generate it, will have predictable characteristics: you can run a query and, whilst you may not know exactly which rows you will get back, you should know exactly how big the result set should be.

    * So often, I have seen systems go live only to be beset by severe performance problems that needed to be addressed urgently. Using a full-size database lets you find these problems before go-live.

    Many years ago, I worked on a CRM system. I was tasked with populating the database with enough test data to make the database size about the same as the live system would have. I wrote a program to generate it, populating all the tables. It took a couple of hours to run, but it gave data of predictable characteristics, and you could throw the database away and start again without too much pain.
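
    To illustrate what "predictable characteristics" can mean in practice, here is a minimal sketch in Python with SQLite. It is not the program I wrote back then; the `customer` table, the region names and the helper functions are all made up for the example. The point is that the region is assigned by a fixed rule, so the size of a result set can be worked out without looking at the data, while a fixed random seed makes the rest of the content repeatable.

```python
import random
import sqlite3

# Hypothetical lookup values, invented for this example.
REGIONS = ["North", "South", "East", "West"]


def build_test_db(n_customers, seed=42):
    """Create an in-memory database holding n_customers synthetic rows."""
    rng = random.Random(seed)  # fixed seed: the same data on every run
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY, name TEXT, region TEXT, credit_limit INTEGER)"
    )
    rows = (
        (
            i,
            f"Customer {rng.randrange(10**6):06d}",  # arbitrary but repeatable
            REGIONS[i % len(REGIONS)],               # assigned by a known rule
            rng.randrange(1000, 50000),
        )
        for i in range(n_customers)
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def expected_region_count(n_customers, region):
    """Work out how many rows a region should have, from the rule alone."""
    first_id = REGIONS.index(region)
    return len(range(first_id, n_customers, len(REGIONS)))


if __name__ == "__main__":
    n = 1003
    conn = build_test_db(n)
    (actual,) = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE region = ?", ("East",)
    ).fetchone()
    # We do not know which names come back, but we do know how many rows.
    assert actual == expected_region_count(n, "East")
    print(actual)
```

    Scaling `n_customers` up to the size of the live system gives you the volume test as well, and because the database is rebuilt from code you can throw it away and regenerate it whenever you like.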

    Usually, the data migration will happen close to go-live, so testing on production data can happen at the User Acceptance Testing (UAT) phase. Sure, there will be things in the production data that trip up the application. But letting the users test on production data means the data stays with the people who already know how to look after it, and developers don't have to take on that responsibility.