RE: Production Subsets

David Rueter

SSCrazy

Points: 2650

August 21, 2014 at 11:51 pm

#1740081

Unless there is a compelling reason not to, developing on full production data is best.

Consider:

Comprehensive set of values are needed to test logic
[p]Simple things like a NULL where you didn't expect one, or a non-numeric, or a duplicate value can cause unexpected problems when you move to production unless you developed and tested against full production data[/p]
Performance can be vastly different depending on the volume of data involved
[p]That CTE or Correlated Subquery might work fine with modest numbers of rows...but may need to be refactored to hold up under the demands of full production data. It is better to test and tune as you go.
[/p]
Helps explain and define requirements
[p]Sure, in a perfect world there is documentation about the data structures, and there is good referential integrity. But in the real world in which I live, sometimes there is neither, and reverse-engineering and research is required. I have often come across situations where there may be thousands of rows in a table with nulls in a column...and then rows that have a value. If I had only a subset of production data, I could likely interpret that a column was not being used, when in fact it actually it was.[/p]
Realistic demos and tests
[p]Imaginary data and simple "Test 1" data introduce an additional layer of abstraction for both developers and stakeholders. "Imagine that is a real order detail record.", vs. looking at an actual order detail. Besides the unexpected data that may affect behavior, looking at dummy data tends to cause us to gloss over problems because we aren't really interpreting what we are seeing in real world terms
[/p]
Continuous Data Improvement
[p]Some of development is about improving data and the way it is organized. For example, you might notice that you have long address values that are being truncated, and an address2 column is needed. Or that you are in danger of overflowing an int identity column. Or need for data cleansing for rows imported from another source. Or any number of opportunities for improvement. These things might not be on a requirements document or even on an agile story, but they are things that should be corrected. There is a better chance of finding opportunities for improvement when you are seeing more data.
[/p]
Simplicity
[p]Just today I was developing a complex stored procedure. I was testing against full production data, and it worked fine. I handed the code off to another developer who was testing against a dev data...and my code did not return the expected results. It turned out that the problem was with the dev data. No big deal, but there was confusion and some wasted time that would have been avoided if we both were using full production data.
[/p]