• While acknowledging the legal issues (and the law of unintended consequences politicians live by), I have to say that wherever possible, the superior choice in theory will always be developing against a full copy of the production data.
    Of course, our company is somewhat unusual in that we don't hold data on individuals. We don't store customer credit cards, contact info, or anything else that might be considered PII.

    Second, I'm a lone-wolf developer/DBA: there's just me to handle the entire development/production cycle. That's a huge advantage in some respects (with admitted downsides, naturally).

    A full-blown production dataset has several advantages:

    1) Performance-wise, what you see is what you'll get. Keeping data synced between dev and prod with each release (new features/code go up, data comes down) means I can safely reproduce any performance issues in development EXACTLY.

    2) Data-wise, you can pinpoint deficiencies in data import, data validation, expansion of data domains, etc. That isn't possible without production data.

    3) Related to point 1, it's much easier to tune performance with prod data, both in terms of data volume and those pesky (and ever-changing) parameter sniffing and index usage issues.
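
    To make point 3 a bit more concrete, here is a rough Python sketch of why skewed production data matters for parameter-sensitive plans. The column values, row counts, and the selectivity() helper are all made up for illustration; nothing here comes from a real system or database API.

```python
def selectivity(rows, value):
    """Fraction of rows that match a given parameter value."""
    return sum(1 for r in rows if r == value) / len(rows)

# Hypothetical production-like column: heavily skewed toward one value.
prod = ["shipped"] * 99_000 + ["pending"] * 1_000

# Hypothetical hand-built dev sample: small and evenly distributed.
dev = ["shipped"] * 50 + ["pending"] * 50

# In the dev sample both parameter values look identical to an optimizer.
dev_pending = selectivity(dev, "pending")
dev_shipped = selectivity(dev, "shipped")

# In the production-like data the same two values differ by a factor of 99,
# so a plan chosen for one value can be a poor fit for the other.
prod_pending = selectivity(prod, "pending")
prod_shipped = selectivity(prod, "shipped")

print(f"dev:  pending={dev_pending:.1%}  shipped={dev_shipped:.1%}")
print(f"prod: pending={prod_pending:.1%}  shipped={prod_shipped:.1%}")
```

    With the even dev sample, a plan built for either value looks fine for both. Only the skewed, full-size data shows the gap that makes a cached plan good for one parameter and bad for another.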

    Now, if I were in healthcare.... 🙂