I think a key component in making this feasible is to stop designing systems to achieve a result and instead have them model a process accurately. It would mean we'd have specs for which data can exist and when/where it exists. Generating test data from these specs would also be relatively easy, since you're not trying to guess what the representative set in production is and instead have a recipe for creating one. Detailed data contracts for external communication would be a prerequisite for this though.