I've always defined "big data" as having two important characteristics: volume and velocity. Volume is fairly obvious; the sheer volume of data is more than we've ever had in the past. This is partly the nature of the problems we tackle today, but a lot of it is simply that since it is cheap to store data, we store it!
Velocity is a measure of how rapidly the data changes once it is stored. If you ever did direct mail campaigns in the old days, you would have gotten a mailing list from one of several suppliers. Depending on what classification system the mailing list supplier used, you could order by ZIP code, geography, and various subgroups, so you could target your audience with things like "doctors in Western Pennsylvania who specialize in heart surgery." The contract always included an agreement to buy back failed mailings, up to a particular limit. You agreed to accept X% undeliverable mail because the supplier knew that he could never be completely up to date. This principle of an expected error rate still holds today. Even if you have scrubbed your data, while you are waiting to get your mailing out, some of your customers are going to move, change their names, or die.
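As a rough sketch of that kind of list selection, the "doctors in Western Pennsylvania who specialize in heart surgery" order might look something like this in standard SQL. The table, column names, and codes here are invented for illustration, not any real supplier's schema, and the ZIP range is only an approximation of Western Pennsylvania.

-- Hypothetical mailing list schema; names and codes are assumptions for illustration.
SELECT recipient_name, street_address, city, state_code, zip_code
  FROM Mailing_List
 WHERE state_code = 'PA'
   AND zip_code BETWEEN '15000' AND '16599'  -- rough Western Pennsylvania range (assumption)
   AND occupation_code = 'physician'
   AND specialty_code = 'cardiac surgery';

Whatever you get back from a query like that is already decaying; some fraction of those rows will be wrong by the time the mail goes out.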
I like to think of this as another version of the Heisenberg uncertainty principle. If I can get an exact summary of my data, then the data has to be static. If my data is actually changing, the best I can do is try to measure the amount of error in my information.
Please post DDL and follow ANSI/ISO standards when asking for help.