RE: Best way to effienctly store hugh amounts of data

SSChampion

Points: 11845

October 3, 2012 at 12:04 pm

Maddave (10/3/2012)
Thanks guys, all very interesting stuff. I'm getting more in to how this application is implemented and it's becoming scary. As Sean Lange mentioned, the developers have implemented the columns as varchar(max) and there is a separate table with each datatype listed for every specific column, so that it is known what data is being inserted.
I have also found that there is a parent/child relationship for some types of data so, for one "product", lets say, it can have multiple child products, which get read from a table and inserted into another table. When the product data is inserted into the databases, the parent is inserted first, then all child records for that item are inserted into the same table, but with a parent id column set to the parent.
The next thing that happens is, each specific attribute for each child, is inserted into another table. So for example, the values, type = "addrLine1" and value = "123 5th avenue" are inserted, then, type = "addrLine2" and value = "New York" is inserted into another row of the same table, with the product id of the child. The "type" coluum is a foreign key of the only table the developers want to maintain, by adding new attributes when a different product comes along.
All this does, is lead to the one table having about 10 or more rows, (therefore inserts too) for each attribute of just one item! This is bad enough as it is, but the child records could be in their millions for just one product! So what could potentially be a million single inserts, actually gets multiplied by the number of specific inserts for each child attributes!
For example, 17000 products where meant to be added to the databases yersterday, each had 1000's and some had millions of child items. This 17,000 transactions took 4 hours to complete!!
Once I get the code out, I'll post it and see what you think! It's quite shocking.

The big question I have: Are you doing anything at all with the attribute data other than storing it and returning it back to the application? Are you using those attributes to filter queries, compile statistics, etc? If not, and if all you are doing is passing all the attributes back to the application, I would look at storing that data in an XML or JSON format in a single column in the parent table. (I prefer JSON since it is more efficient.) We have an application that works with Twitter data and we keep all the data that we want to present but that we never filter by in a JSON string because the attributes included with the tweet record are highly variable based on a number of factors. Queries are run based on attributes like username, date, etc, that are broken out into columns in the parent table and the rest of the data is conveyed via JSON to the application.

If the attribute values are relevant to querying and whatnot, something I would look at is the use of sparse columns. Microsoft implemented that strategy in the database architecture that supports SharePoint. They put all their "things" into a single table and have a boatload of all possible attributes. Any individual item will only use a tiny fraction of the attributes and so the majority of the data in those columns is null. Without knowing the specifics of what your system is doing, it sounds as though your needs may be similar.

Good luck.

└> bt

Forum Etiquette: How to post data/code on a forum to get the best help[/url]