
Best way to efficiently store huge amounts of data
Posted Tuesday, October 2, 2012 1:00 PM


SSCarpal Tunnel

bteraberry (10/2/2012)
One minor point ... even if you decide to use what they've established, you can make the inserts quite a bit more efficient by ditching the cursor loop. The same result can be achieved in a set-based manner. Every cursor that is instantiated has a performance cost and that cost will add up if you're looking at 300 transactions a second.


Agreed. I would imagine that a table valued parameter would be the simple solution here for efficient batched, single-hit insert.
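To illustrate the idea (all table, type, and procedure names here are made up, not from the OP's system): with a table-valued parameter the client sends the whole batch in one call and the server does a single set-based insert instead of a cursor loop.

```sql
-- Hypothetical names throughout; a sketch of batched insert via a TVP.
CREATE TYPE dbo.ProductAttributeList AS TABLE
(
    ProductId      int          NOT NULL,
    AttributeType  varchar(50)  NOT NULL,
    AttributeValue varchar(max) NULL
);
GO

CREATE PROCEDURE dbo.InsertProductAttributes
    @Attributes dbo.ProductAttributeList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- One insert for the whole batch instead of one insert per row.
    INSERT INTO dbo.ProductAttribute (ProductId, AttributeType, AttributeValue)
    SELECT ProductId, AttributeType, AttributeValue
    FROM   @Attributes;
END;
```

The application would populate a DataTable (or equivalent) and pass it as the `@Attributes` parameter, so 300 rows become one round trip and one insert.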


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1367244
Posted Wednesday, October 3, 2012 2:10 AM
SSC Veteran

Thanks guys, all very interesting stuff. I'm getting more into how this application is implemented, and it's becoming scary. As Sean Lange mentioned, the developers have implemented the columns as varchar(max), and there is a separate table listing the datatype for every specific column, so that it is known what data is being inserted.

I have also found that there is a parent/child relationship for some types of data. One "product", let's say, can have multiple child products, which get read from a table and inserted into another table. When the product data is inserted into the databases, the parent is inserted first, then all child records for that item are inserted into the same table, but with a parent id column set to the parent.
The next thing that happens is that each specific attribute for each child is inserted into another table. For example, type = "addrLine1" and value = "123 5th avenue" are inserted, then type = "addrLine2" and value = "New York" is inserted into another row of the same table, with the product id of the child. The "type" column is a foreign key to the only table the developers want to maintain, by adding new attributes when a different product comes along.

All this leads to the one table having about 10 or more rows (and therefore inserts too) for the attributes of just one item! This is bad enough as it is, but the child records could number in the millions for just one product! So what could potentially be a million single inserts actually gets multiplied by the number of attribute inserts for each child!

For example, 17,000 products were meant to be added to the databases yesterday; each had thousands of child items, and some had millions. These 17,000 transactions took 4 hours to complete!!


Once I get the code out, I'll post it and see what you think! It's quite shocking.
Post #1367454
Posted Wednesday, October 3, 2012 8:10 AM


SSCarpal Tunnel

Maddave (10/3/2012)
Thanks guys, all very interesting stuff. I'm getting more into how this application is implemented, and it's becoming scary. As Sean Lange mentioned, the developers have implemented the columns as varchar(max), and there is a separate table listing the datatype for every specific column, so that it is known what data is being inserted.

I have also found that there is a parent/child relationship for some types of data. One "product", let's say, can have multiple child products, which get read from a table and inserted into another table. When the product data is inserted into the databases, the parent is inserted first, then all child records for that item are inserted into the same table, but with a parent id column set to the parent.
The next thing that happens is that each specific attribute for each child is inserted into another table. For example, type = "addrLine1" and value = "123 5th avenue" are inserted, then type = "addrLine2" and value = "New York" is inserted into another row of the same table, with the product id of the child. The "type" column is a foreign key to the only table the developers want to maintain, by adding new attributes when a different product comes along.

All this leads to the one table having about 10 or more rows (and therefore inserts too) for the attributes of just one item! This is bad enough as it is, but the child records could number in the millions for just one product! So what could potentially be a million single inserts actually gets multiplied by the number of attribute inserts for each child!

For example, 17,000 products were meant to be added to the databases yesterday; each had thousands of child items, and some had millions. These 17,000 transactions took 4 hours to complete!!


Once I get the code out, I'll post it and see what you think! It's quite shocking.


Quite a bit of unfortunate stuff going on here. Good luck with a) getting it to perform/scale and b) getting them to do a major refactor to allow a) to actually happen. Oh, and drop me a line if you need help with either of those!!


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1367702
Posted Wednesday, October 3, 2012 8:21 AM


SSChampion

Maddave (10/3/2012)
Thanks guys, all very interesting stuff. I'm getting more into how this application is implemented, and it's becoming scary. As Sean Lange mentioned, the developers have implemented the columns as varchar(max), and there is a separate table listing the datatype for every specific column, so that it is known what data is being inserted.

I have also found that there is a parent/child relationship for some types of data. One "product", let's say, can have multiple child products, which get read from a table and inserted into another table. When the product data is inserted into the databases, the parent is inserted first, then all child records for that item are inserted into the same table, but with a parent id column set to the parent.
The next thing that happens is that each specific attribute for each child is inserted into another table. For example, type = "addrLine1" and value = "123 5th avenue" are inserted, then type = "addrLine2" and value = "New York" is inserted into another row of the same table, with the product id of the child. The "type" column is a foreign key to the only table the developers want to maintain, by adding new attributes when a different product comes along.

All this leads to the one table having about 10 or more rows (and therefore inserts too) for the attributes of just one item! This is bad enough as it is, but the child records could number in the millions for just one product! So what could potentially be a million single inserts actually gets multiplied by the number of attribute inserts for each child!

For example, 17,000 products were meant to be added to the databases yesterday; each had thousands of child items, and some had millions. These 17,000 transactions took 4 hours to complete!!


Once I get the code out, I'll post it and see what you think! It's quite shocking.


YIKES!!!!!

It sounds like you may be able to write the next chapter in the article I posted. As a few others have said in this thread, EAV is useful for extremely specific situations. It is by no means a replacement for relational data in an OLTP system. I hope you are able to talk some sense into these people. Your 17,000-product insert should take a few seconds...4 hours is unbelievable!!!


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1367716
Posted Wednesday, October 3, 2012 9:53 AM
SSC-Addicted

Out of interest then: if I were selling books (ISBN, author, pages), wine (bottle size, ABV, vineyard, grape, acidity) and shirts (material, collar type, sleeve length), how would you recommend storing the attributes for those products so that they were both easy to insert and easy to extract?

Post #1367818
Posted Wednesday, October 3, 2012 11:45 AM


Ten Centuries

aaron.reese (10/3/2012)
Out of interest then: if I were selling books (ISBN, author, pages), wine (bottle size, ABV, vineyard, grape, acidity) and shirts (material, collar type, sleeve length), how would you recommend storing the attributes for those products so that they were both easy to insert and easy to extract?



If you're going to try to cram such disparate entities into a single table, you need to have a very, very good reason to do so. In your scenario I don't see any reason why you wouldn't have separate tables for the completely different items you're selling. You can still query in a way that combines items from the different tables without sacrificing sensible design.
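As a rough sketch of what I mean (all names hypothetical): one table per product type, each with its natural columns, plus a view over the shared columns for the cases where you need a combined listing.

```sql
-- Hypothetical names: one properly-typed table per product type.
CREATE TABLE dbo.Book  (BookId  int PRIMARY KEY, Title varchar(200), ISBN varchar(20),
                        Author varchar(100), Pages int);
CREATE TABLE dbo.Wine  (WineId  int PRIMARY KEY, Title varchar(200), BottleSizeMl int,
                        ABV decimal(4,1), Vineyard varchar(100), Grape varchar(50));
CREATE TABLE dbo.Shirt (ShirtId int PRIMARY KEY, Title varchar(200), Material varchar(50),
                        CollarType varchar(50), SleeveLength varchar(20));
GO
-- A view combines the shared columns when a mixed product list is needed.
CREATE VIEW dbo.AllProducts AS
SELECT 'Book'  AS ProductType, BookId  AS ProductId, Title FROM dbo.Book
UNION ALL
SELECT 'Wine',  WineId,  Title FROM dbo.Wine
UNION ALL
SELECT 'Shirt', ShirtId, Title FROM dbo.Shirt;
```

Inserts hit the specific table with real datatypes and constraints; the view gives you the cross-type query when you want one.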


└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Post #1367899
Posted Wednesday, October 3, 2012 12:04 PM


Ten Centuries

Maddave (10/3/2012)
Thanks guys, all very interesting stuff. I'm getting more into how this application is implemented, and it's becoming scary. As Sean Lange mentioned, the developers have implemented the columns as varchar(max), and there is a separate table listing the datatype for every specific column, so that it is known what data is being inserted.

I have also found that there is a parent/child relationship for some types of data. One "product", let's say, can have multiple child products, which get read from a table and inserted into another table. When the product data is inserted into the databases, the parent is inserted first, then all child records for that item are inserted into the same table, but with a parent id column set to the parent.
The next thing that happens is that each specific attribute for each child is inserted into another table. For example, type = "addrLine1" and value = "123 5th avenue" are inserted, then type = "addrLine2" and value = "New York" is inserted into another row of the same table, with the product id of the child. The "type" column is a foreign key to the only table the developers want to maintain, by adding new attributes when a different product comes along.

All this leads to the one table having about 10 or more rows (and therefore inserts too) for the attributes of just one item! This is bad enough as it is, but the child records could number in the millions for just one product! So what could potentially be a million single inserts actually gets multiplied by the number of attribute inserts for each child!

For example, 17,000 products were meant to be added to the databases yesterday; each had thousands of child items, and some had millions. These 17,000 transactions took 4 hours to complete!!


Once I get the code out, I'll post it and see what you think! It's quite shocking.


The big question I have: Are you doing anything at all with the attribute data other than storing it and returning it back to the application? Are you using those attributes to filter queries, compile statistics, etc? If not, and if all you are doing is passing all the attributes back to the application, I would look at storing that data in an XML or JSON format in a single column in the parent table. (I prefer JSON since it is more efficient.) We have an application that works with Twitter data and we keep all the data that we want to present but that we never filter by in a JSON string because the attributes included with the tweet record are highly variable based on a number of factors. Queries are run based on attributes like username, date, etc, that are broken out into columns in the parent table and the rest of the data is conveyed via JSON to the application.
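A minimal sketch of that layout (table and column names are made up for illustration): keep the fields you actually filter on as real columns, and carry the pass-through attributes as a single serialized string that is opaque to SQL Server and parsed only by the application.

```sql
-- Hypothetical names: queryable fields as columns, the rest as one JSON string.
CREATE TABLE dbo.Product
(
    ProductId     int IDENTITY PRIMARY KEY,
    ProductName   varchar(200)  NOT NULL,      -- filtered/sorted in queries
    CreatedDate   datetime      NOT NULL DEFAULT GETDATE(),
    AttributeJson nvarchar(max) NULL           -- opaque blob; app (de)serializes it
);

-- One row, one insert, however many attributes the item happens to have.
INSERT INTO dbo.Product (ProductName, AttributeJson)
VALUES ('Example book',
        N'{"isbn":"978-0000000000","author":"A. Writer","pages":320}');
```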

If the attribute values are relevant to querying and whatnot, something I would look at is the use of sparse columns. Microsoft implemented that strategy in the database architecture that supports SharePoint. They put all their "things" into a single table and have a boatload of all possible attributes. Any individual item will only use a tiny fraction of the attributes and so the majority of the data in those columns is null. Without knowing the specifics of what your system is doing, it sounds as though your needs may be similar.
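For the sparse-column route, a sketch (again, hypothetical names): one wide table where each row populates only the columns relevant to its product type, and SPARSE means the many NULLs take essentially no storage.

```sql
-- Hypothetical names: a wide table with SPARSE columns per attribute.
-- SPARSE columns must be nullable; NULL values consume no row space.
CREATE TABLE dbo.ProductWide
(
    ProductId    int PRIMARY KEY,
    ProductType  varchar(20) NOT NULL,
    ISBN         varchar(20)  SPARSE NULL,  -- books only
    Pages        int          SPARSE NULL,  -- books only
    BottleSizeMl int          SPARSE NULL,  -- wine only
    ABV          decimal(4,1) SPARSE NULL,  -- wine only
    CollarType   varchar(50)  SPARSE NULL   -- shirts only
);
```

The columns keep real datatypes (so you can index and filter them), unlike a varchar(max) EAV value column.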

Good luck.


└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Post #1367920
Posted Thursday, October 4, 2012 8:03 AM


SSCarpal Tunnel

aaron.reese (10/3/2012)
Out of interest then: if I were selling books (ISBN, author, pages), wine (bottle size, ABV, vineyard, grape, acidity) and shirts (material, collar type, sleeve length), how would you recommend storing the attributes for those products so that they were both easy to insert and easy to extract?



1) As I have said before, if you do EAV "properly" (which is VERY difficult/tedious), it really can be an efficient and "proper" design for this type of varied data collection. What you currently have is neither efficient nor proper on a variety of fronts.

2) Having a table per object type sold is unworkable - how many different types of objects does Amazon sell??

3) Sparse Columns come to mind as a potential solution for what you are trying to do, especially if you "genericize" them and let the application logic know what column is what attribute for each type of object. Again, that gets "difficult" for the app, but will SIGNIFICANTLY reduce the total columns you would need and also reduce the number of times over the life of the app that you will need to add columns to the table (unless you just start out with some very high number such as 1000).

For example, have 5 Attribute columns: Attribute1, Attribute2, Attribute3, Attribute4, Attribute5. For Books, Attribute1 would be ISBN, Attribute2 would be Author, etc. But for Wine Attribute1 would be bottle size and Attribute2 would be ABV, etc.
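In table form that might look something like this (a sketch with made-up names; the mapping table is how the app knows what each slot means per type):

```sql
-- Hypothetical names: generic attribute slots plus a mapping table that
-- records what Attribute1..5 mean for each product type.
CREATE TABLE dbo.GenericProduct
(
    ProductId   int PRIMARY KEY,
    ProductType varchar(20)  NOT NULL,
    Attribute1  varchar(max) NULL,
    Attribute2  varchar(max) NULL,
    Attribute3  varchar(max) NULL,
    Attribute4  varchar(max) NULL,
    Attribute5  varchar(max) NULL
);

CREATE TABLE dbo.AttributeMap
(
    ProductType   varchar(20) NOT NULL,
    SlotNumber    tinyint     NOT NULL,
    AttributeName varchar(50) NOT NULL,
    PRIMARY KEY (ProductType, SlotNumber)
);

INSERT INTO dbo.AttributeMap VALUES
('Book', 1, 'ISBN'),       ('Book', 2, 'Author'),
('Wine', 1, 'BottleSize'), ('Wine', 2, 'ABV');
```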

I note that sparse columns do carry some downsides, primarily being XML-ish in nature, and they have quite a few caveats, limitations and provisos. Also, if you did go with a generic set you might have few enough of them to not really benefit over their overhead.





Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1368428
Posted Thursday, October 4, 2012 10:57 AM


Ten Centuries

TheSQLGuru (10/4/2012)
2) Having a table per object type sold is unworkable - how many different types of objects does Amazon sell??


So if you're selling products you have to design a schema that would work for Amazon.com? What if you're only selling three different types of items?

I disagree with your dogmatism. The correct answer is that "it depends". I've seen a lot of developers come up with very complex solutions to support a flexibility they don't really need and they end up wasting a lot of time in the process.


└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Post #1368584
Posted Thursday, October 4, 2012 2:18 PM


SSCarpal Tunnel

bteraberry (10/4/2012)
TheSQLGuru (10/4/2012)
2) Having a table per object type sold is unworkable - how many different types of objects does Amazon sell??


So if you're selling products you have to design a schema that would work for Amazon.com? What if you're only selling three different types of items?

I disagree with your dogmatism. The correct answer is that "it depends". I've seen a lot of developers come up with very complex solutions to support a flexibility they don't really need and they end up wasting a lot of time in the process.


It is easy to make a point by taking things to the extreme. But in this case, say you only sell 3 types of things and you code for that. Now you sell a 4th thing. You will likely have to modify the database schema, the load processes, and the application code to know about that additional product table. Then a 5th, etc. I imagine there are few entities of any size selling things that don't add new products/product types (or at least want to)!

And I thought my dogmatism was actually anti-dogmatism, with several proposed solutions and guidance about potential pitfalls. And I ABSOLUTELY agree with your statement that developers sometimes (often?) build solutions that are overly/unnecessarily complex (or use new/shiny "crap" to do so) and wind up with pain because of those decisions. Hell - I make quite a bit of money cleaning up those messes!!




Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1368704