Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Best way to effienctly store hugh amounts of data Expand / Collapse
Author
Message
Posted Friday, September 28, 2012 2:46 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, July 18, 2014 11:33 PM
Points: 166, Visits: 484
Would be interested to hear on some of the best ways to approach this situation.
I've recently started at a company as a DBA that has lot's of developers, but has never had a DBA or database developer. They have recently started developing products which receive huge amounts of data, very quickly (300 transactions a second), which needs to be stored, and queried for 5 years. They had already created the database before I came along and are now looking to test it and see if improvements can be made.

One way they have implemented the insert process of the main bulk of the transactions is as follows. In an effort to keep the solution "generic", so that they can adapt it to fit multiple solutions, they have the following process flow.
A top level products table receives a product line information. An Id, product type etc.
Each product has attributes which are associated with it. A lookup is done on the product attributes table to get the attributes just for the product type.
Then a row is inserted into the productdetails table for each product attribute in a cursor loop process. For each attribute type, insert the corresponding value for that attribute type of the product into the table.

Summary:
Product header inserted.
select all attribute types on attributes table based product type.
For each attribute type, insert into productdetails table the products value of that attribute.

The idea is that if a new product comes along with a whole new set of unique attributes,, then they just add the new attribute types to the attributes table and the information can be stored.
To me this seems very inefficient as one product, with lots of details, is being split to on average 10 inserts into one table.

They predict there will be about 9 billion transactions a year, so on average that is going to be 90 billion rows in one table!! This has to be queried too.
I'm thinking it's gonna fall on it's face!

I realise this is a very open question with not much information, but any pointers on the best way to improve this would be great!
Post #1365688
Posted Friday, September 28, 2012 7:36 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 12:52 PM
Points: 13,069, Visits: 11,907
What you are describing is what is known as EAV (Entity Value Attribute). This is a horrible design in almost every sense of the word. There are a few cases where this type of pattern is ok but in general avoid it like the plague. You are absolutely right that it is going to fall on its face.

Check out this article explaining a generic EAV system, sounds like it is somewhat similar to what you have going on. http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/

It doesn't take a rocket surgeon to figure out that once you have 20-30 attributes performance is a joke when you have to create queries with dynamic sql and 20-30 self joins. The best thing that can be done for performance is a complete rewrite. That isn't always possible of course but you are in for a long battle to get it straightened out.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1365867
Posted Friday, September 28, 2012 8:19 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, April 8, 2014 6:13 AM
Points: 1,694, Visits: 19,550
Sean Lange (9/28/2012)
What you are describing is what is known as EAV (Entity Value Attribute). This is a horrible design in almost every sense of the word. There are a few cases where this type of pattern is ok but in general avoid it like the plague. You are absolutely right that it is going to fall on its face.

Check out this article explaining a generic EAV system, sounds like it is somewhat similar to what you have going on. http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/

It doesn't take a rocket surgeon to figure out that once you have 20-30 attributes performance is a joke when you have to create queries with dynamic sql and 20-30 self joins. The best thing that can be done for performance is a complete rewrite. That isn't always possible of course but you are in for a long battle to get it straightened out.


+1

I had the misfortune to work on one of these EAV systems some years ago. Everything bad you read about these happened - poor performance, integrity issues, spiralling query complexity ... avoid if possible. Rewriting will be your best option.


____________________________________________________

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Never approach a goat from the front, a horse from the rear, or a fool from any direction.
Post #1365894
Posted Friday, September 28, 2012 8:58 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, July 18, 2014 11:33 PM
Points: 166, Visits: 484
Thanks guys. That article was scary reading. Very familiar and even some of the table names mentioned are the same as the ones the devs have deployed.

I really need to understand the application more before coming up with a solution, but I am glad others agree that the current deployment is completely not right!

Thanks.
Post #1365929
Posted Monday, October 1, 2012 11:10 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, January 16, 2013 4:23 PM
Points: 415, Visits: 2,333
This will be an interesting one!

I can understand why EAV is not popular, but what other setup will let folks enter new attributes without defining new table columns? I know one application I've encountered uses this setup so that new attributes and their values can be defined without a DBA having to update tables. What other method will work in this situation if EAVs are not used?

Post #1366670
Posted Monday, October 1, 2012 11:45 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 11:31 AM
Points: 36,728, Visits: 31,180
Perhaps it's just the way I've been using them but I've had no such problems with a properly constructed EAV. Certainly, I've not had to do a 20 or 30 table self join.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1366684
Posted Monday, October 1, 2012 11:55 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 12:52 PM
Points: 13,069, Visits: 11,907
Jeff Moden (10/1/2012)
Perhaps it's just the way I've been using them but I've had no such problems with a properly constructed EAV. Certainly, I've not had to do a 20 or 30 table self join.


Unfortunately the OP has already eluded to at least 10 self joins. One of the big challenges in this type of system is RI. Foreign keys and constraints can be a serious PITA to deal with. I have also seen them have real challenges with datatypes because the value columns will just be a varchar(max) and then you have to cast everything into the correct datatype. I have see other implementations where there was a datatype column to indicate what the datatype of the value column is.

I agree that properly constructed they can be useful and even preferred in some cases. The system described by the OP does not sound like that to me. It sounds like the beginning of a nightmare. That is just my $.02


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1366691
Posted Monday, October 1, 2012 3:44 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 11:31 AM
Points: 36,728, Visits: 31,180
Sean Lange (10/1/2012)
Jeff Moden (10/1/2012)
Perhaps it's just the way I've been using them but I've had no such problems with a properly constructed EAV. Certainly, I've not had to do a 20 or 30 table self join.


Unfortunately the OP has already eluded to at least 10 self joins. One of the big challenges in this type of system is RI. Foreign keys and constraints can be a serious PITA to deal with. I have also seen them have real challenges with datatypes because the value columns will just be a varchar(max) and then you have to cast everything into the correct datatype. I have see other implementations where there was a datatype column to indicate what the datatype of the value column is.

I agree that properly constructed they can be useful and even preferred in some cases. The system described by the OP does not sound like that to me. It sounds like the beginning of a nightmare. That is just my $.02


I'll typically use SQL_Variant for such things. Although DRI is a bit of a pain (I agree that FK's really aren't practical here), the problem that properly written and used EAVs (and NVPs) solve usually makes it worth it if you do indeed allow such a thing. There are places, especially in reporting, where they pop up and most people dont even think of the tables as an EAV or NVP.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1366803
Posted Tuesday, October 2, 2012 11:31 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: 2 days ago @ 8:36 AM
Points: 4,319, Visits: 6,112
EAV "can" be very fast and efficient if you use it for single-table storage. But that then requires EXTRAORDINARY effort on the application side to handle virtually all of the data processing needs and even when you do that it is useful for a pretty narrow class of apps. There is at least one Web2.0 company doing this, and I have another client using EAV to very good effect as well in a system that uses EAV constructs for a few tables.

If you are going to do pure EAV storage and jump through the required hoops on the application side to manage the data, it is honestly a waste to spend the money for SQL Server. Use one of the many free storage systems available.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1367198
Posted Tuesday, October 2, 2012 12:19 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, December 4, 2013 1:49 PM
Points: 1,104, Visits: 1,174
One minor point ... even if you decide to use what they've established, you can make the inserts quite a bit more efficient by ditching the cursor loop. The same result can be achieved in a set-based manner. Every cursor that is instantiated has a performance cost and that cost will add up if you're looking at 300 transactions a second.

└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Post #1367220
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse