SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Best way to effienctly store hugh amounts of data


Best way to effienctly store hugh amounts of data

Author
Message
Maddave
Maddave
Right there with Babe
Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)

Group: General Forum Members
Points: 716 Visits: 1466
Would be interested to hear on some of the best ways to approach this situation.
I've recently started at a company as a DBA that has lot's of developers, but has never had a DBA or database developer. They have recently started developing products which receive huge amounts of data, very quickly (300 transactions a second), which needs to be stored, and queried for 5 years. They had already created the database before I came along and are now looking to test it and see if improvements can be made.

One way they have implemented the insert process of the main bulk of the transactions is as follows. In an effort to keep the solution "generic", so that they can adapt it to fit multiple solutions, they have the following process flow.
A top level products table receives a product line information. An Id, product type etc.
Each product has attributes which are associated with it. A lookup is done on the product attributes table to get the attributes just for the product type.
Then a row is inserted into the productdetails table for each product attribute in a cursor loop process. For each attribute type, insert the corresponding value for that attribute type of the product into the table.

Summary:
Product header inserted.
select all attribute types on attributes table based product type.
For each attribute type, insert into productdetails table the products value of that attribute.

The idea is that if a new product comes along with a whole new set of unique attributes,, then they just add the new attribute types to the attributes table and the information can be stored.
To me this seems very inefficient as one product, with lots of details, is being split to on average 10 inserts into one table.

They predict there will be about 9 billion transactions a year, so on average that is going to be 90 billion rows in one table!! This has to be queried too.
I'm thinking it's gonna fall on it's face!

I realise this is a very open question with not much information, but any pointers on the best way to improve this would be great!
Sean Lange
Sean Lange
One Orange Chip
One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)

Group: General Forum Members
Points: 25771 Visits: 17509
What you are describing is what is known as EAV (Entity Value Attribute). This is a horrible design in almost every sense of the word. There are a few cases where this type of pattern is ok but in general avoid it like the plague. You are absolutely right that it is going to fall on its face.

Check out this article explaining a generic EAV system, sounds like it is somewhat similar to what you have going on. http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/

It doesn't take a rocket surgeon to figure out that once you have 20-30 attributes performance is a joke when you have to create queries with dynamic sql and 20-30 self joins. The best thing that can be done for performance is a complete rewrite. That isn't always possible of course but you are in for a long battle to get it straightened out.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Mark Cowne
Mark Cowne
Hall of Fame
Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)

Group: General Forum Members
Points: 3081 Visits: 24042
Sean Lange (9/28/2012)
What you are describing is what is known as EAV (Entity Value Attribute). This is a horrible design in almost every sense of the word. There are a few cases where this type of pattern is ok but in general avoid it like the plague. You are absolutely right that it is going to fall on its face.

Check out this article explaining a generic EAV system, sounds like it is somewhat similar to what you have going on. http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/

It doesn't take a rocket surgeon to figure out that once you have 20-30 attributes performance is a joke when you have to create queries with dynamic sql and 20-30 self joins. The best thing that can be done for performance is a complete rewrite. That isn't always possible of course but you are in for a long battle to get it straightened out.


+1

I had the misfortune to work on one of these EAV systems some years ago. Everything bad you read about these happened - poor performance, integrity issues, spiralling query complexity ... avoid if possible. Rewriting will be your best option.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537




Maddave
Maddave
Right there with Babe
Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)Right there with Babe (716 reputation)

Group: General Forum Members
Points: 716 Visits: 1466
Thanks guys. That article was scary reading. Very familiar and even some of the table names mentioned are the same as the ones the devs have deployed.

I really need to understand the application more before coming up with a solution, but I am glad others agree that the current deployment is completely not right!

Thanks.
patrickmcginnis59
patrickmcginnis59
SSChasing Mays
SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)SSChasing Mays (645 reputation)

Group: General Forum Members
Points: 645 Visits: 2333
This will be an interesting one!

I can understand why EAV is not popular, but what other setup will let folks enter new attributes without defining new table columns? I know one application I've encountered uses this setup so that new attributes and their values can be defined without a DBA having to update tables. What other method will work in this situation if EAVs are not used?
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)

Group: General Forum Members
Points: 84369 Visits: 41061
Perhaps it's just the way I've been using them but I've had no such problems with a properly constructed EAV. Certainly, I've not had to do a 20 or 30 table self join.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Sean Lange
Sean Lange
One Orange Chip
One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)

Group: General Forum Members
Points: 25771 Visits: 17509
Jeff Moden (10/1/2012)
Perhaps it's just the way I've been using them but I've had no such problems with a properly constructed EAV. Certainly, I've not had to do a 20 or 30 table self join.


Unfortunately the OP has already eluded to at least 10 self joins. One of the big challenges in this type of system is RI. Foreign keys and constraints can be a serious PITA to deal with. I have also seen them have real challenges with datatypes because the value columns will just be a varchar(max) and then you have to cast everything into the correct datatype. I have see other implementations where there was a datatype column to indicate what the datatype of the value column is.

I agree that properly constructed they can be useful and even preferred in some cases. The system described by the OP does not sound like that to me. It sounds like the beginning of a nightmare. That is just my $.02

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)

Group: General Forum Members
Points: 84369 Visits: 41061
Sean Lange (10/1/2012)
Jeff Moden (10/1/2012)
Perhaps it's just the way I've been using them but I've had no such problems with a properly constructed EAV. Certainly, I've not had to do a 20 or 30 table self join.


Unfortunately the OP has already eluded to at least 10 self joins. One of the big challenges in this type of system is RI. Foreign keys and constraints can be a serious PITA to deal with. I have also seen them have real challenges with datatypes because the value columns will just be a varchar(max) and then you have to cast everything into the correct datatype. I have see other implementations where there was a datatype column to indicate what the datatype of the value column is.

I agree that properly constructed they can be useful and even preferred in some cases. The system described by the OP does not sound like that to me. It sounds like the beginning of a nightmare. That is just my $.02


I'll typically use SQL_Variant for such things. Although DRI is a bit of a pain (I agree that FK's really aren't practical here), the problem that properly written and used EAVs (and NVPs) solve usually makes it worth it if you do indeed allow such a thing. There are places, especially in reporting, where they pop up and most people dont even think of the tables as an EAV or NVP.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
TheSQLGuru
TheSQLGuru
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12206 Visits: 8539
EAV "can" be very fast and efficient if you use it for single-table storage. But that then requires EXTRAORDINARY effort on the application side to handle virtually all of the data processing needs and even when you do that it is useful for a pretty narrow class of apps. There is at least one Web2.0 company doing this, and I have another client using EAV to very good effect as well in a system that uses EAV constructs for a few tables.

If you are going to do pure EAV storage and jump through the required hoops on the application side to manage the data, it is honestly a waste to spend the money for SQL Server. Use one of the many free storage systems available.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Ben Teraberry
Ben Teraberry
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1491 Visits: 1199
One minor point ... even if you decide to use what they've established, you can make the inserts quite a bit more efficient by ditching the cursor loop. The same result can be achieved in a set-based manner. Every cursor that is instantiated has a performance cost and that cost will add up if you're looking at 300 transactions a second.

└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search