I'm trying to figure out if there's a better database design here, something other than EAV or 3000 columns in a table. In order to do that, I need to have at least an idea where the data comes from, what this actually is and how it'll be used.
Will the data be stored for long periods after it's entered or is it enter, print report, delete?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Why are you so intent on storing all the data for all the reports in a single table?
Why not a table per report?
GilaMonster (12/19/2009)
I'm trying to figure out if there's a better database design here, something other than EAV or 3000 columns in a table. In order to do that, I need to have at least an idea where the data comes from, what this actually is and how it'll be used.
Ok, here's the full scoop then.
ProductA comes in containers.
Each container has 20 pallets.
Each pallet has 75 boxes.
Each box (depending on the fruit size) has 4 to 12 fruits.
All the boxes can be the exact same fruit size, or there can be a mix of sizes, but each box always contains a single fruit size.
Out of those (20 pallets * 75 boxes ea.) = 1500 boxes, 10 boxes are randomly picked out for quality inspection.
Different inspection parameters and data are taken from each of those 10 boxes. These parameters and data are what make up the "Fields" for the report.
All the data for each box will be stored in ONE record with the 3000-field design, or in 100 records with the EAV design.
The data for each box consists of the total number of fruit that matched the evaluated parameter. For example, one of the parameters is mold. Out of a box of 5 fruits, 1 had mold, so "1" is typed in for that field.
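To make the comparison concrete, here's a rough sketch of the two shapes I'm weighing up. All the table and column names below are made up just for illustration, and only a couple of the parameter columns are shown:

-- Wide design: one row per inspected box, one column per parameter
CREATE TABLE dbo.BoxInspection_Wide
(
    BoxInspectionID int IDENTITY(1,1) PRIMARY KEY,
    ContainerNo     varchar(20) NOT NULL,
    PalletNo        int NOT NULL,
    BoxNo           int NOT NULL,
    FruitCount      int NOT NULL,
    MoldCount       int NULL,
    BruiseCount     int NULL
    -- ... one more column for every remaining inspection parameter ...
);

-- EAV design: one row per box per parameter
CREATE TABLE dbo.BoxInspection_EAV
(
    BoxInspectionID int NOT NULL,
    ParameterName   varchar(100) NOT NULL,
    ParameterValue  varchar(100) NULL,
    CONSTRAINT PK_BoxInspection_EAV PRIMARY KEY (BoxInspectionID, ParameterName)
);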
Once the inspection report is filled out with however many boxes were inspected (the total number of boxes varies per product), it is saved locally and then, at a later time, transferred into the SQL table.
Once the data is in SQL, a web page will pull that data and show it as a "report." And this is where the data ends. Absolutely nothing else is done with the data.
GilaMonster (12/19/2009)
Will the data be stored for long periods after it's entered or is it enter, print report, delete?
Looking at it from the data perspective,
1. It gets saved into a temporary storage.
2. The data is transferred into a SQL table.
3. A report of the data can be viewed/printed
4. Never, ever deleted. The data normally stays in this table for about a month after it enters SQL; however, in some cases (claims, returns, complaints, etc.) it will remain in SQL for many months, even years. Once this regular period passes, the data is moved to an "archive" table, which is basically storage that isn't frequently accessed (a rough sketch of how I picture that move is below).
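For what it's worth, this is roughly how I picture the move to the archive table working. The table and column names here are hypothetical, and the archive table would need the same structure as the active one:

-- Move rows that have passed the regular period into the archive in one statement;
-- DELETE ... OUTPUT INTO avoids a separate INSERT followed by a DELETE.
DELETE dbo.InspectionReport
OUTPUT deleted.*
INTO dbo.InspectionReport_Archive
WHERE EnteredDate < DATEADD(MONTH, -1, GETDATE());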
Dave Ballantyne (12/19/2009)
Why are you so intent on storing all the data for all the reports in a single table? Why not a table per report?
I didn't think of a separate table per report because I wanted to do the EAV table design initially, until the 3000-field design was "suggested" to me. That "suggestion" came before I realized that with only two reports I am already up to approximately 60 fields.
First of all, if the table will really have 3,000 columns, then the choice of having such a wide table simply vanishes because SQL Server can only have something like 1024 columns per table, and only something like 600 columns (IIRC, and they have improved on that in recent editions) can be replicated if replication is ever needed.
The question to ask is: if you could make a 3000-column table, how many of those columns would be null for any given fruit?
It's not going to be easy, but if you want properly normalized data (which is the way to go for everything except a final reporting table), then you'd need to classify the 3000 columns and group related columns (usually, when one is filled out, the others are also filled out) into well-formed tables along with the necessary FKs. Doing such a thing certainly isn't easy up front, and that's why a lot of people make the mistake of not doing it up front and settling on an EAV or a very wide table.
Still, there are many things in life where it appears that single rows in a single table are the way to go... call detail records are one of those things, for example. But those types of things always have data for all columns... there are no nulls in any of the rows.
I guess that's what will ultimately make the decision for you... for any given item, how many columns will contain nulls and what are the patterns of the nulls? Obviously, that's not a question I need an answer to... it's a question to guide you in your decision-making process.
As a side bar, you can overcome the number of columns per table limit by having parallel sister tables with common ID's. Obviously, one of the tables would be the "master" that generates the shared row ID's. Of course, EAV's are "easy" and through the use of cross-tabs or pivots, can generate many different reports quite easily but it's normally at a pretty big price when it comes to performance. That's why people don't make databases by putting everything all into one huge EAV.
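For example, and these names are purely hypothetical, a pair of sister tables sharing a common ID might look something like this:

-- "Master" table generates the shared row ID
CREATE TABLE dbo.InspectionMaster
(
    InspectionID int IDENTITY(1,1) PRIMARY KEY,
    BoxNo        int NOT NULL
    -- ... first batch of measurement columns, up to the column limit ...
);

-- Sister table carries the overflow columns and shares the same ID
CREATE TABLE dbo.InspectionSister
(
    InspectionID int NOT NULL PRIMARY KEY
        REFERENCES dbo.InspectionMaster (InspectionID)
    -- ... next batch of measurement columns ...
);

Reports that need columns from both just join on InspectionID.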
The bottom line is "It depends"... my inclination would be to do it right and find some way to normalize the data... but if the data will always have non-null entries for all 3000 columns, a set of sister tables may be the way to go.
--Jeff Moden
Change is inevitable... Change for the better is not.
Since the data will be stored for posterity, I'm going to suggest a full, normalised database design, tables that store information about single 'things', foreign key relationships, the works. Yes, it's more work up front, but it'll save time when reports get changed, created, not needed any longer, etc.
Don't think about the reports when designing the tables, think about the data that you're storing. Data can be manipulated for the reports by stored proc or view. That's the easy part. Designing the tables is the hard part.
If I was doing a design for your system, I'd start (but by no means guarantee to finish with) the following 'entities'
Container (maybe)
Pallet (maybe)
Box
Sample
InspectionTypes
InspectionParameters
InspectionResults (possibly more than one table)
Plus any other entities that you need to store data on. Then go through the levels of normalisation, up to Boyce-Codd, and split tables as necessary. Yes, it's going to be a lot more tables than you had, but it's going to allow all sorts of integrity checks that you don't have now (what would a value for 'mold' of 11am really mean?), and it'll make changes to reports far easier than with either the all-in-one-row or all-in-one-column designs.
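As a very rough sketch only (hypothetical names and data types, assuming a Sample table with one row per inspected box, and nowhere near a complete design), the parameter and results tables could look something like this:

CREATE TABLE dbo.InspectionParameters
(
    ParameterID   int IDENTITY(1,1) PRIMARY KEY,
    ParameterName varchar(100) NOT NULL UNIQUE
);

CREATE TABLE dbo.InspectionResults
(
    SampleID    int NOT NULL REFERENCES dbo.Sample (SampleID),   -- assumes a Sample table already exists
    ParameterID int NOT NULL REFERENCES dbo.InspectionParameters (ParameterID),
    FruitCount  int NOT NULL,
    CONSTRAINT PK_InspectionResults PRIMARY KEY (SampleID, ParameterID),
    CONSTRAINT CK_InspectionResults_FruitCount CHECK (FruitCount >= 0)
);

A nonsense value for 'mold' simply can't get in: the count is an integer with a CHECK constraint, and the parameter has to exist in InspectionParameters. The report layout then becomes a view or stored procedure over these tables rather than something the table design has to mirror.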
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability