Putting all your data in one column of one table

  • michael.welcome (5/9/2008)


    Well I could post the company name, embarasing them in a public forum could be grounds for firing, so I will not. I have payments on my 2003 Red Corvette Z06 (Really I do have one) that need to be made. So far we are about 18 months and millions of dollars into this project and nothing works. I figure my Vet will be payed off by the time this project ends.

    That is pretty much what I expected, but wondered if anyone else had a different experience with this.

    You might want to start looking for a new job now. It wouldn't be a big surprise for management to get fed up with the cost, cancel the project, and lay off everyone involved.

    Actually, that may be the company's best possible outcome. An even bigger disaster would be if it somehow was released into production and they had to use it for a critical production system.

  • While I don't agree with the approach, and I have created survey apps that did not stuff everything in one column, there is the potential to store your data as XML. You can build indexes for nodes in the XML, be prepared for your database to bloat very large though.

    You are better off to design the tables correctly, but wanted to at least give some other ideas to consider and I have also used in the past. The XML indexing made the database grow 6x it original size just because of the indexes, but worked for what the desire was, which was storing OneNote documents in a SQL Database.

    Good Luck.

  • I've seen this a number of times, over on www.thedailywtf.com 😀

    Get out of that company while you still can. That design is one of the things that sounds nice in theory, but falls over spectacuarly in reality.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Instead of bailing from the company, you have a huge opportunity... set up a multi-million row test and show them why it's a bad idea. Challenge those that support the single table theory to write anything performant. Be prepared to show a multi-table example that matches.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just read this thread! 😀

    Its a fantastic idea! (as in the realms of fantasy) 😛

    Some people never learn....we always propose this design first off as a laugh just to see if the project managers are paying attention.

    --Shaun

    Hiding under a desk from SSIS Implemenation Work :crazy:

  • Jeff Moden (5/12/2008)


    Instead of bailing from the company, you have a huge opportunity... set up a multi-million row test and show them why it's a bad idea. Challenge those that support the single table theory to write anything performant. Be prepared to show a multi-table example that matches.

    Keep in mind - Jeff's talking about a "real" 1M row test (i.e. 1M distinct values in the entity column), so 1M x 40 columns = 40M rows.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I am generally against this kind of design. (OK I think its stupid but I am being nice). SQL server, as well as most other RDBS, are designed to work more efficiently when the data conforms to certain design principles. This is not one of them.

    If they really really really want semi-structured data, maybe you should try to get them to use XML. At least it has some minimal support.

  • Here is the schema full

    |-----------------|

    -------> | |

    | | STUFF |

    | | |

    ---------| |

    |----------------|

  • My, my, my. E.F. Codd is rolling in his grave now -- laughing or not, I wouldn't know.

    I've seen small examples of this sort of thing before and they usually result from someone deciding that requirements definition and analysis just take too much time or effort. They see a generic catch-all column with a what-is-this indicator as a clever shortcut. As others have pointed out, this approach is not the panacea it may at first appear to be, often leading to difficulties when the business needs change later.

    I suppose it's only a short stroll down the road to hell to say "why not just do that for everything?". As an applications developer, I've had to learn something about database design but more importantly have learned to depend on trained and skilled data modelers and database designers. Whoever's running this mystery company's IT shop has apparently never learned that basic lesson.

  • Why?

    That's the question I would like to ask the guy who, um, designed this and also the guy who accepted it. The first one has to be the second one's son or girlfriend.

    What advantages do you think you are getting from it? What problems does this solve? Why not just store this data in a text file? Why use an RDBMS at all if you're just going to short-circuit every feature it provides you?

    I enjoy challenges. I must have a challenging work environment or I go batty. I'm sure I'm in good company here. But one challenge I really have little tolerance with is "making stupidity work." There are simply too many fun ways to make a very good living to put up with that.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Matt Miller (5/12/2008)


    Jeff Moden (5/12/2008)


    Instead of bailing from the company, you have a huge opportunity... set up a multi-million row test and show them why it's a bad idea. Challenge those that support the single table theory to write anything performant. Be prepared to show a multi-table example that matches.

    Keep in mind - Jeff's talking about a "real" 1M row test (i.e. 1M distinct values in the entity column), so 1M x 40 columns = 40M rows.

    Exactly!!!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 1 million entities with 40 columns - 6 of the columns are address information and half of the entities have two addresses - 1 column is a phone number and everyone has three phone numbers. So 48 million rows.

    Here's a thought - ask the developer who has suggested this plan just to fill the table with this type of test data.

    Then, let's think relationally and say that some of the addresses are shared with more than one entity - oops, probably just broke the design.

  • Mr Welcome,

    Quick Question about the data;

    1. Is it as others have 'assumed' that the data for a particular entity will be split into rows - 1 for each attribute?

    or

    2. Each entity in the model will occupy one row with all of its attributes concatenated into the varchar field as '|AttributeName1|AttributeValue1|......|AttributeNameN|AttributeValueN|'?

    (| is used as a delimiter).

    if case 2 is used a million rows only would be the required dataset size for a million entities.

    Is the design this 'clever'? (still a bad idea - gonna needs lots of code and will need big cost to amend).

    --Shaun

    Hiding under a desk from SSIS Implemenation Work :crazy:

  • Good point Shaun - there are worse solutions like the one you are talking about.

    The original description did not sound like the plan was to use multi-value delimited fields. Doing this would only compound the problem since everything would require parsing, record modifications would always require code changes, and any search would always result in a table scan (unless you happen to be searching for the first field in the delimited string).

  • The idea is of this design would have one row for each data item. If you are looking up an item that is related to a particular entity (one customer for example), then there are indexes that can be used to retrieve the 2000-3000 rows of data that are related to that entity. If retrieving items for one entity related to one screen, only 40-50 items might be retrieved. While not optimal, the online performance should tolerable.

    The problem would seem to be more with import/export programs and reporting. Many of these queries will need to look at the data based on the type of item. For example I want to see all the customers in a state, that signed an agreement in 2007. That will require massive table scans or at the very least book mark lookups on large numbers of rows. Knowing the type of item you are looking for might mean that you need to look at less than 1% of the rows, but with 40 million rows that would still be hundreds of thousands of book mark lookups. At some point a table scan is faster.

    Then to do any camparisions to find items that are within a range will require converting the varchar variable to a native type (Datetime, numeric, or int). There is no way to index any of this data by say zip code or state, since all these values are stored in the same heap with everything else.

    Michael

Viewing 15 posts - 16 through 30 (of 54 total)

You must be logged in to reply to this topic. Login to reply