Putting all your data in one column of one table

Question

Putting all your data in one column of one table

Viewing 15 posts - 16 through 30 (of 54 total)

You must be logged in to reply to this topic. Login to reply

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 1

michael.welcome (5/9/2008)
Well I could post the company name, embarasing them in a public forum could be grounds for firing, so I will not. I have payments on my 2003 Red Corvette Z06 (Really I do have one) that need to be made. So far we are about 18 months and millions of dollars into this project and nothing works. I figure my Vet will be payed off by the time this project ends.
That is pretty much what I expected, but wondered if anyone else had a different experience with this.

You might want to start looking for a new job now. It wouldn't be a big surprise for management to get fed up with the cost, cancel the project, and lay off everyone involved.

Actually, that may be the company's best possible outcome. An even bigger disaster would be if it somehow was released into production and they had to use it for a critical production system.

awilbournsqlcentral Old Hand Points: 363 More actions · Answer 2

While I don't agree with the approach, and I have created survey apps that did not stuff everything in one column, there is the potential to store your data as XML. You can build indexes for nodes in the XML, be prepared for your database to bloat very large though.

You are better off to design the tables correctly, but wanted to at least give some other ideas to consider and I have also used in the past. The XML indexing made the database grow 6x it original size just because of the indexes, but worked for what the desire was, which was storing OneNote documents in a SQL Database.

Good Luck.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

I've seen this a number of times, over on www.thedailywtf.com 😀

Get out of that company while you still can. That design is one of the things that sounds nice in theory, but falls over spectacuarly in reality.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 4

Instead of bailing from the company, you have a huge opportunity... set up a multi-million row test and show them why it's a bad idea. Challenge those that support the single table theory to write anything performant. Be prepared to show a multi-table example that matches.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Shaun McGuile SSCarpal Tunnel Points: 4111 More actions · Answer 5

Just read this thread! 😀

Its a fantastic idea! (as in the realms of fantasy) 😛

Some people never learn....we always propose this design first off as a laugh just to see if the project managers are paying attention.

--Shaun

Hiding under a desk from SSIS Implemenation Work :crazy:

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

Jeff Moden (5/12/2008)
Instead of bailing from the company, you have a huge opportunity... set up a multi-million row test and show them why it's a bad idea. Challenge those that support the single table theory to write anything performant. Be prepared to show a multi-table example that matches.

Keep in mind - Jeff's talking about a "real" 1M row test (i.e. 1M distinct values in the entity column), so 1M x 40 columns = 40M rows.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

v r Newbie Points: 1 More actions · Answer 7

I am generally against this kind of design. (OK I think its stupid but I am being nice). SQL server, as well as most other RDBS, are designed to work more efficiently when the data conforms to certain design principles. This is not one of them.

If they really really really want semi-structured data, maybe you should try to get them to use XML. At least it has some minimal support.

w.michael.stone SSC Enthusiast Points: 103 More actions · Answer 8

Here is the schema full

|-----------------|

-------> | |

| | STUFF |

| | |

---------| |

|----------------|

john.arnott SSChampion Points: 11882 More actions · Answer 9

My, my, my. E.F. Codd is rolling in his grave now -- laughing or not, I wouldn't know.

I've seen small examples of this sort of thing before and they usually result from someone deciding that requirements definition and analysis just take too much time or effort. They see a generic catch-all column with a what-is-this indicator as a clever shortcut. As others have pointed out, this approach is not the panacea it may at first appear to be, often leading to difficulties when the business needs change later.

I suppose it's only a short stroll down the road to hell to say "why not just do that for everything?". As an applications developer, I've had to learn something about database design but more importantly have learned to depend on trained and skilled data modelers and database designers. Whoever's running this mystery company's IT shop has apparently never learned that basic lesson.

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 10

Why?

That's the question I would like to ask the guy who, um, designed this and also the guy who accepted it. The first one has to be the second one's son or girlfriend.

What advantages do you think you are getting from it? What problems does this solve? Why not just store this data in a text file? Why use an RDBMS at all if you're just going to short-circuit every feature it provides you?

I enjoy challenges. I must have a challenging work environment or I go batty. I'm sure I'm in good company here. But one challenge I really have little tolerance with is "making stupidity work." There are simply too many fun ways to make a very good living to put up with that.

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 11

Matt Miller (5/12/2008)
Jeff Moden (5/12/2008)
Instead of bailing from the company, you have a huge opportunity... set up a multi-million row test and show them why it's a bad idea. Challenge those that support the single table theory to write anything performant. Be prepared to show a multi-table example that matches.
Keep in mind - Jeff's talking about a "real" 1M row test (i.e. 1M distinct values in the entity column), so 1M x 40 columns = 40M rows.

Exactly!!!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 12

1 million entities with 40 columns - 6 of the columns are address information and half of the entities have two addresses - 1 column is a phone number and everyone has three phone numbers. So 48 million rows.

Here's a thought - ask the developer who has suggested this plan just to fill the table with this type of test data.

Then, let's think relationally and say that some of the addresses are shared with more than one entity - oops, probably just broke the design.

Shaun McGuile SSCarpal Tunnel Points: 4111 More actions · Answer 13

Mr Welcome,

Quick Question about the data;

1. Is it as others have 'assumed' that the data for a particular entity will be split into rows - 1 for each attribute?

or

(| is used as a delimiter).

if case 2 is used a million rows only would be the required dataset size for a million entities.

Is the design this 'clever'? (still a bad idea - gonna needs lots of code and will need big cost to amend).

--Shaun

Hiding under a desk from SSIS Implemenation Work :crazy:

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 14

Good point Shaun - there are worse solutions like the one you are talking about.

The original description did not sound like the plan was to use multi-value delimited fields. Doing this would only compound the problem since everything would require parsing, record modifications would always require code changes, and any search would always result in a table scan (unless you happen to be searching for the first field in the delimited string).

michaelwelcome SSC-Addicted Points: 411 More actions · Answer 15

The idea is of this design would have one row for each data item. If you are looking up an item that is related to a particular entity (one customer for example), then there are indexes that can be used to retrieve the 2000-3000 rows of data that are related to that entity. If retrieving items for one entity related to one screen, only 40-50 items might be retrieved. While not optimal, the online performance should tolerable.

The problem would seem to be more with import/export programs and reporting. Many of these queries will need to look at the data based on the type of item. For example I want to see all the customers in a state, that signed an agreement in 2007. That will require massive table scans or at the very least book mark lookups on large numbers of rows. Knowing the type of item you are looking for might mean that you need to look at less than 1% of the rows, but with 40 million rows that would still be hundreds of thousands of book mark lookups. At some point a table scan is faster.

Then to do any camparisions to find items that are within a range will require converting the varchar variable to a native type (Datetime, numeric, or int). There is no way to index any of this data by say zip code or state, since all these values are stored in the same heap with everything else.

Michael