SQLServerCentral is supported by Redgate
When is too many columns too many columns?
Kevlarmpowered
SSC-Enthusiastic
Group: General Forum Members
Points: 191 Visits: 163
I know you can have some insane number of columns in SQL Server... but just because you could, doesn't mean you should. Right now I am battling with a table that has almost 900 columns. 34 of those columns are dedicated to integer values that define which user is assigned to which part of the process. 280 columns are dedicated to statuses of the particular thing. It is actually 140 datetime values, but when the table was built, each datetime was split into two columns, one for date and one for time.

Locking is the first thing that comes to mind, as a high-traffic table is going to lock the entire thing up if you are updating one or many of the columns...
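As an aside, a split date/time pair can always be recombined on the fly; the table and column names below are hypothetical, for illustration only:

```sql
-- Recombine a DATE column and a TIME column into a single DATETIME.
-- dbo.FactThing, ApprovedDate, and ApprovedTime are illustrative names only.
SELECT  ThingID,
        CAST(ApprovedDate AS DATETIME) + CAST(ApprovedTime AS DATETIME) AS ApprovedDateTime
FROM    dbo.FactThing;
```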
Jeff Moden
SSC Guru
Group: General Forum Members
Points: 505760 Visits: 44254
Kevlarmpowered - Friday, January 12, 2018 3:25 PM
I know you can have some insane number of columns in SQL server... but just because you could, doesn't mean you should. Right now I am battling with a table that has almost 900 columns. 34 of those columns are dedicated to integer values which define what user is defined to what part of the process. 280 columns are defined to statuses of the particular thing. It is actually 140 datetime values, but when it was built, it was built to split datetime into two columns, one for date and one for time.

Locking is the first thing that comes to mind as a high traffic table is going to lock the entire thing up if you are updating one or many of the columns...


There's actually nothing wrong with a 900 column table IF the rules of normalization were applied... which also usually means that you won't have a table with 900 columns. Wink

I've worked with tables with similar insane numbers of columns. Most of those had to deal with CDRs (Call Detail Records). For those, it made no sense to split the columns off into separate tables.

If your status/datetime columns have a lot of NULLs in them, it may be worthwhile to build a "sister" table in the form of an EAV to store those statuses. A whole lot of people will tell you that EAVs are the wrong way to go, but this is one of those places where they can really shine if you put some thought into it.
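A minimal sketch of such an EAV "sister" table, with illustrative names (nothing here comes from the original schema):

```sql
-- One row per (thing, status) that is actually set,
-- instead of 280 mostly-NULL columns on the wide table.
CREATE TABLE dbo.ThingStatus
(
    ThingID    INT          NOT NULL,  -- points back to the wide fact table
    StatusName VARCHAR(100) NOT NULL,  -- the "attribute"
    StatusDT   DATETIME     NOT NULL,  -- the "value": when the status occurred
    CONSTRAINT PK_ThingStatus PRIMARY KEY CLUSTERED (ThingID, StatusName)
);
```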

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Grant Fritchey
SSC Guru
Group: General Forum Members
Points: 221164 Visits: 33551
I'm with Jeff. The right number of columns is the number of columns needed to support the object stored inside the database, whether normalized or in a star schema, whatever. There isn't a number that's too many.

That said, based on your description, it sounds like you're not dealing with a properly structured table. 180 columns to define a user and 280 columns to define a thing... that can't possibly be normalized or a fact table. It sounds like multiple functions have been jammed into a single table. In that case, it's not that there are too many columns, but that the structure is wrong. Bad structures will absolutely cause performance headaches, maintenance problems, and even development and testing problems. Just overloading a column causes problems, let alone overloading a table.

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore Roosevelt

The Scary DBA
Author of: SQL Server Query Performance Tuning and SQL Server Execution Plans
Product Evangelist for Red Gate Software
Eirikur Eiriksson
SSC Guru
Group: General Forum Members
Points: 93591 Visits: 20651
Grant Fritchey - Monday, January 15, 2018 7:33 AM
I'm with Jeff. The right number of columns is the number of columns needed to support the object stored inside the database, whether normalized or in a star schema, whatever. There isn't a number that's too many.

That said, it sounds like, based on your description, that you're not dealing with a properly structured table. 180 columns to define a user and 280 columns to define a thing... That can't possibly be normalized or a fact table. It sounds like multiple functions have been jammed into a single table. In that case, it's not that there are too many columns, but that the structure is wrong. Bad structures will absolutely cause performance headaches, maintenance problems and even development and testing problems. Just overloading a column causes problems, let alone overloading a table.

+100

There are exceptions to this though, e.g. CDRs, surveys, etc. In those cases I'd recommend looking into the sparse column features and their limitations.
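For reference, sparse columns are declared per column; this is a hypothetical example, not the OP's table:

```sql
-- SPARSE columns use no storage when NULL, at a small overhead when non-NULL.
CREATE TABLE dbo.SurveyResponse
(
    ResponseID INT IDENTITY(1,1) PRIMARY KEY,
    Q1_Answer  VARCHAR(200) SPARSE NULL,
    Q2_Answer  VARCHAR(200) SPARSE NULL,
    -- Optional: expose all sparse columns through a single untyped XML column.
    AllAnswers XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);
```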

T.Ashish
SSCrazy
Group: General Forum Members
Points: 2575 Visits: 639
A 900-column table must have a lot of indexes and a complex primary key. What will happen to update and delete performance on such tables?
Grant Fritchey
SSC Guru
Group: General Forum Members
Points: 221164 Visits: 33551
T.Ashish - Monday, January 15, 2018 11:31 PM
900 column table must be having a lot of indexes and a complex primary key. What will happen to Update and Delete performance on such tables?


Lots of activity.

Seriously though, pretty much what you would expect. You have to write the data once to the store (heap or clustered) and then you have to do the appropriate writes to each applicable index. Any of those writes could lead to the addition of new pages or page splits, with a whole bunch more writes. You do that every single time.
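One way to see how many extra writes each modification carries is to count the table's indexes via the standard catalog views; the table name here is hypothetical:

```sql
-- Each modified row is potentially written once per index listed here.
SELECT  i.name, i.type_desc
FROM    sys.indexes AS i
WHERE   i.object_id = OBJECT_ID(N'dbo.FactThing')
  AND   i.type > 0;    -- exclude the heap entry
```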

I've said it before, I'll say it again: a properly normalized database enhances performance within an RDBMS. If you really only need a flat file, use a flat file storage system or a document database, not an RDBMS. An RDBMS will work better with relational data than it will with non-relational data; it will work better with a star schema than with a flat file layout. The whole "flat tables are faster" myth makes me insane. It generally comes from "flat tables are faster than my horrifically bad normalization/poor choice in fact table definition." Yes, writing mediocre code runs faster than writing bad code. Well done. What if we actually write good code?

Jeff Moden
SSC Guru
Group: General Forum Members
Points: 505760 Visits: 44254
Grant Fritchey - Tuesday, January 16, 2018 6:13 AM
If you really only need a flat file, use a flat file storage system or a document database, not an RDBMS.

I'll say, "It Depends". Such a thing is certainly better than storing large XML or JSON junk, but an RDBMS may be entirely necessary because it will still handle such files (once imported into a table) better than a whole lot of other things, especially for data such as CDRs or similar information.


Yes, writing mediocre code runs faster than writing bad code. Well done. What if we actually write good code?


+1000 to that!


--Jeff Moden
Jeff Moden
SSC Guru
Group: General Forum Members
Points: 505760 Visits: 44254

Sorry... posted to wrong thread on this post and moved it.



--Jeff Moden
Kevlarmpowered
SSC-Enthusiastic
Group: General Forum Members
Points: 191 Visits: 163
Thanks all for your input... unfortunately, I didn't build these tables. They were built as data warehouse tables because they didn't want users writing SQL, so to make life easier, they made tables with lots of columns so end users could basically run select * from factTable where createDate >= dateadd(year,-2,getdate()). To avoid having to do "joins", they ETL everything into single tables for users too, so there is lots of repetition. This table likely has 900 columns, of which a bunch are duplicated in other tables... but because they say "joins are bad", they ETL everything into one giant flat table. Sometimes it's values... sometimes it is just a 0 or 1 flag.

My thought was to create a sister table with a 1:1 relationship to the original fact table so that I could offload the user assignment and date and time columns to a separate table, but that's still pushing 314 columns into a table. My next thought was to pivot those columns and have a table dedicated to statuses, so for every "thing" I would have a 1:many relationship. Each "thing" could have multiple users assigned (bonus: it would allow multiple people with the same role to be assigned to a thing, which can't be done now; currently, if multiple people are assigned to a thing, the most recent assignment overwrites the existing one). As it stands, eventually all 34 user assignment columns will be populated, and each of those 280 date and time columns will have values in them as well.
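A one-time unpivot along those lines might look like this; every table and column name below is illustrative, not taken from the real schema:

```sql
-- Turn wide date/time status column pairs into rows in a vertical status table.
INSERT INTO dbo.ThingStatus (ThingID, StatusName, StatusDT)
SELECT  f.ThingID, v.StatusName, v.StatusDT
FROM    dbo.FactThing AS f
CROSS APPLY (VALUES
        ('Approved', CAST(f.ApprovedDate AS DATETIME) + CAST(f.ApprovedTime AS DATETIME)),
        ('Shipped',  CAST(f.ShippedDate  AS DATETIME) + CAST(f.ShippedTime  AS DATETIME))
        -- ...one VALUES row per date/time column pair...
    ) AS v (StatusName, StatusDT)
WHERE   v.StatusDT IS NOT NULL;   -- keep only statuses that were actually set
```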

I was just trying to wrap my head around this because yes, you can have 1000+ columns per table... but my brain says it's a bad idea because of locking. As the activity on the table grows, each update statement is going to lock the entire row. If we are rapidly reassigning and tagging dates/times, that's going to be a problem with growth. Going vertical would increase the number of rows, but would lessen the row locking as things are inserted/updated.

For explanation purposes ... a "thing" is assigned to a user. A user can do many different steps which is why there are 34 user assignment columns and 240 date and time columns. One user could do 10-12 steps to a particular "thing".