Splitting a huge table in two or three

  • Ok,

    I've seen big tables and databases before, but this one is really big. I recently discovered that the reason is that the customer wants to keep more than 2 years of data in it.

    Anyway, I have some ideas and would like to get input on them and on how I would translate them into an actual physical implementation.

    Idea #1

    Split the table in two. Keep table #1 on the current production database server, but only with the current year's data. Then, based on the PK or datetime column, move everything older to a different table on a different MS-SQL server, as a kind of data archiving. Once that's done, replace every table reference in the existing stored procedures with a view that points to the two physical tables.
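
    Something along these lines is what I have in mind, assuming a linked server to the archive instance (all table, column and server names below are just placeholders):

        CREATE VIEW dbo.vw_BigTable
        AS
        -- current year's rows stay in the production table
        SELECT RowID, EventDate, CustomerID, Amount
        FROM dbo.BigTable_Current
        UNION ALL
        -- older rows live on the archive instance, reached through a linked server
        SELECT RowID, EventDate, CustomerID, Amount
        FROM ArchiveServer.ArchiveDB.dbo.BigTable_History;

    With CHECK constraints on the date column of each underlying table, the optimizer can skip the remote table when a query only touches current data (a distributed partitioned view), and the stored procedures keep working against the view unchanged.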

    Idea #2.

    This one may also improve performance, but keeps the information on the same MS-SQL instance. It uses partitioning: create two more filegroups and divide the workload and storage accordingly. What worries me about this one is that it may incur downtime, especially if I need to drop and recreate the existing clustered index when configuring the partitions.
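
    For reference, a minimal sketch of how I picture #2, with one boundary and two filegroups (database, filegroup, file and column names are placeholders):

        -- add the extra filegroups and one file in each (paths are examples only)
        ALTER DATABASE MyDW ADD FILEGROUP FG_Archive;
        ALTER DATABASE MyDW ADD FILEGROUP FG_Current;
        ALTER DATABASE MyDW ADD FILE
            (NAME = 'MyDW_Archive1', FILENAME = 'E:\Data\MyDW_Archive1.ndf') TO FILEGROUP FG_Archive;
        ALTER DATABASE MyDW ADD FILE
            (NAME = 'MyDW_Current1', FILENAME = 'F:\Data\MyDW_Current1.ndf') TO FILEGROUP FG_Current;

        -- partition on the datetime column: everything before 2012 goes to the archive filegroup
        CREATE PARTITION FUNCTION pfEventDate (datetime)
            AS RANGE RIGHT FOR VALUES ('2012-01-01');

        CREATE PARTITION SCHEME psEventDate
            AS PARTITION pfEventDate TO (FG_Archive, FG_Current);

        -- rebuilding the clustered index on the partition scheme is what physically moves the rows
        CREATE UNIQUE CLUSTERED INDEX CIX_BigTable
            ON dbo.BigTable (EventDate, RowID)
            WITH (DROP_EXISTING = ON)
            ON psEventDate (EventDate);

    The index rebuild is the long-running step, which is where the downtime worry comes from.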

    I have done partitioning before, so I think I have a rough idea of how to implement #2. But what about #1? I would appreciate input on these two, or suggestions for different strategies.

    In a nutshell, my main goal, regardless of the approach, is to make that huge table smaller or to divide the workload.

    Thanks in advance.

  • how big is "big" ??

    number of rows

    number of columns

    file size

    index size

    ??

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Hmmm...

    I am not working today, but several GBs ... over 10 million rows, off the top of my head.

    I do not remember the numbers exactly, but it is huge! I have already checked, and I do not have approval to delete historical data, so those two are pretty much my only options.

    The historical data MUST be available via T-SQL query all the time.

  • sql-lover (8/18/2012)


    Hmmm...

    I am not working today, but several GBs ... over 10 million rows, off the top of my head.

    I do not remember the numbers exactly, but it is huge! I have already checked, and I do not have approval to delete historical data, so those two are pretty much my only options.

    The historical data MUST be available via T-SQL query all the time.

    hmm...10 million rows is not that big....I have tables of that size on my laptop.

    In a nutshell, my main goal, regardless of the approach, is to make that huge table smaller or to divide the workload.

    what are you trying to achieve by reducing table size...?

    are you experiencing poor performance in your application ?

    perhaps if you explain the reasons behind your request....someone will chime in.

    good luck

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • -Performance

    -Disk space (the whole DB is over 1TB now)

    Reducing the table will allow me to save a lot of space and RAM. The reports that take almost an hour will run faster.

    Honestly, the reason for doing this is clear to me 🙂 ... I just want help with how to physically apply these ideas, especially splitting the table and creating a view over it.

    Any suggestions on how to implement what I explained?

    Thanks.

  • sql-lover (8/18/2012)


    -Performance

    -Disk space (the whole DB is over 1TB now)

    Reducing the table will allow me to save a lot of space and RAM. The reports that take almost an hour will run faster.

    Honestly, the reason for doing this is clear to me 🙂 ... I just want help with how to physically apply these ideas, especially splitting the table and creating a view over it.

    Any suggestions on how to implement what I explained?

    Thanks.

    thanks for clarifying...appreciated.

    I am not the person to advise...but I expect other forum members with the relevant experience will come along soon to assist.

    good luck....

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • sql-lover (8/18/2012)


    Ok,

    Idea #1

    Split the table in two. Keep table #1 on the current production database server, but only with the current year's data. Then, based on the PK or datetime column, move everything older to a different table on a different MS-SQL server, as a kind of data archiving. Once that's done, replace every table reference in the existing stored procedures with a view that points to the two physical tables.

    Not ideal, but it could work. This could be done the way you used to partition before SQL Server 2005. Research partitioned views.

    sql-lover (8/18/2012)


    Ok,

    Idea #2.

    This one may also improve performance, but keeps the information on the same MS-SQL instance. It uses partitioning: create two more filegroups and divide the workload and storage accordingly. What worries me about this one is that it may incur downtime, especially if I need to drop and recreate the existing clustered index when configuring the partitions.

    Partitioning is not really a solution for performance; there are performance problems with aggregations, etc. If they clarify that they only want x years of data, then you should look at this and create a sliding-window process to switch old partitions out to an archive table (or abandon them altogether).
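
    A sliding-window switch-out, in very rough form, would run something like this (placeholder names; the archive table must match the partitioned table's structure and live on the same filegroup as the partition being switched out):

        -- metadata-only move of the oldest partition into an archive table
        ALTER TABLE dbo.BigTable
            SWITCH PARTITION 1 TO dbo.BigTable_Archive;

        -- then slide the window: remove the now-empty boundary and add next year's
        ALTER PARTITION FUNCTION pfEventDate() MERGE RANGE ('2012-01-01');
        ALTER PARTITION SCHEME psEventDate NEXT USED FG_Current;
        ALTER PARTITION FUNCTION pfEventDate() SPLIT RANGE ('2013-01-01');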

    Personally, based on the information given, and assuming there is indefinite data retention, I would create a single filegroup with multiple equally sized files on different LUNs. SQL Server can spread the IO over multiple threads and increase IO throughput to a degree.
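
    To illustrate that layout (file names, sizes and drive letters are examples only):

        -- several equally sized files in one filegroup, ideally each on its own LUN;
        -- SQL Server stripes new allocations across them using proportional fill
        ALTER DATABASE MyDW ADD FILEGROUP FG_Data;
        ALTER DATABASE MyDW ADD FILE
            (NAME = 'MyDW_Data1', FILENAME = 'E:\Data\MyDW_Data1.ndf', SIZE = 200GB) TO FILEGROUP FG_Data;
        ALTER DATABASE MyDW ADD FILE
            (NAME = 'MyDW_Data2', FILENAME = 'F:\Data\MyDW_Data2.ndf', SIZE = 200GB) TO FILEGROUP FG_Data;
        ALTER DATABASE MyDW ADD FILE
            (NAME = 'MyDW_Data3', FILENAME = 'G:\Data\MyDW_Data3.ndf', SIZE = 200GB) TO FILEGROUP FG_Data;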

    If your customer intends to keep a lot of data then you should really be looking at carefully selected indexes on the table.

  • MysteryJimbo (8/19/2012)


    Partitioning is not really a solution for performance; there are performance problems with aggregations, etc. If they clarify that they only want x years of data, then you should look at this and create a sliding-window process to switch old partitions out to an archive table (or abandon them altogether).

    Why?

    It is my understanding that with partitioning, a T-SQL query may skip some partitions altogether, so the query won't scan the whole table.
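
    For example, if the table were partitioned on the datetime column, a query like this should only read the matching partition (placeholder names again):

        -- with BigTable partitioned on EventDate, this should only touch the 2012 partition
        SELECT COUNT(*)
        FROM dbo.BigTable
        WHERE EventDate >= '2012-01-01' AND EventDate < '2013-01-01';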

    MysteryJimbo (8/19/2012)


    Personally, based on the information given, and assuming there is indefinite data retention, I would create a single filegroup with multiple equally sized files on different LUNs. SQL Server can spread the IO over multiple threads and increase IO throughput to a degree.

    If your customer intends to keep a lot of data then you should really be looking at carefully selected indexes on the table.

    That is correct. There is indefinite data retention. And while I am trying to change this business decision, it is out of my control and not a quick or short term solution.

    I cannot separate or split the workload on the same server, because I have no more LUNs and I cannot redo the disks; that is a "no-no". While ideal, it's not possible at this time. My only choice is to migrate or move chunks of data from the same table to a different server that resides on the same network and in the same datacenter. Hence my idea of a view and moving part of the table off the existing server.

  • sql-lover (8/20/2012)


    MysteryJimbo (8/19/2012)


    Partitioning is not really a solution for performance; there are performance problems with aggregations, etc. If they clarify that they only want x years of data, then you should look at this and create a sliding-window process to switch old partitions out to an archive table (or abandon them altogether).

    Why?

    It is my understanding that with partitioning, a T-SQL query may skip some partitions altogether, so the query won't scan the whole table.

    worth a read....http://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (8/20/2012)


    sql-lover (8/20/2012)


    MysteryJimbo (8/19/2012)


    Partitioning is not really a solution for performance; there are performance problems with aggregations, etc. If they clarify that they only want x years of data, then you should look at this and create a sliding-window process to switch old partitions out to an archive table (or abandon them altogether).

    Why?

    It is my understanding that with partitioning, a T-SQL query may skip some partitions altogether, so the query won't scan the whole table.

    worth a read....http://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/

    Thanks!

    Really interesting, as are most, if not all, of Gail's articles. 😉 But I guess it does not answer my question, or it's a little bit off topic.

    What I really want to know is which one is the best approach for my particular problem. Based on the comments so far, it looks like moving some of the old data may be the best option.

    Also, I think I missed an important point: I cannot optimize this further with indexes. This is a data warehouse where some queries are ad hoc. I can't be more specific because some queries are created on the fly.

    I just need to divide or split the table while keeping the logic of the existing stored procedures intact.

  • By the way,

    The table in question has a little more than 20 million rows. It's 2.7GB of data, with 4GB in indexes.

    🙂

  • To answer the immediate question, I would recommend partitioning. We have data warehouses for several clients, and this solution works great for us.

    In my opinion, the bigger question is why 20 million records are consuming 2.7GB. You might want to review the data types of the columns used and/or consider using compression for your indexes.
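
    For example, page compression on the existing indexes is sometimes a quick win on warehouse-style tables (table and index names are placeholders; note that data compression requires Enterprise Edition on the SQL Server versions of this era):

        -- estimate the saving first
        EXEC sp_estimate_data_compression_savings
            @schema_name = 'dbo',
            @object_name = 'BigTable',
            @index_id = NULL,
            @partition_number = NULL,
            @data_compression = 'PAGE';

        -- then rebuild the index with compression enabled
        ALTER INDEX CIX_BigTable ON dbo.BigTable
            REBUILD WITH (DATA_COMPRESSION = PAGE);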

  • john 50727 (8/20/2012)


    To answer the immediate question, I would recommend partitioning. We have data warehouses for several clients, and this solution works great for us.

    In my opinion, the bigger question is why 20 million records are consuming 2.7GB. You might want to review the data types of the columns used and/or consider using compression for your indexes.

    Thanks, I'll think about it.

    The main reason for a table this big is the lack of normalization. But like I said before, in the short term, normalizing it or dividing the workload across LUNs is not possible. I fully understand that I may not be addressing the root cause, but it is what it is.
