Partition an existing SQL Server Table

I have a table that needs to be partitioned, and I need to archive one of the partitions.

    I did this in Oracle several years ago but not in SQL Server.

    I'm looking for a basic example on how to do this.

    I know the basic steps but the examples that I found on the Web were not quite what I'm looking for.

    Does anyone have a good article?


    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

You'd need to create the required partition function/scheme and then rebuild your table's clustered index onto the partition scheme.

    Depending on how your table is built, you would also have to redefine your clustered index and primary key (to use the column that you want to partition on).

    I wrote an introduction to partitioning here:- http://www.sqlservercentral.com/blogs/the-dba-who-came-in-from-the-cold/2014/06/04/partitioning-basics-part-1/

    (Sorry for the shameless self-promotion.)

    If you post the table's DDL statement and the partition scheme + function, I can help you implement it.

    Be warned though: this is not an online operation.

  • DBA From The Cold (6/22/2015)



    Thanks!


  • Welsh Corgi (6/22/2015)



    The first very important question is, which edition do you have? Standard or Enterprise?

    The second very important question is what column will you partition by?

    After that come some other important questions.

    Is the partitioning column a DateTime or ???

    Is the table Insert only on the current date or are updates ever allowed? If so, how far back and on which columns?

    Why do you want to partition? If it's for reasons of query performance, you'll likely be disappointed.

And, yes... if you have the disk space, converting an existing table to a partitioned table can certainly be an online evolution with only a few seconds of blocking in total.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/22/2015)



    It is moving to a 2014 Box with the Enterprise Edition.

    I was going to partition by Creation Date.

    Inserts only, no updates.

They want me to partition because the table is so large. I had suggested archiving and deleting the records, but it was mentioned that the log file would get too large.

    I mentioned that if you use the TOP operator with a WHILE loop and commit each batch, you would not have the issue of the transaction log growing. I wrote code to do this several years ago but I don't have it any more.
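A minimal sketch of that batched-delete pattern (the table and column names follow the DDL posted later in this thread; the cutoff date and batch size are illustrative assumptions):

    ```sql
    -- Delete old rows in small batches so transaction log space can be
    -- reused between commits instead of growing in one giant transaction.
    -- @Cutoff and @BatchSize are assumptions to be tuned for the system.
    DECLARE @BatchSize INT      = 50000,
            @Cutoff    DATETIME = '2010-01-01',
            @Rows      INT      = 1;

    WHILE @Rows > 0
    BEGIN
        BEGIN TRANSACTION;

        DELETE TOP (@BatchSize)
        FROM dbo.TaskNote
        WHERE CreationDate < @Cutoff;

        SET @Rows = @@ROWCOUNT;

        COMMIT TRANSACTION;  -- commit per batch to keep the log in check
    END;
    ```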

    Thanks Jeff.


  • Here is the data in that CreateDate Column:

CreationDateYear   RecordCount
    NULL               4439
    1930               1
    1940               1
    1990               3
    1997               5880
    1998               63068
    1999               85365
    2000               82985
    2001               151719
    2002               255011
    2003               386974
    2004               408519
    2005               525129
    2006               638583
    2007               852965
    2008               1262168
    2009               2022856
    2010               2974695
    2011               3285952
    2012               3631320
    2013               4838442
    2014               3982934
    2015               210
    2016               2
    2017               1
    2018               1
    2019               1
    2020               4
    2029               1
    2031               1
    2050               2
    2077               1
    2088               1
    2102               2
    2103               7
    2104               2
    2112               2
    2201               1
    2304               1
    2555               2
    2912               1
    2913               1
    3013               5
    3014               1
    3023               1
    5008               1
    5011               1
    6201               1
    7013               1
    7201               1
    8201               1
    9201               1


Currently there are 3 filegroups: Primary, FG1 and FG2.

    FG1 is the default.

    What impact does that have on creating filegroups for the partitions, if any?
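If extra filegroups are wanted for the partitions, they are added at the database level before the partition scheme references them; a sketch (the database name, file path, and sizes here are placeholders):

    ```sql
    -- Add a filegroup and one data file to hold a partition
    ALTER DATABASE [MyDatabase] ADD FILEGROUP [FG_2014];

    ALTER DATABASE [MyDatabase] ADD FILE (
        NAME      = N'FG_2014_Data',
        FILENAME  = N'D:\SQLData\MyDatabase_FG_2014.ndf',
        SIZE      = 512MB,
        FILEGROWTH = 256MB
    ) TO FILEGROUP [FG_2014];
    ```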


  • To be absolutely honest, the table has only about 25 million rows in it and, unless it's a very wide table or unless it has some massive data stored in MAX datatypes, it seems like it would be much more of a pain to go through partitioning than it would to do the looping deletes that you speak of. Most of the years have less than a million rows that could easily be deleted by month, one per night, that would prevent any type of logfile blowout. Even your biggest year's worth of data contains less than 5 million rows and, if you divide that by 12 months, that's only 416,000 rows per month, give or take any "seasonal" adjustments.

    This table certainly could be partitioned but is management sure they want to go through such a thing instead of the partial deletes over time that you so correctly cited? They should also be made aware that partitioning is not a panacea of performance for most queries.

    Could you run sp_SpaceUsed on the table and post the results so that I can get a better idea of the space used by the table and its indexes?

    Also, it would appear that 2015, which is half over, only has 210 rows in it? Is that, by any chance, a mistake or is it due to a fiscal year offset?
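For reference, the call being asked for is simply:

    ```sql
    -- Reports reserved, data, and index space used by the table
    EXEC sp_spaceused N'dbo.TaskNote';
    ```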


  • Jeff Moden (6/22/2015)



Just picking up on something that Jeff said: partitioning is not a performance tool, it is for data management. If your colleagues think that by implementing partitioning on the table their query performance will improve then (again, as Jeff says) 99 times out of 100 they'll be disappointed.

  • Welsh Corgi (6/22/2015)



    They do not want it for performance reasons. They want it to be a way of archiving data.

With the data being out of whack, I was thinking that I need to update those date fields to a valid date.

    I'm not sure about the Partition Function syntax. Do I do a RANGE RIGHT so that I can archive the old data?

    I'm partitioning on the Creation Date Column.

CREATE TABLE [dbo].[TaskNote](
        [TaskNoteId] [int] NOT NULL,
        [ActiveTaskId] [int] NULL,
        [TaskStateId] [int] NULL,
        [CreationDate] [datetime] NULL,
        [ResourceId] [int] NULL,
        [CallId] [int] NULL,
        [Memo] [text] NULL,
        [sentToXactFlg] [bit] NULL,
        CONSTRAINT [PK_TaskNote] PRIMARY KEY CLUSTERED
        (
            [TaskNoteId] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
                ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [FG1]
    ) ON [FG1] TEXTIMAGE_ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[TaskNote] ADD CONSTRAINT [DF_TaskNote_sentToXactFlg] DEFAULT (0) FOR [sentToXactFlg]
    GO
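Before deciding on replacement values for the out-of-whack dates, the offending rows could be listed first; a sketch (the 1997-2016 window is an assumption drawn from the posted counts):

    ```sql
    -- List rows whose CreationDate is NULL or outside the plausible window
    SELECT TaskNoteId, CreationDate
    FROM dbo.TaskNote
    WHERE CreationDate IS NULL
       OR CreationDate <  '1997-01-01'
       OR CreationDate >= '2016-01-01'
    ORDER BY CreationDate;
    ```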


  • OK so I've run through this on my dev instance.

    DISCLAIMER - Please don't run this without testing it first (I know you won't but just need to say it 🙂 )

    So first create the partition function and scheme:-

--Create partition function (boundary values listed in ascending order)
    DECLARE @CurrentDate DATETIME = DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 0);

    CREATE PARTITION FUNCTION PF_TaskNote (DATETIME)
    AS RANGE RIGHT
    FOR VALUES (@CurrentDate-8, @CurrentDate-7, @CurrentDate-6, @CurrentDate-5,
                @CurrentDate-4, @CurrentDate-3, @CurrentDate-2, @CurrentDate-1,
                @CurrentDate,   @CurrentDate+1, @CurrentDate+2, @CurrentDate+3,
                @CurrentDate+4, @CurrentDate+5, @CurrentDate+6, @CurrentDate+7);

    --Create partition scheme
    CREATE PARTITION SCHEME PS_TaskNote
    AS PARTITION PF_TaskNote
    ALL TO (FG1);

For ease I created each partition on the FG1 filegroup, but you can change that as needed. I've used RANGE RIGHT, so the script above sets the partition boundaries as:-

    x < 2015-06-15 00:00:00

    2015-06-15 00:00:00 <= x < 2015-06-16 00:00:00

    2015-06-16 00:00:00 <= x < 2015-06-17 00:00:00

    ...and so on. So that makes each partition one full day.

    Then I set the CreationDate column to not null (you'll need to check that this is OK):-

    ALTER TABLE [dbo].[TaskNote] ALTER COLUMN [CreationDate] [DATETIME] NOT NULL;

Now I'm going to drop the existing PK and re-create it as nonclustered (this is why it's not an online operation, so be careful!):-

    --Drop existing PK
    ALTER TABLE [dbo].[TaskNote] DROP CONSTRAINT [PK_TaskNote];
    GO

    --Recreate PK as nonclustered index
    ALTER TABLE [dbo].[TaskNote] ADD CONSTRAINT [PK_TaskNote] PRIMARY KEY NONCLUSTERED
    (
        [TaskNoteId] ASC, [CreationDate] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
            IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
            FILLFACTOR = 90) ON PS_TaskNote(CreationDate);
    GO

    I'm assuming that you want to keep the existing PK. I've had to add the CreationDate column because the partitioning column must be part of every unique index on the table.

    Finally I'm going to add a clustered index to the table on CreationDate:-

--Create clustered index on CreationDate column
    CREATE CLUSTERED INDEX IX_TaskNote_CreationDate ON [dbo].[TaskNote]
    (
        CreationDate ASC
    ) ON PS_TaskNote(CreationDate);

    You can check the partitions then by running:-

SELECT  t.name AS TableName,
            i.name AS IndexName,
            p.partition_number,
            p.partition_id,
            r.boundary_id,
            r.value AS BoundaryValue,
            p.rows
    FROM    sys.tables AS t
    INNER JOIN sys.indexes AS i
            ON t.object_id = i.object_id
    INNER JOIN sys.partitions AS p
            ON i.object_id = p.object_id AND i.index_id = p.index_id
    INNER JOIN sys.allocation_units AS a
            ON a.container_id = p.hobt_id
    INNER JOIN sys.filegroups AS fg
            ON fg.data_space_id = a.data_space_id
    INNER JOIN sys.partition_schemes AS s
            ON i.data_space_id = s.data_space_id
    INNER JOIN sys.partition_functions AS f
            ON s.function_id = f.function_id
    LEFT OUTER JOIN sys.partition_range_values AS r
            ON f.function_id = r.function_id AND r.boundary_id = p.partition_number
    WHERE   t.name = 'TaskNote'
      AND   i.type <= 1  --heap or clustered index only
      AND   a.type = 1;  --in-row data only

    So run through that and let me know if you have any questions. If anyone on here has a better way of doing it, please let me know 🙂
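Since the original goal was archiving a partition: once the table is partitioned, a whole partition can be detached as a metadata-only operation with ALTER TABLE ... SWITCH. A sketch (the staging table dbo.TaskNote_Archive is an assumption; it must be empty, have an identical structure, and sit on the same filegroup as the partition being switched):

    ```sql
    -- Find which partition a given date falls into
    SELECT $PARTITION.PF_TaskNote('2015-06-15') AS PartitionNumber;

    -- Switch that partition out into the empty staging table;
    -- the data doesn't move, so this is nearly instantaneous
    ALTER TABLE dbo.TaskNote
    SWITCH PARTITION 2 TO dbo.TaskNote_Archive;
    ```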

I'd do RANGE RIGHT because it seems to be the method that humans understand best: the partition's boundary date is the first date of the partition rather than the last.

    I have a presentation with a fully worked code example and a very handy view that tells you everything about all partitions in a database. I'd also make one file per filegroup and one filegroup per month to greatly decrease the backup requirements. I'll dig all that out of my archives and attach it this morning.


  • Actually, DBA from the Cold provided a pretty good summary for you to easily follow. I'm not begging off but my presentation material is a whole lot more complicated because I also wrote a fairly complicated bit of code to do file compaction of free space during the final index rebuilds before making a partition Read_Only for backup consolidation. The code is the result of me partitioning a 350GB table across 70 months and prevented about 70GB of wasted space and I had to do it all with virtually no downtime. I strongly suspect that you won't have such a problem.


Thanks! :-)

    I didn't quite follow the example on the date ranges.

    Did you see where I posted the record counts grouped by year?

    If I create the ranges by month I will have a lot of partitions, I believe 192. If I do it by year I will have 16.

    Thanks again.


  • Yep... Saw that. You'd be better off partitioning by year.
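Translated to a yearly scheme, the earlier function/scheme might look like this sketch (boundary years are taken from the posted counts; the stray pre-1997 and post-2015 dates would land in the first and last partitions):

    ```sql
    -- RANGE RIGHT with 1 January boundaries: each partition holds one full year
    CREATE PARTITION FUNCTION PF_TaskNote_Yearly (DATETIME)
    AS RANGE RIGHT FOR VALUES (
        '1997-01-01', '1998-01-01', '1999-01-01', '2000-01-01', '2001-01-01',
        '2002-01-01', '2003-01-01', '2004-01-01', '2005-01-01', '2006-01-01',
        '2007-01-01', '2008-01-01', '2009-01-01', '2010-01-01', '2011-01-01',
        '2012-01-01', '2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01'
    );

    CREATE PARTITION SCHEME PS_TaskNote_Yearly
    AS PARTITION PF_TaskNote_Yearly
    ALL TO (FG1);
    ```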

