SQLServerCentral

Index alignment in partitioning
Matthew Darwin
SSCarpal Tunnel (4.9K reputation)

Group: General Forum Members
Points: 4935 Visits: 880
Good morning,

First up: I've yet to use partitioning in a production environment, and I last ran any partitioning-related code a few years back while studying for certification, so I'm definitely not an expert on the matter and only loosely clued up on the concepts.

I've recently started with a new employer, and they have just implemented a new system for SMS messaging. The database tables tracking the SMS messages being sent are going to get big, and so they have decided to implement partitioning on some of the tables using a partition scheme on the CreatedDate column; the DBA involved in designing the partitioning has left and I'm picking this up.

The relevant DDL for the table is below:-

CREATE TABLE [Message].[Sms](
    [SmsId] [bigint] IDENTITY(250000001,1) NOT NULL,
    [CreatedDate] [datetime] NOT NULL CONSTRAINT [DF_Sms:CreatedDate] DEFAULT (getdate()),
    CONSTRAINT [PK_Sms:SmsId] PRIMARY KEY NONCLUSTERED
    (
        [SmsId] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
)
GO

CREATE CLUSTERED INDEX [IX:SMS:Clustered] ON [Message].[Sms]
(
    [SmsId] ASC,
    [CreatedDate] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90)
GO

CREATE NONCLUSTERED INDEX [IX:SMS:CreatedDate] ON [Message].[Sms]
(
    [CreatedDate] ASC
)
INCLUDE ([SmsId]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90)
GO



There are some issues with the above that I will be addressing separately (e.g. the clustered index should be unique as it contains the unique key, and the fillfactors are daft), but my concerns for this post are below.

1) How to define the primary key and enforce its uniqueness whilst trying to ensure it is aligned with the partition, in order to be able to switch out old data once an as yet undefined retention period has passed. Books Online states: "If it is not possible for the partitioning column to be included in the unique key, you must use a DML trigger instead to enforce uniqueness." (Books Online - Special Guidelines for Partitioned Indexes). However, I'm not sure what this means, nor how I create the primary key to use the partition function, seeing as it doesn't have CreatedDate in the unique key.
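For reference, the two routes Books Online describes might look something like this. This is only a sketch: the scheme name [DateScheme] and the trigger name are assumptions, not objects that exist in our system.

```sql
-- Option A: widen the key to include the partitioning column, so the
-- index can be aligned -- but this no longer enforces SmsId alone as unique:
-- ALTER TABLE [Message].[Sms] ADD CONSTRAINT [PK_Sms:SmsId]
--     PRIMARY KEY NONCLUSTERED ([SmsId], [CreatedDate])
--     ON [DateScheme]([CreatedDate]);

-- Option B (the DML trigger route from BOL): align a non-unique index
-- on SmsId and enforce logical uniqueness in a trigger:
CREATE TRIGGER [Message].[trg_Sms_UniqueSmsId]
ON [Message].[Sms]
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- If any SmsId touched by this statement now appears more than
    -- once in the table, roll the whole statement back.
    IF EXISTS (
        SELECT s.SmsId
        FROM [Message].[Sms] AS s
        JOIN inserted AS i ON i.SmsId = s.SmsId
        GROUP BY s.SmsId
        HAVING COUNT(*) > 1
    )
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR ('Duplicate SmsId would violate the logical primary key.', 16, 1);
    END
END
```

The trade-off is that the trigger adds a lookup to every insert/update, whereas Option A keeps enforcement in the index at the cost of a weaker uniqueness guarantee.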

2) The original partition function was envisaged as the following:-


CREATE PARTITION FUNCTION [DateFunction](datetime) AS RANGE
LEFT FOR VALUES (N'2014-01-01T00:00:00.000'
, N'2014-04-01T00:00:00.000'
, N'2014-07-01T00:00:00.000'
, N'2014-10-01T00:00:00.000'
, N'2015-01-01T00:00:00.000'
, N'2015-04-01T00:00:00.000'
, N'2015-04-02T00:00:00.000'
, N'2015-04-03T00:00:00.000'
, N'2015-04-04T00:00:00.000'
, N'2015-04-05T00:00:00.000')
GO



There is a procedure that has been created and scheduled daily that will create a new partition for each day, and then merge these together at the end of the quarter. My understanding of partitioning is that this is a bad idea, as it will result in merging several populated partitions together. Is my understanding correct? If so, I'm planning on removing the day partitions at the end of the function, and simply adding quarterly partitions, maintaining a spare empty partition at the end of the table. Would this make more sense?
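For the quarterly approach, my understanding is the maintenance would reduce to something like this each quarter. Again a sketch only: the scheme name [DateScheme] and the staging table [Message].[Sms_Switch] are assumptions.

```sql
-- Add next quarter's boundary while everything beyond the current
-- highest boundary is still empty (metadata-only in that case):
ALTER PARTITION SCHEME [DateScheme] NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION [DateFunction]() SPLIT RANGE ('2015-07-01T00:00:00.000');

-- Retire the oldest quarter: switch partition 1 into an identically
-- structured staging table on the same filegroup, then remove the
-- now-empty boundary:
ALTER TABLE [Message].[Sms] SWITCH PARTITION 1 TO [Message].[Sms_Switch];
ALTER PARTITION FUNCTION [DateFunction]() MERGE RANGE ('2014-01-01T00:00:00.000');
```

That way both SPLIT and MERGE only ever touch empty partitions, which is the cheap case.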

As usual, thanks in advance for any comments/thoughts.

Regards

Matthew

Follow me on twitter @EvoDBA | Check out my blog Natural Selection DBA
GilaMonster
SSC Guru (910K reputation)

Group: General Forum Members
Points: 910158 Visits: 48846
Do you know why they implemented partitioning? What problem is it supposed to solve? Archiving/deleting of old data?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Matthew Darwin
SSCarpal Tunnel (4.9K reputation)

Group: General Forum Members
Points: 4935 Visits: 880
Archiving and index maintenance are the two reasons I've been given, though I have an inkling the real reason might simply be wanting to try it out.

However, I suspect that it may well be a sensible candidate as the old table has some 700 million rows, most of which now is unneeded data.

Follow me on twitter @EvoDBA | Check out my blog Natural Selection DBA
Steve Hall
SSCoach (17K reputation)

Group: General Forum Members
Points: 17004 Visits: 13242
Matthew Darwin (4/1/2015)
Archiving and index maintenance are the two reasons I've been given, though I have an inkling the real reason might simply be wanting to try it out.

However, I suspect that it may well be a sensible candidate as the old table has some 700 million rows, most of which now is unneeded data.



We use partition switching to keep data for a variety of time periods - from 3 days to 15 months, depending upon what we need it for.
You'd be better off having a partition scheme based on age, not the actual date, because you'll have to alter it as time progresses.
Have a look at an example of a three day retention setup, which can be changed to suit the retention you require - http://wp.me/p3Vxvi-6S

Changing a table with 700 million rows to cater for partitioning is going to be interesting.

Edited for typo.

Steve Hall
Linkedin
Blog Site
Matthew Darwin
SSCarpal Tunnel (4.9K reputation)

Group: General Forum Members
Points: 4935 Visits: 880
BrainDonor (4/1/2015)
Matthew Darwin (4/1/2015)
Archiving and index maintenance are the two reasons I've been given, though I have an inkling the real reason might simply be wanting to try it out.

However, I suspect that it may well be a sensible candidate as the old table has some 700 million rows, most of which now is unneeded data.



We use partition switching to keep data for a variety of time periods - from 3 days to 15 months, depending upon what we need it for.
You'd be better off having a partition scheme based on age, not the actual date, because you'll have to alter it as time progresses.
Have a look at an example of a three day retention setup, which can be changed to suit the retention you require - http://wp.me/p3Vxvi-6S

Changing a table with 700 million rows to cater for partitioning is going to be interesting.

Edited for typo.


Does that scheme not mean that data is actually required to be moved between partitions, rather than just the partitions themselves remaining static and then being binned off when that data counter hits a specific age?

The old table is not going to be migrated into the new table; this is essentially a new system that will be starting with no data and building up from scratch. However, I just want to make sure that the decisions already made aren't going to mean that this is a problem going forwards once the table has accumulated that sort of volume of data.

Follow me on twitter @EvoDBA | Check out my blog Natural Selection DBA
Steve Hall
SSCoach (17K reputation)

Group: General Forum Members
Points: 17004 Visits: 13242
Matthew Darwin (4/1/2015)

Does that scheme not mean that data is actually required to be moved between partitions, rather than just the partitions themselves remaining static and then being binned off when that data counter hits a specific age?

The old table is not going to be migrated into the new table; this is essentially a new system that will be starting with no data and building up from scratch. However, I just want to make sure that the decisions already made aren't going to mean that this is a problem going forwards once the table has accumulated that sort of volume of data.



No, the data isn't physically moved - it's a change to the metadata, so it takes an extremely short time to execute. You shouldn't notice the difference between switching and truncating two rows or two million rows.
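As a sketch of what the purge step in the blog post does (object names here are illustrative, not the actual ones from the article): the partition is switched into an empty staging table with an identical structure on the same filegroup, and the staging table is then truncated.

```sql
-- Metadata-only: reassigns partition 3's pages to the staging table.
ALTER TABLE [dbo].[SmsRolling] SWITCH PARTITION 3 TO [dbo].[SmsRolling_Staging];

-- Deallocation, not row-by-row deletes -- cheap regardless of row count.
TRUNCATE TABLE [dbo].[SmsRolling_Staging];
```

Neither statement reads or writes the rows themselves, which is why two rows and two million rows behave the same.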

Steve Hall
Linkedin
Blog Site
Matthew Darwin
SSCarpal Tunnel (4.9K reputation)

Group: General Forum Members
Points: 4935 Visits: 880
No, I still don't understand; unless I've completely misunderstood, in the scenario you provide the value on the row changes, meaning that value should now be in a different partition from the one it was inserted into, i.e. it's now two days old rather than one. The data is effectively updated at that point (the partition value 1 needs to be rewritten as value 2; the page has to come off disk into the buffer, be changed, and be put back to disk), which would mean it has to be rewritten to the new partition?

I (...think I...) understand the metadata change for partition switching, but in the scenario above the actual data row itself changes to move it from the first to the second partition? If the data was static and new partition ranges were added, then the underlying data would never have to move between partitions.

More to the point, in my scenario where I have a wider window, only some of the data would move between the partitions (e.g. with chunks of data of three months, at the month change only the oldest month would be moved), so there could even be data at a page level that would need to exist within different partitions?

Follow me on twitter @EvoDBA | Check out my blog Natural Selection DBA
Steve Hall
SSCoach (17K reputation)

Group: General Forum Members
Points: 17004 Visits: 13242
It's OK - it took a while for me to understand it too - hence the blog article. I had to write it all out to understand it and see what it did.

I think the mistake is thinking that the PartitionNumber column is actually the number of days old that the row actually is. It isn't that at all - it's the result of the calculation with the modulo of 3. Data written today (01 Apr 2015) will have a value of 2 - it isn't an indication of the age of the row, just which partition it wants to store it in.
Tomorrow's data will want to be in PartitionNumber 3 - try it:
SELECT abs(datediff(day,CONVERT([date],'20000101',(112)),'20150401')%(3)+(1)) -- returns 2

SELECT abs(datediff(day,CONVERT([date],'20000101',(112)),'20150402')%(3)+(1)) -- returns 3
So the purging process uses the same calculation, works out that PartitionNumber 3 is required tomorrow and so switches the data out of partition 3 and truncates it, leaving it empty for the following day.
Because there are three partitions it cycles through them each day, always clearing the one that it needs to use the next day (if the job is scheduled correctly).
If you wanted a monthly partition scheme then for each month the calculation would calculate a partition number for the row created at that time (based on 'month' instead of 'day'). It wouldn't actually be the age (in months) of that partition, just the result of the calculation. As long as it is consistent for the month and the purge process uses the same calculation then it will have a monthly partition and purging system.

Steve Hall
Linkedin
Blog Site
Matthew Darwin
SSCarpal Tunnel (4.9K reputation)

Group: General Forum Members
Points: 4935 Visits: 880
Aha, light bulb moment! I knew I was missing something there, as much as anything from skim reading rather than properly looking at it.

So, if we were splitting into months it would look like this with 13 partitions, twelve that will hold data that we are currently processing, and a thirteenth for the one we will be purging:-


--show rolling partitions

DECLARE @Tables TABLE
(
    MonthDate datetime2(0) NOT NULL
)

INSERT INTO @Tables
SELECT TOP 13 DATEADD(MONTH, ROW_NUMBER() OVER (ORDER BY object_id) - 1, '2015-01-01') AS MonthDate
FROM sys.columns AS c

SELECT MonthDate
    , ABS(DATEDIFF(MONTH, 0, MonthDate) % 13) + 1 AS PartitionNumber
FROM @Tables

--code to define the partition value on the table

SELECT DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)



Or the same for quarters but using modulo 5 and quarter in the datefunctions instead of month.

I guess the only risk is that if the job that runs the partition switch fails or doesn't run for some reason, you would end up with data going into a partition that was already populated with old data?
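If it helps anyone later, a guard along these lines could run at the start of the insert day, to shout if the purge job has fallen behind. A sketch only; the object name and the monthly modulo-13 calculation are taken from the example above and assumed to match the real setup.

```sql
-- Work out which partition next month's data will land in
-- (same ABS(...%13)+1 calculation as the computed column):
DECLARE @TargetPartition int =
    ABS(DATEDIFF(MONTH, 0, DATEADD(MONTH, 1, GETDATE())) % 13) + 1;

-- sys.partitions carries a row count per partition; index_id 0/1 is
-- the heap or clustered index, so we check it once, not per index.
IF EXISTS (
    SELECT 1
    FROM sys.partitions
    WHERE object_id = OBJECT_ID(N'Message.Sms')
      AND index_id IN (0, 1)
      AND partition_number = @TargetPartition
      AND rows > 0
)
    RAISERROR ('Next partition is not empty; the purge job may have failed.', 16, 1);
```

That turns a silent "new data mixed in with old" into a hard failure you can alert on.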

Follow me on twitter @EvoDBA | Check out my blog Natural Selection DBA
ScottPletcher
SSC Guru (85K reputation)

Group: General Forum Members
Points: 85663 Visits: 9639
Cluster the original table on ( CreatedDate, SmsId ), in that order, not vice-versa.

Once you get the best clustering key, you're likely not to need partitioning yet. And, if and when you do, the clustering key will already match your partitioning key.
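In DDL terms that would look something like the following; a sketch, using DROP_EXISTING to replace the index of the same name in one operation, and UNIQUE since SmsId is unique on its own.

```sql
-- Recluster on (CreatedDate, SmsId): date leads, so it matches a
-- future CreatedDate partitioning key; SmsId makes the key unique.
CREATE UNIQUE CLUSTERED INDEX [IX:SMS:Clustered] ON [Message].[Sms]
(
    [CreatedDate] ASC,
    [SmsId] ASC
)
WITH (DROP_EXISTING = ON);
```

On an already-large table this is a full rebuild, so it's the sort of thing to do before the volume arrives, not after.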

SQL DBA, SQL Server MVP (07, 08, 09)

Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial:
If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.