Partitioning - Part 2

By Andy Warren

In Part 1 we did an overview of partitioning and related techniques such as archiving. This time we'll dig deeper into archiving, which is perhaps the easiest technique to use regardless of the platform.

The most common archiving strategy is to archive by a date column. To illustrate a simple version of it we'll use the Person.Contact table in AdventureWorks. Start by creating an archive table with the same structure as Person.Contact; I like to add two additional columns: ArchiveID, which will be an identity, and DateAdded, which has a default of GetUTCDate() so it is stamped when the row is added to the archive. You can use the technique of your choice for building this table, but I like the simplicity of SELECT INTO because it gives me a plain table with no indexes.

select * into person.contactarchive from person.contact where 1=0 --1=0 returns no rows, so only the structure is copied

I then use the SSMS designer to add ArchiveID and DateAdded, and I make ArchiveID the primary key. This is the final result:

CREATE TABLE [Person].[contactarchive](
	[ContactID] [int] NOT NULL,
	[NameStyle] [dbo].[NameStyle] NOT NULL,
	[Title] [nvarchar](8) NULL,
	[FirstName] [dbo].[Name] NOT NULL,
	[MiddleName] [dbo].[Name] NULL,
	[LastName] [dbo].[Name] NOT NULL,
	[Suffix] [nvarchar](10) NULL,
	[EmailAddress] [nvarchar](50) NULL,
	[EmailPromotion] [int] NOT NULL,
	[Phone] [dbo].[Phone] NULL,
	[PasswordHash] [varchar](128) NOT NULL,
	[PasswordSalt] [varchar](10) NOT NULL,
	[rowguid] [uniqueidentifier] NOT NULL,
	[ModifiedDate] [datetime] NOT NULL,
	[ArchiveID] [int] IDENTITY(1,1) NOT NULL,
	[DateAdded] [smalldatetime] NOT NULL CONSTRAINT [DF_contactarchive_DateAdded]  DEFAULT (getutcdate()),
 CONSTRAINT [PK_contactarchive] PRIMARY KEY CLUSTERED 
(
	[ArchiveID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
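If you'd rather script the change than use the designer, something like this should produce the same result (a sketch; the constraint names simply match the ones shown in the DDL above):

alter table Person.contactarchive add ArchiveID int identity(1,1) not null

alter table Person.contactarchive add DateAdded smalldatetime not null
	constraint DF_contactarchive_DateAdded default (getutcdate())

alter table Person.contactarchive add constraint PK_contactarchive primary key clustered (ArchiveID)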

A few points worth mentioning before we continue. One is that it's worthwhile to be consistent in your naming convention. I like adding the 'archive' suffix so that I can tell at a glance whether a table has an archive counterpart in the database. I also prefer 'archive' to 'history', since history tables are often used for different purposes - auditing being the most common, where triggers capture some or all changes to a table. I've also elected to index lightly, which makes sense if you won't be querying the archive table at all; but in many cases you will hit the archive table frequently, and in that case you should index it to the same degree as the base table or more, since the only time those indexes incur maintenance cost is during the archive operation itself.

Another point is that if you follow my example and use an identity column on the archive table, you have to be sure that all upstream code uses Scope_Identity() rather than @@Identity, or bad things will happen when the wrong ID is returned to the caller. There are alternatives that avoid that possibility: one is to use a uniqueidentifier instead, the other is to continue using ContactID as the primary key. The latter works fine if you know you'll be archiving each record once and only once (which is the most common scenario and the one we're implementing next).
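To make the @@Identity risk concrete, here is a minimal sketch. It assumes a hypothetical audit-style trigger on Person.Contact (the trigger name and the sample values are purely illustrative and not part of the archiving process itself); because the trigger inserts into a second table that also has an identity column, @@Identity reports the archive table's new ArchiveID instead of the new ContactID:

create trigger Person.trg_Contact_Demo on Person.Contact after insert
as
	--copy the new row(s) into the archive table, which has its own identity column
	insert Person.contactarchive (ContactID, NameStyle, FirstName, LastName,
		EmailPromotion, PasswordHash, PasswordSalt, rowguid, ModifiedDate)
	select ContactID, NameStyle, FirstName, LastName,
		EmailPromotion, PasswordHash, PasswordSalt, rowguid, ModifiedDate
	from inserted
go

insert Person.Contact (FirstName, LastName, PasswordHash, PasswordSalt)
values (N'Demo', N'Contact', 'demohash', 'demosalt')

select @@identity as IdentityValue            --the ArchiveID generated inside the trigger
select scope_identity() as ScopeIdentityValue --the new ContactID, which is what the caller wants
go

drop trigger Person.trg_Contact_Demo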

Now let's build the archiving code; for this example we'll use ModifiedDate as our key column.

create proc usp_PersonContactArchive @ThruDate datetime
as

set nocount on
set xact_abort on

begin transaction
--first we stabilize the set of records we want to move
create table #temp (ContactID int)
create unique index ndxMain on #Temp (ContactID)

insert into #temp (
	ContactID)
select
	ContactId
from Person.Contact
where
	ModifiedDate <= @ThruDate

--put the records in the archive table
INSERT Person.contactarchive
	(ContactID
	,NameStyle
	,Title
	,FirstName
	,MiddleName
	,LastName
	,Suffix
	,EmailAddress
	,EmailPromotion
	,Phone
	,PasswordHash
	,PasswordSalt
	,rowguid
	,ModifiedDate
	)
select
   C.ContactID
   ,NameStyle
   ,Title
   ,FirstName
   ,MiddleName
   ,LastName
   ,Suffix
   ,EmailAddress
   ,EmailPromotion
   ,Phone
   ,PasswordHash
   ,PasswordSalt
   ,rowguid
   ,ModifiedDate
from Person.Contact C inner join #temp T on C.ContactId = T.ContactID

--delete from source
delete
from Person.Contact where
ContactID in (Select ContactID from #temp)

--clean up
drop table #temp

commit transaction

Pretty vanilla code; the goal is to make sure that we never delete records from the source table unless they are already in the archive table. An alternate, and perhaps slightly more robust, implementation is to join back to the archive table to positively identify the records to delete, like this:

create proc usp_PersonContactArchive2 @ThruDate datetime

as

set nocount on
set xact_abort on

begin transaction

--put the records in the archive table
INSERT Person.contactarchive
	(ContactID
	,NameStyle
	,Title
	,FirstName
	,MiddleName
	,LastName
	,Suffix
	,EmailAddress
	,EmailPromotion
	,Phone
	,PasswordHash
	,PasswordSalt
	,rowguid
	,ModifiedDate
	)
select
   C.ContactID
   ,NameStyle
   ,Title
   ,FirstName
   ,MiddleName
   ,LastName
   ,Suffix
   ,EmailAddress
   ,EmailPromotion
   ,Phone
   ,PasswordHash
   ,PasswordSalt
   ,rowguid
   ,ModifiedDate
from Person.Contact C
where
	ModifiedDate <= @ThruDate

--delete from source
delete from person.contact
where
	ContactId in (select c.contactid from person.contact c inner join person.contactarchive ca on c.contactid = ca.contactid)

commit transaction

All that remains is to test it and to set up a job to run it. The nice thing about this implementation is that if the job fails, the next run will catch all the records the previous run should have handled, with no extra tracking needed. The job just needs a single step, though in practice I put all my archiving code into a single job with a separate step for each table. Our calling syntax is simple:

declare @ThruDate datetime
set @ThruDate = DateAdd(yy, -9, getutcdate())
exec usp_PersonContactArchive @ThruDate
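If you prefer to script the Agent job rather than build it in SSMS, a minimal sketch looks something like the following (the job name, nightly schedule, and 1:00 AM start time are assumptions; adjust them to your environment):

use msdb
go
exec dbo.sp_add_job @job_name = N'Archive - AdventureWorks'

exec dbo.sp_add_jobstep @job_name = N'Archive - AdventureWorks',
	@step_name = N'Archive Person.Contact',
	@subsystem = N'TSQL',
	@database_name = N'AdventureWorks',
	@command = N'declare @ThruDate datetime
set @ThruDate = DateAdd(yy, -9, getutcdate())
exec usp_PersonContactArchive @ThruDate'

exec dbo.sp_add_jobschedule @job_name = N'Archive - AdventureWorks',
	@name = N'Nightly',
	@freq_type = 4,			--daily
	@freq_interval = 1,
	@active_start_time = 10000	--1:00 AM

exec dbo.sp_add_jobserver @job_name = N'Archive - AdventureWorks', @server_name = N'(local)'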

Because we're using AdventureWorks and there is no new data being added, I've elected to archive anything more than nine years old each time it runs. We can run it at whatever interval makes sense. It's less work to run it daily so that we move smaller chunks of data, but we might have a business situation where we want to archive once a year on Jan 1.

This is simple archiving and ignores two possibilities. One is that you'll have foreign key constraints, and the other is that the number of records to be archived is so large that doing it in a single transaction is prohibitive. Foreign keys can range from a mild pain to a severe one. Let's look at a not-quite-so-simple situation where our target table is part of a foreign key relationship. I've created a table called TestFkey, populated it with data from Person.Contact, and then added the foreign key constraint referencing Person.Contact.

CREATE TABLE [dbo].[TestFkey](
	[contactid] [int] IDENTITY(1,1) NOT NULL,
	[modifieddate] [datetime] NOT NULL,
 CONSTRAINT [PK_TestFkey] PRIMARY KEY CLUSTERED 
(
	[contactid] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[TestFkey]  WITH CHECK ADD  CONSTRAINT [FK_TestFkey_Contact] FOREIGN KEY([contactid])
REFERENCES [Person].[Contact] ([ContactID])

To test, I ran this statement:

exec usp_PersonContactArchive '12/30/1998'

The delete fails with an error reporting that the DELETE statement conflicted with the REFERENCE constraint FK_TestFkey_Contact.

A check of our archive table shows that we're still transactionally consistent: when the delete failed, our transaction rolled back along with it. This will happen if even one foreign key relationship is still valid within our target set. The only way to resolve it is either to move only the records that have no foreign key relationship (not incredibly useful), or to archive our way up from the bottom, starting with the records in the TestFkey table. Add a good number of foreign keys, plus business rules any more complex than date-based, and it gets pretty complex. To resolve this one we would just mirror our archiving process pointed at the TestFkey table, put that step ahead of our Person.Contact step, and let it run.
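A sketch of that mirrored step is below. It assumes a hypothetical dbo.TestFkeyArchive table with matching contactid and modifieddate columns (the table and proc names are illustrative); because TestFkey was populated from Person.Contact, filtering on the same ModifiedDate cutoff removes the child rows that would otherwise block the contact delete:

create proc usp_TestFkeyArchive @ThruDate datetime
as

set nocount on
set xact_abort on

begin transaction

--put the child records in their archive table
insert dbo.TestFkeyArchive (contactid, modifieddate)
select contactid, modifieddate
from dbo.TestFkey
where modifieddate <= @ThruDate

--delete from source
delete from dbo.TestFkey
where contactid in (select contactid from dbo.TestFkeyArchive)

commit transaction

In the job, this simply becomes the step that runs immediately before usp_PersonContactArchive.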

Let's revisit our other potential issue, which is that we may have too many records to process in one pass. Only experience can tell you what that number is, but it's easy enough to implement. First we need to modify our archiving proc a bit - the changes are the new @ArchiveCount output parameter, the TOP clause with its ORDER BY, and capturing @@rowcount afterwards:

alter proc usp_PersonContactArchive2 @ThruDate datetime, @ArchiveCount int output

as

set nocount on
set xact_abort on

begin transaction

--put the records in the archive table
INSERT Person.contactarchive
	(ContactID
	,NameStyle
	,Title
	,FirstName
	,MiddleName
	,LastName
	,Suffix
	,EmailAddress
	,EmailPromotion
	,Phone
	,PasswordHash
	,PasswordSalt
	,rowguid
	,ModifiedDate
	)
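--TOP 1 keeps each pass tiny for this demo; in practice use a larger batch size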
select top 1
   C.ContactID
   ,NameStyle
   ,Title
   ,FirstName
   ,MiddleName
   ,LastName
   ,Suffix
   ,EmailAddress
   ,EmailPromotion
   ,Phone
   ,PasswordHash
   ,PasswordSalt
   ,rowguid
   ,ModifiedDate
from Person.Contact C
where
	ModifiedDate <= @ThruDate
order by 
	ModifiedDate

set @ArchiveCount = @@rowcount

--delete from source
delete from person.contact
where
	ContactId in (select c.contactid from person.contact c inner join person.contactarchive ca on c.contactid = ca.contactid)

commit transaction

Then we add a higher-level calling proc that will call the proc above multiple times:

create proc usp_PersonContactArchiveLoop as

set nocount on

declare @ThruDate datetime
declare @RecCount int
declare @MaxLoop int

set @ThruDate = DateAdd(yy, -8, getutcdate())
set @maxLoop = 10

while @MaxLoop > 0
	begin
		exec usp_PersonContactArchive2 @ThruDate, @RecCount output
		print 'Archived ' + convert(varchar(10), @RecCount)
		set @MaxLoop = @MaxLoop - 1
		if @RecCount = 0
			break
	end		

I've designed this one to loop up to 10 times and to archive one record on each pass. In practice we might want to let it run until completion, or have it break once a certain time is reached so that it doesn't continue to run into production hours.
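As a sketch of that time-based cutoff, the fixed counter inside usp_PersonContactArchiveLoop could be swapped for a stop time (the 6:00 AM cutoff below is an assumption; the @ThruDate and @RecCount declarations stay as they are):

declare @StopAt datetime
set @StopAt = dateadd(hh, 6, dateadd(dd, datediff(dd, 0, getdate()), 0))	--6:00 AM today

while getdate() < @StopAt
	begin
		exec usp_PersonContactArchive2 @ThruDate, @RecCount output
		if @RecCount = 0
			break
	end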

Hopefully that will get you started thinking about how you might go about archiving in your current situation. Archiving can lead to some performance gains and it can delay having to purchase more expensive disk space if you move the archive tables to cheaper storage. The downside is that the data is not quite as easily available as it used to be. Next time we'll dig into partitioned views!
