There has been some talk about using partitioned views in SQL Server 2000 to gain performance improvements and to scale out applications.
I am writing this article to put forward my findings on the speed of partitioned views. I hope it will generate more discussion and give some insight into where partitioned views can be used.
What is it about?
There is a whitepaper on the Microsoft website entitled "Microsoft SQL Server MegaServers: Achieving Software Scale-Out" that describes transaction throughput for various configurations. It is worth reading.
Also note that there is a difference between a partitioned view (all data on one server) and a distributed partitioned view (data divided between SQL Servers that are 'federated').
The whitepaper gives the impression that using partitioned views will deliver performance improvements.
For this article, I am going to restrict the discussion to partitioned views with all data on one SQL Server (Local Partitioned View).
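To make the distinction concrete, here is a minimal sketch of the two forms. The view, table and server names are illustrative, not from a real system:

```sql
-- Local partitioned view: all member tables live on one server
CREATE VIEW dbo.vwCRD_Local AS
SELECT * FROM dbo.tblCRD_1
UNION ALL
SELECT * FROM dbo.tblCRD_2
GO

-- Distributed partitioned view: member tables live on federated
-- servers, referenced by four-part linked-server names
CREATE VIEW dbo.vwCRD_Distributed AS
SELECT * FROM dbo.tblCRD_1
UNION ALL
SELECT * FROM Server2.CRD.dbo.tblCRD_2
GO
```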
Before going further I will point out another excellent piece of information. In the book Microsoft SQL Server 2000 Resource Kit there is a section on Data Partitioning Components that states:
If you are attempting to alleviate disk contention, the best approach is to solve this as a standard disk I/O problem. Partitioning data on the same server does not provide noticeable performance gains, and most likely you would be better served to use a single table with good indexes that contain all the data.
This appears to be a contradiction. How can partitioned views achieve performance improvements when a single table with good indexes works better than the view?
What I have found is that there is a trade-off, and some gotchas, to using a partitioned view.
We looked at partitioned views as a possible solution for archiving old data (with a simple recovery method in case we suddenly need it back) while providing a path to federated SQL Servers in the future.
Measuring the speed for common queries.
First up, take a look at the table schema:
CREATE TABLE [dbo].[tblCRD] (
    [int_detID]      [int]      NOT NULL,
    [tint_RecType]   [tinyint]  NOT NULL,
    [tint_IsCh]      [tinyint]  NOT NULL,
    [int_OLID]       [int]      NOT NULL,
    [int_DLID]       [int]      NOT NULL,
    [int_OrPrefID]   [int]      NOT NULL,
    [int_DesPrefID]  [int]      NOT NULL,
    [sint_OrCtyID]   [smallint] NOT NULL,
    [sint_DesCtyID]  [smallint] NOT NULL,
    [tint_OrRegID]   [tinyint]  NOT NULL,
    [tint_DesRegID]  [tinyint]  NOT NULL,
    [sint_IncRteID]  [smallint] NOT NULL,
    [sint_OutRteID]  [smallint] NOT NULL,
    [dt_GMTSt]       [datetime] NOT NULL,
    [dt_LocalSt]     [datetime] NULL,
    [int_Dr]         [int]      NOT NULL,
    [int_Dist]       [int]      NOT NULL,
    [bit_IsPre]      [bit]      NOT NULL,
    [sint_SceCarrID] [smallint] NULL,
    [bit_IsLoc]      [bit]      NULL,
    [int_ClTypeID]   [int]      NULL,
    [tint_PMb]       [tinyint]  NULL CONSTRAINT [DF_CRD_PMb] DEFAULT (0),
    CONSTRAINT [PK_tblCRD] PRIMARY KEY CLUSTERED ([int_detID])
        WITH FILLFACTOR = 90 ON [PRIMARY],  -- Filegroup
    CONSTRAINT [CK_tblCRD_tint_IsCh] CHECK ([tint_IsCh] = 1 OR [tint_IsCh] = 0)
) ON [PRIMARY]  -- Filegroup
GO

CREATE INDEX [IX_tblCRD_dt_GMTSt] ON [dbo].[tblCRD]([dt_GMTSt])
    WITH FILLFACTOR = 98 ON [INDEX]  -- Filegroup
GO
CREATE INDEX [IX_tblCRD_sint_SceCarrID] ON [dbo].[tblCRD]([sint_SceCarrID])
    WITH FILLFACTOR = 98 ON [INDEX]  -- Filegroup
GO
CREATE INDEX [IX_tblCRD_dt_LocalSt] ON [dbo].[tblCRD]([dt_LocalSt])
    WITH FILLFACTOR = 98 ON [INDEX]  -- Filegroup
GO
Now this table has approximately 975 million rows and takes up about 120 GB in our production environment.
I have set up a test database for the member tables and partitioned view so I can get comparative performance.
The partitioned view member tables have been designed to hold about 32 million rows each. The int_detID field is the partitioning column and has a check constraint in each member table.
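As a sketch of how the member tables and the view fit together: the names and boundary values below are illustrative, and only two columns are shown for brevity (the real member tables carry all the columns of tblCRD):

```sql
-- Each member table holds one range of int_detID; the CHECK constraint
-- on the partitioning column is what lets the optimiser skip tables
CREATE TABLE dbo.tblCRD_1 (
    int_detID int NOT NULL CHECK (int_detID BETWEEN 1 AND 32000000),
    int_Dr    int NOT NULL,
    CONSTRAINT PK_tblCRD_1 PRIMARY KEY CLUSTERED (int_detID)
)
CREATE TABLE dbo.tblCRD_2 (
    int_detID int NOT NULL CHECK (int_detID BETWEEN 32000001 AND 64000000),
    int_Dr    int NOT NULL,
    CONSTRAINT PK_tblCRD_2 PRIMARY KEY CLUSTERED (int_detID)
)
GO
-- The partitioned view is simply a UNION ALL over the member tables
CREATE VIEW dbo.vwCRD AS
SELECT int_detID, int_Dr FROM dbo.tblCRD_1
UNION ALL
SELECT int_detID, int_Dr FROM dbo.tblCRD_2
GO
```

Because the check constraints define non-overlapping ranges and the partitioning column is part of each primary key, the view is updatable (on Enterprise Edition) and the optimiser can eliminate member tables at plan time.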
Note: All queries for inserts, updates or deletes take place within BEGIN TRAN/ROLLBACK TRAN, after hours and after backups complete.
Examples of the queries run against the table and view for comparison are as follows:
-- Limit transactions to a sizable chunk
SET ROWCOUNT 35000

-- Get rid of any existing global temp table
IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE name = '##tblCRD')
    DROP TABLE ##tblCRD
GO

-- Create temp table and fill it with rows from the original table
SELECT [int_detID], [tint_RecType], [tint_IsCh], [int_OLID], [int_DLID],
       [int_OrPrefID], [int_DesPrefID], [sint_OrCtyID], [sint_DesCtyID],
       [tint_OrRegID], [tint_DesRegID], [sint_IncRteID], [sint_OutRteID],
       [dt_GMTSt], [dt_LocalSt], [int_Dr], [int_Dist], [bit_IsPre],
       [sint_SceCarrID], [bit_IsLoc], [int_ClTypeID], [tint_PMb]
INTO ##tblCRD
FROM [dbo].[tblCRD] (NOLOCK)
ORDER BY int_detID DESC

-- Add an index to the temp table
CREATE UNIQUE NONCLUSTERED INDEX NCI_tblCRD ON ##tblCRD(int_detID)

-- Showtime
BEGIN TRANSACTION

DECLARE @Duration   int
       ,@StartTime  datetime
       ,@FinishTime datetime
SELECT @StartTime  = GETDATE()
      ,@FinishTime = GETDATE()
      ,@Duration   = 0

SELECT 'Start Count ', GETDATE()
-- Check the temp table has records
SELECT COUNT(*)
FROM [dbo].[tblCRD] CRD, ##tblCRD T
WHERE T.int_detID = CRD.int_detID

-- Delete from the origin table
SELECT 'Start Delete ', GETDATE()
DELETE [dbo].[tblCRD]
FROM [dbo].[tblCRD] CRD
JOIN ##tblCRD T ON CRD.int_detID = T.int_detID
WHERE CRD.int_detID >= (SELECT MIN(T1.int_detID) FROM ##tblCRD T1)
  AND CRD.int_detID <= (SELECT MAX(T2.int_detID) FROM ##tblCRD T2)

-- Do some processing
SELECT 'Start Update int_Dr ', GETDATE()
SELECT TOP 10 * FROM ##tblCRD ORDER BY int_detID
UPDATE ##tblCRD SET int_Dr = int_Dr * -1
SELECT TOP 10 * FROM ##tblCRD ORDER BY int_detID

SELECT 'Before Insert. ', GETDATE()
-- Insert back into the origin table
INSERT INTO dbo.tblCRD (
    [int_detID], [tint_RecType], [tint_IsCh], [int_OLID], [int_DLID],
    [int_OrPrefID], [int_DesPrefID], [sint_OrCtyID], [sint_DesCtyID],
    [tint_OrRegID], [tint_DesRegID], [sint_IncRteID], [sint_OutRteID],
    [dt_GMTSt], [dt_LocalSt], [int_Dr], [int_Dist], [bit_IsPre],
    [sint_SceCarrID], [bit_IsLoc], [int_ClTypeID], [tint_PMb]
)
SELECT [int_detID], [tint_RecType], [tint_IsCh], [int_OLID], [int_DLID],
       [int_OrPrefID], [int_DesPrefID], [sint_OrCtyID], [sint_DesCtyID],
       [tint_OrRegID], [tint_DesRegID], [sint_IncRteID], [sint_OutRteID],
       [dt_GMTSt], [dt_LocalSt], [int_Dr], [int_Dist], [bit_IsPre],
       [sint_SceCarrID], [bit_IsLoc], [int_ClTypeID], [tint_PMb]
FROM ##tblCRD

SELECT @FinishTime = GETDATE()
      ,@Duration   = DATEDIFF(ss, @StartTime, @FinishTime)
SELECT @FinishTime
SELECT @Duration

ROLLBACK TRANSACTION
NOTE: This is not a complete list of the transactions used for the tests; it is an indication only.
The following table shows the results of running exactly the same queries against the original table and the partitioned view. Each timing pair reads with/without the condition named in the first column:

Query                                    Original Table     Partitioned View
Update (with / without WHERE clause)     8 sec / 7 sec      4 sec / 63 sec
Delete (with / without WHERE clause)     93 sec / 92 sec    4 sec / 18 min **
Select (with / without hint)             15 sec / 5 sec     2 sec / 3 min 15 sec ***
Note: The Update query (following) was run with and without the where clause shown. The where clause actually changes the plan chosen by the query optimiser, forcing it to work out which member tables are needed before running the update.
UPDATE CRD
SET CRD.int_Dr = CRD.int_Dr * -1
FROM tblCRD CRD
INNER JOIN ##tblCRD T ON CRD.int_detID = T.int_detID
WHERE CRD.int_detID >= (SELECT MIN(T1.int_detID) FROM ##tblCRD T1)
  AND CRD.int_detID <= (SELECT MAX(T2.int_detID) FROM ##tblCRD T2)
** Note: The Delete query used here relies on a trick that favours the partitioned view. If the Delete statement had been simply:
DELETE [dbo].[tblCRD]
FROM [dbo].[tblCRD] CRD
JOIN ##tblCRD T ON CRD.int_detID = T.int_detID
the time for the partitioned view blows out to 18 minutes! The time for the original table is 92 seconds.
*** Note: I wanted to prevent re-use of query plans, so the select statements were varied. Compare only between the original table and the partitioned view.
The 15-second result for the table select versus the 2-second result for the view was obtained using a nolock and index hint on the table select statement, and just a nolock hint on the view select.
The other result shows the speed of a different select statement that used no hints at all. Notice that the partitioned view performed poorly compared to the table.
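For reference, the hinted statements took roughly the following shape. The index name, date range and view name here are assumptions for illustration, not the exact test queries:

```sql
-- Table select: nolock plus an explicit index hint
SELECT COUNT(*)
FROM dbo.tblCRD (NOLOCK, INDEX(IX_tblCRD_dt_GMTSt))
WHERE dt_GMTSt >= '20020101' AND dt_GMTSt < '20020201'

-- View select: nolock only, because index hints
-- are not allowed against a partitioned view
SELECT COUNT(*)
FROM dbo.vwCRD (NOLOCK)
WHERE dt_GMTSt >= '20020101' AND dt_GMTSt < '20020201'
```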
Although partitioned views may appear to provide a performance boost, they should be approached carefully. They are not something to implement without thorough investigation and testing.
The downsides to partitioned views are numerous. They include:
Can’t use identity fields.
Can’t use index hints.
You will probably need to rewrite existing queries to re-optimise them.
To update partitioned views you need SQL Server 2000 Enterprise Edition (or Developer Edition).
All criteria for setting up a partitioned view described in Books Online must be followed to the letter.
There is a gotcha with check constraints on the partitioning column: they must be added with the WITH CHECK option. If a constraint is disabled and re-enabled without it, the view cannot be updated (refer to KB article Q270013).
Can’t use text, ntext, image or timestamp fields.
Some queries will run much slower.
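To illustrate the check-constraint gotcha: re-enabling a constraint must be done WITH CHECK so SQL Server verifies the existing rows and trusts the constraint again. Table and constraint names here are illustrative:

```sql
-- Disabling the partitioning constraint makes the view non-updatable
ALTER TABLE dbo.tblCRD_1 NOCHECK CONSTRAINT CK_tblCRD_1_detID

-- Re-enable it WITH CHECK so existing rows are verified and the
-- constraint is trusted again (see KB article Q270013)
ALTER TABLE dbo.tblCRD_1 WITH CHECK CHECK CONSTRAINT CK_tblCRD_1_detID
```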
The upsides to partitioned views can make them worthwhile. They include:
Queries that allow the optimiser to restrict which member tables are used can run much faster.
Partitioned views allow data to be kept in separate databases that can be backed up separately. This is a lot simpler than using filegroups.
Keeping member tables in a few separate databases makes it easy to migrate to federated servers later. Distributed partitioned views allow massive scale-out!
Archiving and purging old data can be made very easy: back up the database for archiving, then truncate the table holding the old data.
Data access to member tables can be balanced across different disk arrays just like filegroups.
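To show the first upside in action: when the predicate is on the partitioning column, the optimiser can eliminate at plan time every member table whose check constraint cannot match. The view name and range below are illustrative:

```sql
-- Only the member table(s) whose CHECK constraint on int_detID
-- overlaps this range are touched; the rest are eliminated
SELECT COUNT(*)
FROM dbo.vwCRD
WHERE int_detID BETWEEN 5000000 AND 5100000
```

This is the same effect the WHERE-clause "trick" exploits in the delete and update tests above: give the optimiser a range on the partitioning column and it prunes member tables before doing the work.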