Introduction
There has been some
talk about the use of partitioned views in SQL Server 2000 to gain performance
improvements and for scale-out of applications.
I am writing this
article to put forward my findings in respect of the speed of partitioned
views. I hope it will generate more
discussion and will give some an insight as to where partitioned views can be
used.
What is it about?
There is a Whitepaper
on the Microsoft website entitled “Microsoft SQL Server MegaServers: Achieving
Software Scale-Out” that describes the throughput of transactions for
various configurations in use. It is
worth reading.
Also note that there
is a difference between a partitioned view (data is all on one server) and a
distributed partitioned view (where the data is divided between SQL Servers
that are ‘Federated’).
This gives the
impression that using partitioned views will give performance improvements.
For this article, I am
going to restrict the discussion to partitioned views with all data on one SQL
Server (Local Partitioned View).
Before going further I
will point out another excellent piece of information. In the book Microsoft SQL Server 2000
Resource Kit there is a section on Data Partitioning Components that
states:
·
If you are attempting to
alleviate disk contention, the best approach is to solve this as a standard
disk I/O problem. Partitioning data on the same server does not provide
noticeable performance gains, and most likely you would be better served to use
a single table with good indexes that contain all the
data.
This appears to be a
contradiction. How can partitioned views
achieve performance improvements but a single table with good indexes works
better than the view?
What I have found is
there is a trade off and some gotchas to using a
partitioned view.
We looked at
partitioned views as a possible solution to archiving old data (with simple
recovery method in case we suddenly need it back) while providing a path to
Federated SQL Servers in future.
Measuring the speed
for common queries.
First up, take a look
at the table schema:
CREATE TABLE [dbo].[tblCRD] (
[int_detID] [int] NOT
NULL ,
[tint_RecType] [tinyint]
NOT NULL ,
[tint_IsCh] [tinyint]
NOT NULL ,
[int_OLID] [int]
NOT NULL ,
[int_DLID] [int]
NOT NULL ,
[int_OrPrefID] [int]
NOT NULL ,
[int_DesPrefID] [int]
NOT NULL ,
[sint_OrCtyID] [smallint]
NOT NULL ,
[sint_DesCtyID] [smallint]
NOT NULL ,
[tint_OrRegID] [tinyint]
NOT NULL ,
[tint_DesRegID] [tinyint]
NOT NULL ,
[sint_IncRteID] [smallint]
NOT NULL ,
[sint_OutRteID] [smallint]
NOT NULL ,
[dt_GMTSt] [datetime]
NOT NULL ,
[dt_LocalSt] [datetime]
NULL ,
[int_Dr] [int]
NOT NULL ,
[int_Dist] [int]
NOT NULL ,
[bit_IsPre] [bit] NOT NULL ,
[sint_SceCarrID] [smallint]
NULL ,
[bit_IsLoc] [bit] NULL ,
[int_ClTypeID] [int]
NULL ,
[tint_PMb] [tinyint]
NULL CONSTRAINT [DF_CRD_PMb] DEFAULT (0),
CONSTRAINT [PK_tblCRD]
PRIMARY KEY CLUSTERED
(
[int_detID]
) WITH FILLFACTOR = 90 ON [PRIMARY] , --Filegroup
CONSTRAINT [CK_tblCRD_tint_IsCh]
CHECK ([tint_IsCh] = 1 or [tint_IsCh]
= 0)
) ON [PRIMARY] --Filegroup
GO
CREATE INDEX [IX_tblCRD_dt_GMTSt]
ON [dbo].[tblCRD]([dt_GMTSt])
WITH
FILLFACTOR = 98 ON [INDEX] --Filegroup
GO
CREATE INDEX [IX_tblCRD_sint_SceCarrID]
ON [dbo].[tblCRD]([sint_SceCarrID])
WITH
FILLFACTOR = 98 ON [INDEX] --Filegroup
GO
CREATE INDEX [IX_tblCRD_dt_LocalSt]
ON [dbo].[tblCRD]([dt_LocalSt])
WITH
FILLFACTOR = 98 ON [INDEX] --Filegroup
GO
Now this table has
approximately 975 million rows and takes up about 120 GB in our production
environment.
I have set up a test
database for the member tables and partitioned view so I can get comparative
performance.
The partitioned view
member tables have been designed to take about 32 million rows each. The int_detID field is the partitioning column and has a
check constraint for each member table.
Note: All queries for
inserts, updates or deletes takes place within Begin Trans/Rollback Trans after
hours and after backups complete.
An
example of the queries run
against the table and view for comparison are as follows:
--
Limit transactions to sizable chunk
SET
Rowcount 35000
--
get rid of existing global temp table
if
exists (
select * from tempdb.dbo.sysobjects where name = '##tblCRD'
)
drop table
##tblCRD
GO
--
Create Temp Table and fill with rows from original table
SELECT
[int_detID]
,[tint_RecType]
,[tint_IsCh]
,[int_OLID]
,[int_DLID]
,[int_OrPrefID]
,[int_DesPrefID]
,[sint_OrCtyID]
,[sint_DesCtyID]
,[tint_OrRegID]
,[tint_DesRegID]
,[sint_IncRteID]
,[sint_OutRteID]
,[dt_GMTSt]
,[dt_LocalSt]
,[int_Dr]
,[int_Dist]
,[bit_IsPre]
,[sint_SceCarrID]
,[bit_IsLoc]
,[int_ClTypeID]
,[tint_PMb]
INTO
##tblCRD
FROM
[dbo].[tblCRD] (nolock)
order
by int_detID Desc
--
Add Index to temp table
CREATE
UNIQUE NONCLUSTERED INDEX
NCI_tblCRD ON
##tblCRD(int_detID)
--
Showtime
Begin
Transaction
Declare
@Duration Int
,@StartTime datetime
,@FinishTime
datetime
select @StartTime = Getdate()
,@FinishTime = Getdate()
,@Duration = 0
Select
'Start Count ', Getdate()
--
Check temp table has records
SELECT
count(*)
FROM
[dbo].[tblCRD] CRD, ##tblCRD T
Where
T.int_detID = CRD.int_detID
--
Delete from Origin table
Select
'Start Delete ', Getdate()
DELETE
[dbo].[tblCRD]
FROM
[dbo].[tblCRD] CRD JOIN ##tblCRD T
ON
CRD.int_detID = T.int_detID
Where
CRD.int_detID >= (Select MIN(T1.int_detID)
from ##tblCRD T1)
AND
CRD.int_detID <= (Select MAX(T2.int_detID)
from ##tblCRD T2)
--
Do some processing
Select
'Start Update int_Dr ', Getdate()
Select
Top 10 * from
##tblCRD order by int_detID
Update
##tblCRD
Set
int_Dr = int_Dr * -1
Select
Top 10 * from
##tblCRD order by int_detID
Select
'Before Insert. ',Getdate()
--
Insert back into origin table
INSERT
INTO dbo.tblCRD (
[int_detID]
,[tint_RecType]
,[tint_IsCh]
,[int_OLID]
,[int_DLID]
,[int_OrPrefID]
,[int_DesPrefID]
,[sint_OrCtyID]
,[sint_DesCtyID]
,[tint_OrRegID]
,[tint_DesRegID]
,[sint_IncRteID]
,[sint_OutRteID]
,[dt_GMTSt]
,[dt_LocalSt]
,[int_Dr]
,[int_Dist]
,[bit_IsPre]
,[sint_SceCarrID]
,[bit_IsLoc]
,[int_ClTypeID]
,[tint_PMb]
)
select
[int_detID]
,[tint_RecType]
,[tint_IsCh]
,[int_OLID]
,[int_DLID]
,[int_OrPrefID]
,[int_DesPrefID]
,[sint_OrCtyID]
,[sint_DesCtyID]
,[tint_OrRegID]
,[tint_DesRegID]
,[sint_IncRteID]
,[sint_OutRteID]
,[dt_GMTSt]
,[dt_LocalSt]
,[int_Dr]
,[int_Dist]
,[bit_IsPre]
,[sint_SceCarrID]
,[bit_IsLoc]
,[int_ClTypeID]
,[tint_PMb]
from ##tblCRD
select @FinishTime = Getdate()
,@Duration =
DATEDIFF(ss, @StartTime, @FinishTime)
select @FinishTime
Select @Duration
ROLLBACK
TRANSACTION
NOTE: This is not a
complete list of transactions used for tests.
These are an indication only.
The results.
The following table
shows the results comparing exactly the same queries:
|
|
Original Table
|
Partitioned View
|
Comment
|
|
Insert
|
13 Seconds
|
18 Seconds
|
|
|
Update *
|
8 Seconds/7 Sec
|
4 Seconds/63 Seconds
|
With and Without
Where clause
|
|
Delete **
|
93 Seconds/92 Sec
|
4 Seconds / 18
minutes
|
With and Without
Where clause
|
|
Select ***
|
15 Seconds/ 5 Sec
|
2 Seconds/3 min 15
Sec
|
With and Without
Hint
|
Note: The Update query
(following) was run with and without the following where clause. This where clause actually changes the plan
chosen by the query optimiser by forcing it to figure out which member tables
are needed before running the update.
Update
tblCRD
Set
tblCRD.int_Dr = tblCRD. int_Dr * -1
From
tblCRD inner join ##tblCRD
On
tblCRD.int_detID = ##tblCRD.int_detID
Where CRD.int_detID >= (Select
MIN(T1.int_detID) from ##tblCRD
T1)
AND
CRD.int_detID <= (Select MAX(T2.int_detID)
from ##tblCRD T2)
** Note: The Delete query used here was using a trick
that favours the Partitioned View. If
the Delete statement was:
DELETE
[dbo].[tblCRD]
FROM
[dbo].[tblCRD] CRD JOIN ##tblCRD T
ON
CRD.int_detID = T.int_detID
The time for the
Partitioned View blows out to 18 minutes!!!!
The time for the original table is 92 seconds.
*** Note: Firstly I wanted to prevent re-use of query
plans so the queries were varied.
Compare only between Original table and Partitioned View.
The 15 second result
for Table select versus 2 second result for the view was done using a nolock and index hint on the table select statement and
just a nolock hint on the view select.
The other result shows
the speed of a different select statement that did not use any hints. Notice that the partitioned view did not
perform very well compared to the table.
Conclusion.
Although partitioned
views may appear to provide a performance boost, the use of partitioned views
should be approached carefully. It is
not something that should be implemented without thorough investigation and testing.
The downside
to partitioned views are numerous.
These include:
Can’t use Identity fields.
Can’t use index hints.
Probably need to rewrite existing queries to re-optimise.
To update partitioned views you need to have SQL
Server 2000 Enterprise Edition
(or developer edition.)
All criteria to set up a partitioned view described in BOL must be
followed implicitly.
There
is a gotcha with Check Constraints on partitioning column: These must be added
with Check Option. If disabled/re-enabled then view cannot be updated (refer to
KB article Q270013 )
Can’t
use text, ntext, image or timestamp fields.
Some queries will
run much slower.
The upside to
partitioned views can make it worthwhile.
These include:
The queries that can restrict what member tables get used by the
optimiser can run much faster.
The use of partitioned views allows data to be kept
in separate databases that can be backed up separately. This is a lot simpler than using
filegroups.
Keeping member tables in a few separate
databases makes it easy to migrate to Federated Servers later. Distributed Partitioned Views allow massive
scale out!
Archiving and purging old data can be made very
easy. Backup database
for archiving then truncate table with old data.
Data access to member tables can be balanced
across different disk arrays just like filegroups.
Neil Jacobson
Nbj_jobs@hotmail.com
Sydney Australia.