SQLServerCentral Article

Solving the “Running Totals” Problem in SQL Server 2005 with Partitioned Tables


Introduction

This article is based on the article written by Jeff Moden entitled “Solving the 'Running Totals' & 'Ordinal Ranking' Problems in SS 2k/2k5." The purpose of this exercise was to find viable alternatives to cursors, while loops, and triangular joins to solve this problem when using partitioned tables.

If you have not read Jeff's informative article, I recommend that you stop here and take several minutes to read it, and perhaps peruse the discussion that followed. His article provides all the background information and testing for computing running totals and ordinal rankings on standard tables, and I doubt that I could shed any more light on the subject than he already has.

Disclaimers

First, I use Jeff's code from his article to generate the 1,000,000 rows of test data used in developing the procedures I will be demonstrating in this article. Second, this is not a discussion about the validity of storing running totals in a database versus computing the values as needed on request. It is to demonstrate methods that can be used for attacking this problem or others like it that may benefit from a similar approach.

Getting Started

Initial work on this project began at home on my desktop system. It is a three-year-old machine with a 3.0 GHz hyper-threaded x86 Pentium 4 processor, 2 GB of RAM, and two SATA hard disk drives (each a master on its own controller), running SQL Server 2005 Developer Edition.

To start, I created a new database, aptly named RunningTotals. The following code was used to create the database.

USE [master]
GO
/****** Object:  Database [RunningTotals]    Script Date: 01/04/2009 20:08:45 ******/
/*
Create database for performing tests on Running Totals with partitioned tables.
The partitioned table will use 12 partitions over 4 filegroups (3 partitions per filegroup) with the 4 filegroups
divided across 2 disks (SATA Drives, each disk is the master disk on its controller).
There is also a default filegroup that will be used for a non-partitioned table for testing Running Totals code
as well as for holding indexed views and non-clustered indexes for all tables.
*/
CREATE DATABASE [RunningTotals]
ON PRIMARY ( -- System Tables
    NAME = N'RunningTotals',
    FILENAME = N'C:\Databases\RunningTotals.mdf',
    SIZE = 129024KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 1024KB ),
FILEGROUP [Data] DEFAULT ( -- Will be the default filegroup
    NAME = N'DataNP',
    FILENAME = N'C:\Databases\DataNP.ndf',
    SIZE = 409600KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 10240KB ),
FILEGROUP [Data1] ( -- 1st, 5th, and 9th partition
    NAME = N'DataP1',
    FILENAME = N'C:\Databases\DataP1.ndf',
    SIZE = 61440KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 10240KB ),
FILEGROUP [Data2] ( -- 2nd, 6th, and 10th partition
    NAME = N'DataP2',
    FILENAME = N'D:\Databases\DataP2.ndf',
    SIZE = 61440KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 10240KB ),
FILEGROUP [Data3] ( -- 3rd, 7th, and 11th partition
    NAME = N'DataP3',
    FILENAME = N'C:\Databases\DataP3.ndf',
    SIZE = 61440KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 10240KB ),
FILEGROUP [Data4] ( -- 4th, 8th, and 12th partition
    NAME = N'DataP4',
    FILENAME = N'D:\Databases\DataP4.ndf',
    SIZE = 61440KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 10240KB )
LOG ON ( -- and of course, the t-log
    NAME = N'RunningTotals_log',
    FILENAME = N'D:\Databases\RunningTotals_log.ldf',
    SIZE = 819200KB,
    MAXSIZE = 2048GB,
    FILEGROWTH = 10240KB )
GO
EXEC dbo.sp_dbcmptlevel @dbname=N'RunningTotals', @new_cmptlevel=90
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [RunningTotals].[dbo].[sp_fulltext_database] @action = 'disable'
end
GO
ALTER DATABASE [RunningTotals] SET ANSI_NULL_DEFAULT OFF
GO
ALTER DATABASE [RunningTotals] SET ANSI_NULLS OFF
GO
ALTER DATABASE [RunningTotals] SET ANSI_PADDING OFF
GO
ALTER DATABASE [RunningTotals] SET ANSI_WARNINGS OFF
GO
ALTER DATABASE [RunningTotals] SET ARITHABORT OFF
GO
ALTER DATABASE [RunningTotals] SET AUTO_CLOSE OFF
GO
ALTER DATABASE [RunningTotals] SET AUTO_CREATE_STATISTICS ON
GO
ALTER DATABASE [RunningTotals] SET AUTO_SHRINK OFF
GO
ALTER DATABASE [RunningTotals] SET AUTO_UPDATE_STATISTICS ON
GO
ALTER DATABASE [RunningTotals] SET CURSOR_CLOSE_ON_COMMIT OFF
GO
ALTER DATABASE [RunningTotals] SET CURSOR_DEFAULT GLOBAL
GO
ALTER DATABASE [RunningTotals] SET CONCAT_NULL_YIELDS_NULL OFF
GO
ALTER DATABASE [RunningTotals] SET NUMERIC_ROUNDABORT OFF
GO
ALTER DATABASE [RunningTotals] SET QUOTED_IDENTIFIER OFF
GO
ALTER DATABASE [RunningTotals] SET RECURSIVE_TRIGGERS OFF
GO
ALTER DATABASE [RunningTotals] SET ENABLE_BROKER
GO
ALTER DATABASE [RunningTotals] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
GO
ALTER DATABASE [RunningTotals] SET DATE_CORRELATION_OPTIMIZATION OFF
GO
ALTER DATABASE [RunningTotals] SET TRUSTWORTHY OFF
GO
ALTER DATABASE [RunningTotals] SET ALLOW_SNAPSHOT_ISOLATION OFF
GO
ALTER DATABASE [RunningTotals] SET PARAMETERIZATION SIMPLE
GO
ALTER DATABASE [RunningTotals] SET READ_WRITE
GO
ALTER DATABASE [RunningTotals] SET RECOVERY SIMPLE
GO
ALTER DATABASE [RunningTotals] SET MULTI_USER
GO
ALTER DATABASE [RunningTotals] SET PAGE_VERIFY CHECKSUM
GO
ALTER DATABASE [RunningTotals] SET DB_CHAINING OFF
GO

Next, I created a non-partitioned table to use for comparison purposes and to ensure that the initial code was correct.

 --  Create a non-partitioned table for initial testing of code
 --  This table will be used for comparison purposes
 CREATE TABLE [dbo].[RTTestNP](
       [RowNum]    [int] IDENTITY(1,1) NOT NULL,
       [AccountID] [int] NULL,
       [Amount]    [money] NULL,
       [Date]      [datetime] NULL,
       [RunBal]    [money] NULL,
       [GrpBal]    [money] NULL,
       [RunCnt]    [int] NULL,
       [GrpCnt]    [int] NULL,
 PRIMARY KEY NONCLUSTERED
 (
       [RowNum] ASC
 ) WITH (
     PAD_INDEX = OFF,
     STATISTICS_NORECOMPUTE = OFF,
     IGNORE_DUP_KEY = OFF,
     ALLOW_ROW_LOCKS = ON,
     ALLOW_PAGE_LOCKS = ON
 ) ON [Data]
 ) ON [Data];
GO
USE [RunningTotals]
GO
/****** Object:  Index [IX_RTTestNP_AccountID_Date]    Script Date: 01/04/2009 20:14:28 ******/
CREATE CLUSTERED INDEX [IX_RTTestNP_AccountID_Date] ON [dbo].[RTTestNP]
(
      [AccountID] ASC,
      [Date] ASC
) WITH (
    PAD_INDEX = OFF,
    STATISTICS_NORECOMPUTE = OFF,
    SORT_IN_TEMPDB = OFF,
    IGNORE_DUP_KEY = OFF,
    DROP_EXISTING = OFF,
    ONLINE = OFF,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
) ON [Data];
GO

Then it was time to create the partitioned table for the actual testing and development. Creating a partitioned table requires first developing a partition function and a partition scheme, and then creating the partitioned table or index. As I am using the same code Jeff used in his original article, we will be generating 1,000,000 rows of data with 50,000 unique AccountIDs and dates between 2000-01-01 and 2009-12-31. The dates will include a time portion as well.

With this information, I decided to partition the data by year. The following is the Partition Function that I wrote for this purpose.

USE [RunningTotals]
GO
/****** Object:  PartitionFunction [runningtotalsPF]    Script Date: 01/04/2009 20:12:11 ******/
CREATE PARTITION FUNCTION [runningtotalsPF](datetime)
AS RANGE RIGHT
FOR VALUES (
    N'2000-01-01T00:00:00',
    N'2001-01-01T00:00:00',
    N'2002-01-01T00:00:00',
    N'2003-01-01T00:00:00',
    N'2004-01-01T00:00:00',
    N'2005-01-01T00:00:00',
    N'2006-01-01T00:00:00',
    N'2007-01-01T00:00:00',
    N'2008-01-01T00:00:00',
    N'2009-01-01T00:00:00',
    N'2010-01-01T00:00:00'
);
GO

This Partition Function actually creates 12 partitions. Based on the data that will be generated, the first and last partitions will be empty: because the function uses RANGE RIGHT, the first partition would hold any records dated before 2000-01-01, and the last partition any records dated on or after 2010-01-01.
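Before loading any data, it can be worth sanity-checking the boundary behavior. The following probe is my own illustration (it is not from the original test harness): the $PARTITION function returns the partition number that a given value would map to under the partition function.

```sql
-- Hypothetical probe dates; partition numbers follow RANGE RIGHT semantics.
SELECT
    $PARTITION.runningtotalsPF('19991231') AS BeforeFirstBoundary, -- partition 1 (stays empty)
    $PARTITION.runningtotalsPF('20000101') AS OnFirstBoundary,     -- partition 2
    $PARTITION.runningtotalsPF('20090615') AS MidRange,            -- partition 11
    $PARTITION.runningtotalsPF('20100101') AS OnLastBoundary;      -- partition 12 (stays empty)
```

With RANGE RIGHT, each boundary value belongs to the partition on its right, which is why 2000-01-01 lands in partition 2 rather than partition 1.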

The Partition Scheme is the next item to be defined. It specifies the filegroup to which each partition will be assigned.

USE [RunningTotals]
GO
/****** Object:  PartitionScheme [runningtotalsPS]    Script Date: 01/04/2009 20:11:05 ******/
CREATE PARTITION SCHEME [runningtotalsPS]
AS PARTITION [runningtotalsPF]
TO (
    [Data1],
    [Data2],
    [Data3],
    [Data4],
    [Data1],
    [Data2],
    [Data3],   
    [Data4],
    [Data1],
    [Data2],
    [Data3],
    [Data4]
);
GO

And finally, we can now declare the table and its clustered index with the following code.

USE [RunningTotals]
GO
/****** Object:  Table [dbo].[RTTestP]    Script Date: 01/04/2009 20:09:43 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*
Create the partitioned table with a nonclustered primary key.
We are not rehashing information that has been well covered in Jeff Moden's
original article on Running Totals.
*/
CREATE TABLE [dbo].[RTTestP](
      [RowNum]    [int] IDENTITY(1,1) NOT NULL,
      [AccountID] [int] NULL,
      [Amount]    [money] NULL,
      [Date]      [datetime] NULL,
      [RunBal]    [money] NULL,
      [GrpBal]    [money] NULL,
      [RunCnt]    [int] NULL,
      [GrpCnt]    [int] NULL,
PRIMARY KEY NONCLUSTERED
(
      [RowNum] ASC
) WITH (
    PAD_INDEX = OFF,
    STATISTICS_NORECOMPUTE = OFF,
    IGNORE_DUP_KEY = OFF,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
) ON [Data]
) ON [runningtotalsPS](Date);
GO
USE [RunningTotals]
GO
/****** Object:  Index [IX_RTTestP_AccountID_Date]    Script Date: 01/04/2009 20:15:38 ******/
/*
Create the required clustered index on the partitioned table.
*/
CREATE CLUSTERED INDEX [IX_RTTestP_AccountID_Date] ON [dbo].[RTTestP]
(
      [AccountID] ASC,
      [Date] ASC
) WITH (
    PAD_INDEX = OFF,
    STATISTICS_NORECOMPUTE = OFF,
    SORT_IN_TEMPDB = OFF,
    IGNORE_DUP_KEY = OFF,
    DROP_EXISTING = OFF,
    ONLINE = OFF,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
) ON [runningtotalsPS]([Date]);
GO

At this point, I populated both tables with data. Although I could have figured out a way to ensure both tables held identical data, I didn't think it was really necessary. The following code was used to populate the two tables.

USE [RunningTotals]
GO
--===== Build the table 100 rows at a time to "mix things up"
DECLARE @Counter INT
SET @Counter = 0
WHILE @Counter < 1000000
BEGIN
    --===== Add 100 rows to the test table
    INSERT INTO dbo.RTTestNP(
        AccountID,
        Amount,
        Date)
    SELECT TOP 100
        AccountID = ABS(CHECKSUM(NEWID())) % 50000 + 1,
        Amount    = CAST(CHECKSUM(NEWID()) % 10000 / 100.0 AS MONEY),
        Date      = CAST(RAND(CHECKSUM(NEWID())) * 3653.0 + 36524.0 AS DATETIME)
    FROM
        master.dbo.syscolumns t1
        CROSS JOIN master.dbo.syscolumns t2
    --===== Increment the counter
    SET @Counter = @Counter + 100
END
GO
--===== Build the table 100 rows at a time to "mix things up"
DECLARE @Counter INT
SET @Counter = 0
WHILE @Counter < 1000000
BEGIN
    --===== Add 100 rows to the test table
    INSERT INTO dbo.RTTestP(
        AccountID,
        Amount,
        Date)
    SELECT TOP 100
        AccountID = ABS(CHECKSUM(NEWID())) % 50000 + 1,
        Amount    = CAST(CHECKSUM(NEWID()) % 10000 / 100.0 AS MONEY),
        Date      = CAST(RAND(CHECKSUM(NEWID())) * 3653.0 + 36524.0 AS DATETIME)
    FROM
        master.dbo.syscolumns t1
        CROSS JOIN master.dbo.syscolumns t2
    --===== Increment the counter
    SET @Counter = @Counter + 100
END
GO

To facilitate the testing, I also created two stored procedures to “null out” the count and balance columns in the tables. The following code will create those stored procedures.

USE [RunningTotals]
GO
/****** Object:  StoredProcedure [dbo].[ResetRTTestNP]    Script Date: 01/04/2009 20:25:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[ResetRTTestNP]
as
begin
/*
To keep the RunningTotals transaction log from growing too big
during testing, reset values to null in small batches.
*/
    set nocount on;
    declare @rcnt int;
    set @rcnt = 10000;
    while @rcnt <> 0
    begin
    update top (10000) dbo.RTTestNP set
        RunBal = null,
        GrpBal = null,
        RunCnt = null,
        GrpCnt = null
    from
        dbo.RTTestNP rttnp
    where
        rttnp.RunBal is not null;
    set @rcnt = @@rowcount;
    end;
    set nocount off;
end
GO
USE [RunningTotals]
GO
/****** Object:  StoredProcedure [dbo].[ResetRTTestP]    Script Date: 01/04/2009 20:25:13 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[ResetRTTestP]
as
begin
/*
To keep the RunningTotals transaction log from growing too big
during testing, reset values to null in small batches.
*/
    set nocount on;
    declare @rcnt int;
    set @rcnt = 10000;
    while @rcnt <> 0
    begin
    update top (10000) dbo.RTTestP set
        RunBal = null,
        GrpBal = null,
        RunCnt = null,
        GrpCnt = null
    from
        dbo.RTTestP rttp
    where
        rttp.RunBal is not null;
    set @rcnt = @@rowcount;
    end;
    set nocount off;
end
GO

Non-partitioned Table Recap

The first thing I wanted to accomplish at this point was to verify the code worked properly on a non-partitioned table, so I ran the following code against the table dbo.RTTestNP.

USE [RunningTotals]
GO
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE ( 'ALL' ) WITH MARK_IN_USE_FOR_REMOVAL, NO_INFOMSGS;
go
/*
Test the logic of the code on the non-partitioned table.  This is to verify that the
UPDATE statement is properly coded.
*/
declare @StartTime datetime,
        @EndTime datetime;
set @StartTime = getdate();
set statistics io on;
set statistics time on;
declare @AccountID int,
        @RunBal money,
        @GrpBal money,
        @RunCnt int,
        @GrpCnt int;
set @AccountID = 0;
set @RunBal = 0;
set @GrpBal = 0;
set @RunCnt = 0;
set @GrpCnt = 0;
update dbo.RTTestNP set
    @RunBal = RunBal = @RunBal + rttnp.Amount,
    @GrpBal = GrpBal = case when @AccountID <> rttnp.AccountID then 0 else @GrpBal end + rttnp.Amount,
    @RunCnt = RunCnt = @RunCnt + 1,
    @GrpCnt = GrpCnt = case when @AccountID <> rttnp.AccountID then 0 else @GrpCnt end + 1,
    @AccountID = rttnp.AccountID
from
    dbo.RTTestNP rttnp with(index(IX_RTTestNP_AccountID_Date));
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StartTime, @EndTime, datediff(ms,@StartTime,@EndTime);
select * from dbo.RTTestNP order by AccountID, Date;
go

The code ran perfectly, completing in a little over 6 seconds. I will leave verifying the results as an exercise for those who want to test it themselves.
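For anyone who wants a quick check without eyeballing a million rows, one possible sanity query (my own sketch, not part of the original test harness) compares the group counter against a straight aggregate: because GrpCnt increments once per row within an account, its largest value for an account should equal that account's row count.

```sql
-- For a few sample accounts, MAX(GrpCnt) should equal COUNT(*)
-- if the group counter was computed correctly.
SELECT
    AccountID,
    COUNT(*)    AS RowsInAccount,
    MAX(GrpCnt) AS FinalGrpCnt    -- should match RowsInAccount
FROM dbo.RTTestNP
WHERE AccountID BETWEEN 1 AND 5
GROUP BY AccountID
ORDER BY AccountID;
```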

The Partitioned Table Problem

At this point, I was curious how the same code would behave on the partitioned table. In the discussion that followed Jeff's original article, it was stated that the technique would not work with partitioned tables, but there was no real indication of the nature of the failure. With that, I simply modified the code I had just run to update the partitioned table instead. The following code is what I ran.

USE [RunningTotals]
GO
exec dbo.ResetRTTestP;
go
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE ( 'ALL' ) WITH MARK_IN_USE_FOR_REMOVAL, NO_INFOMSGS;
go
/*
Test the code again against the partitioned table to review the results.
*/
declare @StartTime datetime,
        @EndTime datetime;
set @StartTime = getdate();
set statistics io on;
set statistics time on;
declare @AccountID int,
        @RunBal money,
        @GrpBal money,
        @RunCnt int,
        @GrpCnt int;
set @AccountID = 0;
set @RunBal = 0;
set @GrpBal = 0;
set @RunCnt = 0;
set @GrpCnt = 0;
update dbo.RTTestP set
    @RunBal = RunBal = @RunBal + rttp.Amount,
    @GrpBal = GrpBal = case when @AccountID <> rttp.AccountID then 0 else @GrpBal end + rttp.Amount,
    @RunCnt = RunCnt = @RunCnt + 1,
    @GrpCnt = GrpCnt = case when @AccountID <> rttp.AccountID then 0 else @GrpCnt end + 1,
    @AccountID = rttp.AccountID
from
    dbo.RTTestP rttp with(index(IX_RTTestP_AccountID_Date));
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StartTime, @EndTime, datediff(ms,@StartTime,@EndTime);
select * from dbo.RTTestP order by AccountID, Date;
go

When I ran the code, it completed in a little over 8 seconds, but what was interesting was the output it generated. Below is a snippet of the data.

RowNum  AccountID   Amount   Date                       RunBal   GrpBal RunCnt  GrpCnt
904238  1           -10.84   2000-02-06 12:29:59.250    -10.84   -10.84      1       1
282187  1           -72.53   2000-04-16 23:41:12.240    -83.37   -83.37      2       2
440025  1            84.03   2000-05-20 21:30:07.110      0.66     0.66      3       3
843884  1            61.69   2001-03-19 18:37:17.070  38771.47    61.69  99759       1
205264  1           -66.31   2001-07-25 07:41:19.743  38705.16    -4.62  99760       2
665056  1           -43.44   2001-09-27 18:27:45.227  38661.72   -48.06  99761       3
203125  1            53.95   2002-02-07 04:38:28.563  14513.98    53.95 199762       1
20432   1            76.56   2002-02-22 20:53:00.890  14590.54   130.51 199763       2
619654  1            11.58   2002-06-19 07:21:51.200  14602.12   142.09 199764       3
19674   1            87.53   2003-08-31 02:48:04.317  25137.82    87.53 299829       1
522773  1           -92.04   2003-10-29 19:51:53.930  25045.78    -4.51 299830       2
737569  1           -86.51   2005-01-11 15:24:30.497 -20333.03   -86.51 500150       1
742197  1             0.09   2005-01-21 05:36:26.150 -20332.94   -86.42 500151       2
606339  1            62.81   2005-06-04 02:53:11.297 -20270.13   -23.61 500152       3
944661  1            23.95   2005-12-11 18:30:59.413 -20246.18     0.34 500153       4
489934  1           -83.31   2006-04-14 20:14:15.060 -29339.40   -83.31 600009       1
27772   1            23.42   2006-09-01 10:53:46.417 -29315.98   -59.89 600010       2
855699  1           -29.45   2007-01-08 16:29:13.587 -37918.10   -29.45 700314       1
662833  1            38.17   2007-06-09 04:28:35.367 -37879.93     8.72 700315       2
995450  1            78.30   2007-07-12 16:06:35.507 -37801.63    87.02 700316       3
80621   1            48.84   2007-12-04 02:06:52.910 -37752.79   135.86 700317       4
888163  1             3.15   2008-02-23 19:48:42.240 -12418.92     3.15 800171       1
346168  1            80.03   2008-06-27 18:33:30.930 -12338.89    83.18 800172       2
872577  1             4.95   2009-09-06 19:36:51.310 -29470.02     4.95 900388       1
440362  2            50.50   2000-03-04 19:29:30.027     51.16    50.50      4       1
526756  2             9.94   2000-03-30 07:32:15.007     61.10    60.44      5       2
16661   2            99.86   2003-10-03 08:35:18.557  25145.64    99.86 299831       1
961428  2           -34.11   2004-01-03 03:07:52.567   5059.65   -34.11 399986       1
166651  2            -5.63   2004-02-27 11:11:09.650   5054.02   -39.74 399987       2
580753  2            36.90   2004-03-15 04:19:25.873   5090.92    -2.84 399988       3
847523  2            74.55   2004-05-18 19:07:21.880   5165.47    71.71 399989       4
567215  2            60.69   2005-02-16 04:53:40.067 -20185.49    60.69 500154       1
860649  2            48.57   2005-04-09 15:23:16.717 -20136.92   109.26 500155       2
878520  2            82.58   2005-12-21 16:25:53.147 -20054.34   191.84 500156       3
911473  2            98.32   2007-04-21 11:21:39.040 -37654.47    98.32 700318       1
350919  2           -31.50   2008-02-29 06:15:04.293 -12370.39   -31.50 800173       1
562313  2            21.53   2008-04-20 13:37:58.837 -12348.86    -9.97 800174       2
684082  2           -79.72   2009-06-24 03:03:08.610 -29549.74   -79.72 900389       1

What is interesting to note is that the balances are based on the partition in which the data resided when the code ran: you see a group balance for each account restarting with each year. If you look closely, you will notice that the running balance is likewise computed per partition. This is obviously not what we are looking for from this code.
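The yearly resets line up with how the rows are physically distributed. One way to see that distribution (again a sketch of my own, reusing the partition function created earlier) is to count rows per partition with $PARTITION:

```sql
-- Row counts per partition; each non-empty partition covers one calendar year.
SELECT
    $PARTITION.runningtotalsPF(Date) AS PartitionNumber,
    COUNT(*)                         AS RowCnt,
    MIN(Date)                        AS MinDate,
    MAX(Date)                        AS MaxDate
FROM dbo.RTTestP
GROUP BY $PARTITION.runningtotalsPF(Date)
ORDER BY PartitionNumber;
```

The counter resets in the snippet above correspond to the boundaries this query exposes, which matches the "Scan count 12" reported by STATISTICS IO.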

If you look at the code I ran, you'll also notice I had set STATISTICS IO ON. The following is the output generated.

Table 'RTTestP'. Scan count 12, logical reads 7248, physical reads 0, read-ahead reads 0, lob logical reads 0
      , lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 8235 ms,  elapsed time = 8547 ms.
(1000000 row(s) affected)

Looking at that, I wondered what would happen if I forced MAXDOP (maximum degree of parallelism) to 1 with a simple change to the code. Let's see what happened; here is the new code that was run.

USE [RunningTotals]
GO
exec dbo.ResetRTTestP;
go
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE ( 'ALL' ) WITH MARK_IN_USE_FOR_REMOVAL, NO_INFOMSGS;
go
/*
Test the code again against the partitioned table to review the results.
*/
declare @StartTime datetime,
        @EndTime datetime;
set @StartTime = getdate();
set statistics io on;
set statistics time on;
declare @AccountID int,
        @RunBal money,
        @GrpBal money,
        @RunCnt int,
        @GrpCnt int;
set @AccountID = 0;
set @RunBal = 0;
set @GrpBal = 0;
set @RunCnt = 0;
set @GrpCnt = 0;
update dbo.RTTestP set
    @RunBal = RunBal = @RunBal + rttp.Amount,
    @GrpBal = GrpBal = case when @AccountID <> rttp.AccountID then 0 else @GrpBal end + rttp.Amount,
    @RunCnt = RunCnt = @RunCnt + 1,
    @GrpCnt = GrpCnt = case when @AccountID <> rttp.AccountID then 0 else @GrpCnt end + 1,
    @AccountID = rttp.AccountID
from
    dbo.RTTestP rttp with(index(IX_RTTestP_AccountID_Date))
option (MAXDOP 1);
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StartTime, @EndTime, datediff(ms,@StartTime,@EndTime);
select * from dbo.RTTestP order by AccountID, Date;
go

Surprisingly, the results were the same, except that the execution time was a little more than 9 seconds. To save space, I'll leave verifying this as an exercise for you, the reader.
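If you do run it yourself, one quick way to confirm the per-partition reset is still happening (my own sketch rather than code from the original test) is to look for rows where GrpCnt restarts at 1 even though earlier rows exist for the same account:

```sql
-- Any hits here mean the group counter reset mid-account, i.e. at a
-- partition boundary rather than at an account boundary.
SELECT TOP (10)
    cur.AccountID, cur.Date, cur.GrpCnt
FROM dbo.RTTestP cur
WHERE cur.GrpCnt = 1
  AND EXISTS (SELECT 1
              FROM dbo.RTTestP prev
              WHERE prev.AccountID = cur.AccountID
                AND prev.Date < cur.Date)
ORDER BY cur.AccountID, cur.Date;
```

A correct running-totals update would return no rows from this query.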

Intermediary Table Method

The next question was how to compute the running totals on the partitioned table without resorting to cursors, while loops, or triangular joins. The first option is one Jeff mentioned in his original article in response to statements such as "I can't modify the table structure or indexing": create a temporary non-partitioned work table, copy the data into it, run the update there, and then move the results back to the partitioned table. Here is the code I wrote to accomplish just that.

USE [RunningTotals]
GO
exec dbo.ResetRTTestP;
go
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE ( 'ALL' ) WITH MARK_IN_USE_FOR_REMOVAL, NO_INFOMSGS;
go
/*
Now test the create/update temporary table, update original table method
*/
declare @StartTime datetime,
        @EndTime datetime;
set @StartTime = getdate();
set statistics io on;
set statistics time on;
declare @AccountID int,
        @RunBal money,
        @GrpBal money,
        @RunCnt int,
        @GrpCnt int;
set @AccountID = 0;
set @RunBal = 0;
set @GrpBal = 0;
set @RunCnt = 0;
set @GrpCnt = 0;
select
    RowNum,
    AccountID,
    Amount,
    Date,
    RunBal,
    GrpBal,
    RunCnt,
    GrpCnt
into
    dbo.RTTestCalc
from
    dbo.RTTestP
order by
    AccountID,
    Date;
create clustered index IX_RTTestCalc on dbo.RTTestCalc (
    AccountID,
    Date);
update dbo.RTTestCalc set
    @RunBal = RunBal = @RunBal + rttc.Amount,
    @GrpBal = GrpBal = case when @AccountID <> rttc.AccountID then 0 else @GrpBal end + rttc.Amount,
    @RunCnt = RunCnt = @RunCnt + 1,
    @GrpCnt = GrpCnt = case when @AccountID <> rttc.AccountID then 0 else @GrpCnt end + 1,
    @AccountID = rttc.AccountID
from
    dbo.RTTestCalc rttc with(index(IX_RTTestCalc));
update dbo.RTTestP set
    RunBal = upd.RunBal,
    GrpBal = upd.GrpBal,
    RunCnt = upd.RunCnt,
    GrpCnt = upd.GrpCnt
from
    dbo.RTTestCalc upd
    inner join dbo.RTTestP rttp
        on(upd.AccountID = rttp.AccountID
            and upd.Date = rttp.Date
            and upd.RowNum = rttp.RowNum);
drop table dbo.RTTestCalc;
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StartTime, @EndTime, datediff(ms,@StartTime,@EndTime);
select * from dbo.RTTestP order by AccountID, Date;
go

This process worked, running in about 72 seconds overall. Below is the STATISTICS IO and STATISTICS TIME output from the test run.

Table 'RTTestP'. Scan count 12, logical reads 7248, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 10609 ms, elapsed time = 11043 ms.
(1000000 row(s) affected)
SQL Server parse and compile time:
   CPU time = 453 ms, elapsed time = 465 ms.
Table 'RTTestCalc'. Scan count 3, logical reads 7044, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 5063 ms, elapsed time = 3309 ms.
SQL Server Execution Times:
   CPU time = 5516 ms, elapsed time = 3777 ms.
Table 'RTTestCalc'. Scan count 1, logical reads 7220, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 8140 ms, elapsed time = 8641 ms.
(1000000 row(s) affected)
Table 'RTTestP'. Scan count 3, logical reads 3065503, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'RTTestCalc'. Scan count 3, logical reads 7915, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 2922488, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 63845 ms, elapsed time = 46880 ms.
(1000000 row(s) affected)

Original Indexed View Method

The question now is: is there another, faster way that avoids the intermediary table? My first thought was an updatable view. A plain view alone would probably not work, but an indexed view might just be the ticket to solving this problem. So let us start by creating a view with a clustered index on it and see what happens.

Here is the code to create the view and the clustered indexed on the view.

USE [RunningTotals]
GO
/****** Object:  View [dbo].[vRTTestPKeys]    Script Date: 01/04/2009 20:13:58 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
--===== Create a schema bound view over the partitioned table
CREATE VIEW [dbo].[vRTTestP] (
    RowNum,
    AccountID,
    Amount,
    Date,
    RunBal,
    GrpBal,
    RunCnt,
    GrpCnt
)
WITH SCHEMABINDING
AS
SELECT
    RowNum,
    AccountID,
    Amount,
    Date,
    RunBal,
    GrpBal,
    RunCnt,
    GrpCnt
FROM
    dbo.RTTestP;
GO
USE [RunningTotals]
GO
SET ARITHABORT ON
GO
SET CONCAT_NULL_YIELDS_NULL ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
SET ANSI_PADDING ON
GO
SET ANSI_WARNINGS ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
/****** Object:  Index [IDX_vRTTestP_AccountID_Date_RowNum]    Script Date: 01/04/2009 20:16:39 ******/
/*
Create the clustered index on the schema bound view to materialize the data from the table.
*/
CREATE UNIQUE CLUSTERED INDEX [IDX_vRTTestP_AccountID_Date_RowNum] ON [dbo].[vRTTestP]
(
      [AccountID] ASC,
      [Date] ASC,
      [RowNum] ASC
) WITH (
    PAD_INDEX = OFF,
    STATISTICS_NORECOMPUTE = OFF,
    SORT_IN_TEMPDB = OFF,
    IGNORE_DUP_KEY = OFF,
    DROP_EXISTING = OFF,
    ONLINE = OFF,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
) ON [Data];
GO

And here is the test code I ran first.

USE [RunningTotals]
GO
exec dbo.ResetRTTestP;
go
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE ( 'ALL' ) WITH MARK_IN_USE_FOR_REMOVAL, NO_INFOMSGS;
go
/*
Now test the update using the Indexed view joined to partitioned table.
*/
declare @StartTime datetime,
        @EndTime datetime;
set @StartTime = getdate();
set statistics io on;
set statistics time on;
declare @AccountID int,
        @RunBal money,
        @GrpBal money,
        @RunCnt int,
        @GrpCnt int;
set @AccountID = 0;
set @RunBal = 0;
set @GrpBal = 0;
set @RunCnt = 0;
set @GrpCnt = 0;
update dbo.vRTTestP set
    @RunBal = RunBal = @RunBal + vrttpk.Amount,
    @GrpBal = GrpBal = case when @AccountID <> vrttpk.AccountID then 0 else @GrpBal end + vrttpk.Amount,
    @RunCnt = RunCnt = @RunCnt + 1,
    @GrpCnt = GrpCnt = case when @AccountID <> vrttpk.AccountID then 0 else @GrpCnt end + 1,
    @AccountID = vrttpk.AccountID
from
    dbo.vRTTestP vrttpk with(index(IDX_vRTTestP_AccountID_Date_RowNum))
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StartTime, @EndTime, datediff(ms,@StartTime,@EndTime);
select * from dbo.RTTestP order by AccountID, Date;
go

The first thing I noticed when running this code was a warning about the index hint I had used in the query: "Warning: Index hints supplied for view 'dbo.vRTTestP' will be ignored." SQL Server had ignored the hint. The second was that the result set mirrored that of my original query against the partitioned table, and that it took nearly 45 seconds to run, much slower than the query run directly against the table itself. The next step was to rerun the query with MAXDOP set to 1 and see if that would have any effect on the outcome. Unfortunately, even that didn't help. The obvious reason was that SQL Server was not using the clustered index of the indexed view.

At this point, I began to doubt whether there was an alternative method to accomplish this task. I reviewed the discussion thread from Jeff's article and explored several alternatives, including recursive CTEs, but none offered a viable solution, and I was about to throw in the towel. Then I had an epiphany while reading through Books Online: there is another hint that can be used with indexed views, NOEXPAND.

I made a quick change to my code to add the NOEXPAND hint and gave it a shot. Here is the code I then ran.

USE [RunningTotals]
GO
exec dbo.ResetRTTestP;
go
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE ( 'ALL' ) WITH MARK_IN_USE_FOR_REMOVAL, NO_INFOMSGS;
go
/*
Now test the update using the Indexed view joined to partitioned table.
*/
declare @StartTime datetime,
        @EndTime datetime;
set @StartTime = getdate();
set statistics io on;
set statistics time on;
declare @AccountID int,
        @RunBal money,
        @GrpBal money,
        @RunCnt int,
        @GrpCnt int;
set @AccountID = 0;
set @RunBal = 0;
set @GrpBal = 0;
set @RunCnt = 0;
set @GrpCnt = 0;
update dbo.vRTTestP set
    @RunBal = RunBal = @RunBal + vrttpk.Amount,
    @GrpBal = GrpBal = case when @AccountID <> vrttpk.AccountID then 0 else @GrpBal end + vrttpk.Amount,
    @RunCnt = RunCnt = @RunCnt + 1,
    @GrpCnt = GrpCnt = case when @AccountID <> vrttpk.AccountID then 0 else @GrpCnt end + 1,
    @AccountID = vrttpk.AccountID
from
    dbo.vRTTestP vrttpk with(index(IDX_vRTTestP_AccountID_Date_RowNum), NOEXPAND)
option (MAXDOP 1);
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StartTime, @EndTime, datediff(ms,@StartTime,@EndTime);
select * from dbo.RTTestP order by AccountID, Date;
go

Unfortunately, my hopes were immediately dashed; SQL Server returned the following error.

Msg 8171, Level 16, State 1, Line 19
Hint 'noexpand' on object 'dbo.vRTTestP' is invalid.

Modified Indexed View Method

I reread BOL and remained convinced that I was on the right track, just not going the right way. As I thought about it, I decided to try something a little different: update the partitioned table directly, but join it to the indexed view. Having settled on this approach, I also modified the indexed view slightly, dropping from the view the columns being updated in the table. The following is the code I then used for the indexed view.

USE [RunningTotals]
GO
/****** Object:  View [dbo].[vRTTestPKeys]    Script Date: 01/04/2009 20:13:58 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
--===== Create a schema bound view over the partitioned table
CREATE VIEW [dbo].[vRTTestP] (
    RowNum,
    AccountID,
    Amount,
    Date
)
WITH SCHEMABINDING
AS
SELECT
    RowNum,
    AccountID,
    Amount,
    Date
FROM
    dbo.RTTestP;
GO
USE [RunningTotals]
GO
SET ARITHABORT ON
GO
SET CONCAT_NULL_YIELDS_NULL ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
SET ANSI_PADDING ON
GO
SET ANSI_WARNINGS ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
/****** Object:  Index [IDX_vRTTestP_AccountID_Date_RowNum]    Script Date: 01/04/2009 20:16:39 ******/
/*
Create the clustered index on the schema bound view to materialize the data from the table.
*/
CREATE UNIQUE CLUSTERED INDEX [IDX_vRTTestP_AccountID_Date_RowNum] ON [dbo].[vRTTestP]
(
      [AccountID] ASC,
      [Date] ASC,
      [RowNum] ASC
) WITH (
    PAD_INDEX = OFF,
    STATISTICS_NORECOMPUTE = OFF,
    SORT_IN_TEMPDB = OFF,
    IGNORE_DUP_KEY = OFF,
    DROP_EXISTING = OFF,
    ONLINE = OFF,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
) ON [Data];
GO

Once I had recreated the indexed view, I modified my code as follows and gave it a whirl.

USE [RunningTotals]
GO
exec dbo.ResetRTTestP;
go
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE ('ALL') WITH MARK_IN_USE_FOR_REMOVAL, NO_INFOMSGS;
go
/*
Now test the update using the Indexed view joined to partitioned table.
*/
declare @StartTime datetime,
        @EndTime datetime;
set @StartTime = getdate();
set statistics io on;
set statistics time on;
declare @AccountID int,
        @RunBal money,
        @GrpBal money,
        @RunCnt int,
        @GrpCnt int;
set @AccountID = 0;
set @RunBal = 0;
set @GrpBal = 0;
set @RunCnt = 0;
set @GrpCnt = 0;
update dbo.RTTestP set
    @RunBal = RunBal = @RunBal + vrttpk.Amount,
    @GrpBal = GrpBal = case when @AccountID <> vrttpk.AccountID then 0 else @GrpBal end + vrttpk.Amount,
    @RunCnt = RunCnt = @RunCnt + 1,
    @GrpCnt = GrpCnt = case when @AccountID <> vrttpk.AccountID then 0 else @GrpCnt end + 1,
    @AccountID = vrttpk.AccountID
from
    dbo.vRTTestP vrttpk with(index(IDX_vRTTestP_AccountID_Date_RowNum), NOEXPAND)
    inner join dbo.RTTestP rttp
        on(vrttpk.AccountID = rttp.AccountID
            and vrttpk.Date = rttp.Date
            and vrttpk.RowNum = rttp.RowNum);
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StartTime, @EndTime, datediff(ms,@StartTime,@EndTime);
select * from dbo.RTTestP order by AccountID, Date;
go

Unfortunately, it didn't work. Even more interesting, the results returned were inconsistent between runs. That inconsistency suggested parallelism was at play, since the carried variables only produce correct results when the rows are processed serially in index order. So I decided to put OPTION (MAXDOP 1) back in the query (shown below) and see if that solved the problem.

USE [RunningTotals]
GO
exec dbo.ResetRTTestP;
go
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC FREESYSTEMCACHE ('ALL') WITH MARK_IN_USE_FOR_REMOVAL, NO_INFOMSGS;
go
/*
Now test the update using the Indexed view joined to partitioned table.
*/
declare @StartTime datetime,
        @EndTime datetime;
set @StartTime = getdate();
set statistics io on;
set statistics time on;
declare @AccountID int,
        @RunBal money,
        @GrpBal money,
        @RunCnt int,
        @GrpCnt int;
set @AccountID = 0;
set @RunBal = 0;
set @GrpBal = 0;
set @RunCnt = 0;
set @GrpCnt = 0;
update dbo.RTTestP set
    @RunBal = RunBal = @RunBal + vrttpk.Amount,
    @GrpBal = GrpBal = case when @AccountID <> vrttpk.AccountID then 0 else @GrpBal end + vrttpk.Amount,
    @RunCnt = RunCnt = @RunCnt + 1,
    @GrpCnt = GrpCnt = case when @AccountID <> vrttpk.AccountID then 0 else @GrpCnt end + 1,
    @AccountID = vrttpk.AccountID
from
    dbo.vRTTestP vrttpk with(index(IDX_vRTTestP_AccountID_Date_RowNum), NOEXPAND)
    inner join dbo.RTTestP rttp
        on(vrttpk.AccountID = rttp.AccountID
            and vrttpk.Date = rttp.Date
            and vrttpk.RowNum = rttp.RowNum)
option (MAXDOP 1);
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StartTime, @EndTime, datediff(ms,@StartTime,@EndTime);
select * from dbo.RTTestP order by AccountID, Date;
go

Amazingly, it worked! It ran in about 30 seconds and properly updated the RunBal, GrpBal, RunCnt, and GrpCnt columns for each record. I reran the process several times, and the results were the same. I had found a way to compute the running totals using an indexed view in the update query. It also ran in half the time of the method using the intermediary table.
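For readers unfamiliar with this style of UPDATE, the per-row logic it performs can be sketched outside of T-SQL. The sketch below is in Python with hypothetical sample rows (not the article's test data): it walks the rows in clustered-index order, carries the running variables forward, and resets the per-account totals whenever AccountID changes. This dependence on processing order is exactly why the query needs OPTION (MAXDOP 1).

```python
def running_totals(rows):
    """rows: iterable of (account_id, amount), already sorted the way
    the clustered index orders them (AccountID, Date, RowNum)."""
    account_id = None          # plays the role of @AccountID
    run_bal = grp_bal = 0.0    # @RunBal, @GrpBal
    run_cnt = grp_cnt = 0      # @RunCnt, @GrpCnt
    out = []
    for acct, amount in rows:
        if acct != account_id:          # new account: reset group totals
            grp_bal, grp_cnt = 0.0, 0
        run_bal += amount               # overall running balance
        grp_bal += amount               # per-account running balance
        run_cnt += 1
        grp_cnt += 1
        account_id = acct               # remember account for next row
        out.append((acct, amount, run_bal, grp_bal, run_cnt, grp_cnt))
    return out

# Hypothetical sample: two accounts, two rows each.
for row in running_totals([(1, 10.0), (1, 5.0), (2, 7.0), (2, 3.0)]):
    print(row)
# → (1, 10.0, 10.0, 10.0, 1, 1)
# → (1, 5.0, 15.0, 15.0, 2, 2)
# → (2, 7.0, 22.0, 7.0, 3, 1)
# → (2, 3.0, 25.0, 10.0, 4, 2)
```

If the rows were visited in any other order, as can happen under a parallel plan, the carried values would be applied to the wrong rows, which matches the inconsistent results seen before MAXDOP 1 was added.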

Conclusion

We now have two methods for computing running totals on a partitioned table: the intermediary table method, and the inner join to an indexed view (the modified indexed view method). I learned two things while working on this. First, the indexed view needs to exist before you run the query. I didn't try creating the indexed view in a dynamic query, but I did try creating it on the fly in my scripts, and it was hit or miss when everything was run together, even with GO separating the batches; CREATE VIEW has to be the first statement in a batch. Second, the intermediary table method is actually SLOWER if the indexed view exists. From what I could tell, the query engine would use the view when it was there. I ran several tests, and the intermediary table method definitely runs faster when the indexed view has been deleted.

As a final test, I took the code to work with me and ran the same tests on our development server, which has two quad-core x64 processors (I can't remember if they are Intel or AMD), 8 GB RAM, and 5 SCSI disks in a RAID 5 configuration. The intermediary table method ran faster there than on my desktop, but was still slower than the indexed view method; the gap between the two was smaller than on my desktop system at home.

With that, you should use whichever method fits your situation, as both worked well with the 1,000,000-row test data used during testing. The real deciding factors will be your environment and your data.

Acknowledgements

I would like to give a lot of thanks to Jeff Moden. His original article was thought provoking and provided me with a base to work from in trying to solve this problem with partitioned tables. He has proven to be an excellent mentor, and having gotten to know him through SSC, I can't wait to actually meet him in person one day.

I'd also like to thank everyone who participated in the discussion of Jeff's article. It provided me with other ideas to work with, even if they didn't pan out for me.
