Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Solving the “Running Totals” Problem in SQL Server 2005 with Partitioned Tables Expand / Collapse
Author
Message
Posted Monday, January 26, 2009 10:03 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:02 PM
Points: 22,504, Visits: 30,215
Comments posted to this topic are about the item Solving the “Running Totals” Problem in SQL Server 2005 with Partitioned Tables


Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #643896
Posted Tuesday, January 27, 2009 3:19 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, May 07, 2009 6:48 AM
Points: 3, Visits: 32
I think Jeff hinted that he was going to eventually get to the analytics solution and far be it from me to steal his thunder, but I couldn't help wondering if either I have misunderstood the scale of the problem or the solutions displayed thus far have been grossly convoluted. From my understanding of the problem at its simplest, we would like to compute a running total (leaving any notion of partitioning aside for the moment). Surely all that is needed is something like this:

select AccountId,
Date,
Amount,
sum(Amount) over (partition by AccountId, order by Date) as "Running_Balance"
from JBMTest
order by AccountId, Date

Now, partitioning. What has that got to do with anything? Again, from our simple perspective we don't care. Let our optimizer handle such details. As the query stands, we're looking to cover the entire table anyway. Given that our partitions are based on date, if we restrict to one year, we touch only one partition - but that is not something we need to explicitly state in a hint or some such construct, surely.
Post #644012
Posted Tuesday, January 27, 2009 5:58 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, April 16, 2014 3:35 AM
Points: 319, Visits: 2,151
Not having access to a developer or Enterprise version of SQL Server I find this topic hard to go into myself. Some of the SQL features used are only available on those versions for those wondering why I bring it up. Still I did quickly read trough the essential parts that make up the solution and the path followed to get there and got a bit of a bad feeling after reading it.

Searching for working methods is one thing, but what bugs me in this case is that the final output depends on query hints like OPTION (MAXDOP 1) that technically should not affect the output at all. The solution is relying on SQL Server implementation details that are unspecified by microsoft and are likely to change over time.

Even when implementation details do not change, maintainability is at risk because the technique breaks with the very useful concept of SQL itself. You ask a question and specify what information you want as output, in what order. It is then up to the database to deliver according to your specifications as best as it can, isolating you from the details on how it does it.

Imagine using the described technique and 4 years from now you move to a 4 processor, 12 core system and solid state disks. There turns out to be unexpected low performance and an expert is hired to see what is wrong. First thing that would get the boot would be the OPTION (MAXDOP 1). It would speed the result significantly and since such a hint SHOULD NEVER affect output it is unlikely to be noticed right away.

You see where I am going with this....it feels like playing dice with your data!
Post #644063
Posted Tuesday, January 27, 2009 6:39 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:02 PM
Points: 22,504, Visits: 30,215
John Walker (1/27/2009)
I think Jeff hinted that he was going to eventually get to the analytics solution and far be it from me to steal his thunder, but I couldn't help wondering if either I have misunderstood the scale of the problem or the solutions displayed thus far have been grossly convoluted. From my understanding of the problem at its simplest, we would like to compute a running total (leaving any notion of partitioning aside for the moment). Surely all that is needed is something like this:

select AccountId,
Date,
Amount,
sum(Amount) over (partition by AccountId, order by Date) as "Running_Balance"
from JBMTest
order by AccountId, Date

Now, partitioning. What has that got to do with anything? Again, from our simple perspective we don't care. Let our optimizer handle such details. As the query stands, we're looking to cover the entire table anyway. Given that our partitions are based on date, if we restrict to one year, we touch only one partition - but that is not something we need to explicitly state in a hint or some such construct, surely.


Unfortunately, at this time, the window functions in SQL Server are limited and won't give you the answers that you need to answer the running total question. The row_number() function can be used in SQL Server 2005/2008 to handle the ranking problem, but that still leaves running balanaces. Until Microsoft fully implements the windowing functions in SQL Server, and you are looking to implement proformant code to meet this need, you will need to look at methods like this that utilize the database engine to its fullest.

In addition, this is but an example that happens to use date for the partitioning. What if the partioning were done based on account ranges, for example, because the application design merited such a design. Please remember, the purpose of the article was to demonstrate a technique that could be used to help solve a problem.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #644087
Posted Tuesday, January 27, 2009 6:49 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:02 PM
Points: 22,504, Visits: 30,215
peter (1/27/2009)
Not having access to a developer or Enterprise version of SQL Server I find this topic hard to go into myself. Some of the SQL features used are only available on those versions for those wondering why I bring it up. Still I did quickly read trough the essential parts that make up the solution and the path followed to get there and got a bit of a bad feeling after reading it.

Searching for working methods is one thing, but what bugs me in this case is that the final output depends on query hints like OPTION (MAXDOP 1) that technically should not affect the output at all. The solution is relying on SQL Server implementation details that are unspecified by microsoft and are likely to change over time.

Even when implementation details do not change, maintainability is at risk because the technique breaks with the very useful concept of SQL itself. You ask a question and specify what information you want as output, in what order. It is then up to the database to deliver according to your specifications as best as it can, isolating you from the details on how it does it.

Imagine using the described technique and 4 years from now you move to a 4 processor, 12 core system and solid state disks. There turns out to be unexpected low performance and an expert is hired to see what is wrong. First thing that would get the boot would be the OPTION (MAXDOP 1). It would speed the result significantly and since such a hint SHOULD NEVER affect output it is unlikely to be noticed right away.

You see where I am going with this....it feels like playing dice with your data!


I understand your concern, however, part of it can be solved by one simple thing and that is documentation. If this technique is used in a production system, the use of the OPTION (MAXDOP 1) should be well commented so that someone else coming along later would understand why it was being used and would not remove it, thereby breaking the process.

I wish I had a true single processor system (no hyper threading or multiple cores) to have tested this on to see how it worked. Would I have thought that this worked without MAXDOP 1 in that environment? If it did, what would my reaction have been when I moved it to multi-processor, multi-core environment and it suddenly failed (remember, I did a final test in such an environment)?

Maybe someone out there has a true single processor system and access to SQL Server 2005 Developer Edition and they will take the code from this article and run these tests and let us know the results of that testing.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #644094
Posted Tuesday, January 27, 2009 7:10 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, December 03, 2013 5:58 AM
Points: 40, Visits: 260
I wish I had a true single processor system (no hyper threading or multiple cores) to have tested this on to see how it worked. Would I have thought that this worked without MAXDOP 1 in that environment? If it did, what would my reaction have been when I moved it to multi-processor, multi-core environment and it suddenly failed (remember, I did a final test in such an environment)?


Oh but you do! Go into your BIOS and turn that stuff off!
Post #644109
Posted Tuesday, January 27, 2009 7:15 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:02 PM
Points: 22,504, Visits: 30,215
Smendle (1/27/2009)
I wish I had a true single processor system (no hyper threading or multiple cores) to have tested this on to see how it worked. Would I have thought that this worked without MAXDOP 1 in that environment? If it did, what would my reaction have been when I moved it to multi-processor, multi-core environment and it suddenly failed (remember, I did a final test in such an environment)?


Oh but you do! Go into your BIOS and turn that stuff off!


Maybe, but I'm a software geek, not a hardware geek. I'm not exactly sure how to go about doing that, and I'd be worried that I'd break something (and I'm not joking on that).




Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #644113
Posted Tuesday, January 27, 2009 8:37 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Tuesday, April 01, 2014 8:03 PM
Points: 6,266, Visits: 2,027
I liked your article ( a lot ) and it seems that you put a lot of effort into it.

My view of running totals is simple. It is a CLIENT-SIDE responsibility.

Although it may provide you with an alternative to deliver to the developers what they may want out of the DB
IMHO looping through an ordered result set in the client is by far more scalable and "faster" end-to-end from the User point of view.

Just my $0.02




* Noel
Post #644203
Posted Tuesday, January 27, 2009 8:59 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:02 PM
Points: 22,504, Visits: 30,215
noeld (1/27/2009)
I liked your article ( a lot ) and it seems that you put a lot of effort into it.

My view of running totals is simple. It is a CLIENT-SIDE responsibility.

Although it may provide you with an alternative to deliver to the developers what they may want out of the DB
IMHO looping through an ordered result set in the client is by far more scalable and "faster" end-to-end from the User point of view.

Just my $0.02



I'll agree, which is why I had that in the disclaimer at the beginning of the article. What I was really looking at was a process to solve a problem. As I have found many times, prinicples are more important than details. The running total problem with partitioned tables offered a problem and hopefully this article provides a principle that can be applied to similiar problems that may not be a client-side responsibility.

An accounting professor once told me he rather teach 10 principles rather than 10 detailed tasks. With 10 principles you complete 100 tasks.




Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #644238
Posted Tuesday, January 27, 2009 9:01 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:02 PM
Points: 22,504, Visits: 30,215
I received a private message with a question regarding my article that I can't answer, so I thought I'd put it here with the hopes that some of the gurus (Jeff, Gail, Grant, Gus, et al) out there may be able to help out.


Why is the CPU time greater than the elapsed time

CPU time = 5063 ms, elapsed time = 3309 ms.




Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #644240
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse