Log in
::
Register
::
Not logged in
Home
Tags
Articles
Editorials
Stairways
Forums
Scripts
Videos
Blogs
QotD
Books
Ask SSC
SQL Jobs
Training
Authors
About us
Contact us
Newsletters
Write for us
Recent Posts
Recent Posts
Popular Topics
Popular Topics
Home
Search
Members
Calendar
Who's On
Home
»
Article Discussions
»
Article Discussions by Author
»
Discuss content posted by Lynn Pettis
»
Solving the “Running Totals” Problem in SQL...
41 posts, Page 1 of 5
1
2
3
4
5
»
»»
Solving the “Running Totals” Problem in SQL Server 2005 with Partitioned Tables
Rate Topic
Display Mode
Topic Options
Author
Message
Lynn Pettis
Lynn Pettis
Posted Monday, January 26, 2009 10:03 PM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 21,612,
Visits: 27,443
Comments posted to this topic are about the item
Solving the “Running Totals” Problem in SQL Server 2005 with Partitioned Tables
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #643896
John Walker-229141
John Walker-229141
Posted Tuesday, January 27, 2009 3:19 AM
Forum Newbie
Group: General Forum Members
Last Login: Thursday, May 07, 2009 6:48 AM
Points: 3,
Visits: 32
I think Jeff hinted that he was going to eventually get to the analytics solution and far be it from me to steal his thunder, but I couldn't help wondering if either I have misunderstood the scale of the problem or the solutions displayed thus far have been grossly convoluted. From my understanding of the problem at its simplest, we would like to compute a running total (leaving any notion of partitioning aside for the moment). Surely all that is needed is something like this:
select AccountId,
Date,
Amount,
sum(Amount) over (partition by AccountId, order by Date) as "Running_Balance"
from JBMTest
order by AccountId, Date
Now, partitioning. What has that got to do with anything? Again, from our simple perspective we don't care. Let our optimizer handle such details. As the query stands, we're looking to cover the entire table anyway. Given that our partitions are based on date, if we restrict to one year, we touch only one partition - but that is not something we need to explicitly state in a hint or some such construct, surely.
Post #644012
peter-757102
peter-757102
Posted Tuesday, January 27, 2009 5:58 AM
SSC Veteran
Group: General Forum Members
Last Login: Today @ 3:57 AM
Points: 288,
Visits: 1,903
Not having access to a developer or Enterprise version of SQL Server I find this topic hard to go into myself. Some of the SQL features used are only available on those versions for those wondering why I bring it up. Still I did quickly read trough the essential parts that make up the solution and the path followed to get there and got a bit of a bad feeling after reading it.
Searching for working methods is one thing, but what bugs me in this case is that the final output depends on query hints like OPTION (MAXDOP 1) that technically should not affect the output at all. The solution is relying on SQL Server implementation details that are unspecified by microsoft and are likely to change over time.
Even when implementation details do not change, maintainability is at risk because the technique breaks with the very useful concept of SQL itself. You ask a question and specify
what information
you want as output,
in what order
. It is then up to the database to deliver according to your specifications as best as it can, isolating you from the details on
how
it does it.
Imagine using the described technique and 4 years from now you move to a 4 processor, 12 core system and solid state disks. There turns out to be unexpected low performance and an expert is hired to see what is wrong. First thing that would get the boot would be the OPTION (MAXDOP 1). It would speed the result significantly and since such a hint SHOULD NEVER affect output it is unlikely to be noticed right away.
You see where I am going with this....it feels like playing dice with your data!
Post #644063
Lynn Pettis
Lynn Pettis
Posted Tuesday, January 27, 2009 6:39 AM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 21,612,
Visits: 27,443
John Walker (1/27/2009)
I think Jeff hinted that he was going to eventually get to the analytics solution and far be it from me to steal his thunder, but I couldn't help wondering if either I have misunderstood the scale of the problem or the solutions displayed thus far have been grossly convoluted. From my understanding of the problem at its simplest, we would like to compute a running total (leaving any notion of partitioning aside for the moment). Surely all that is needed is something like this:
select AccountId,
Date,
Amount,
sum(Amount) over (partition by AccountId, order by Date) as "Running_Balance"
from JBMTest
order by AccountId, Date
Now, partitioning. What has that got to do with anything? Again, from our simple perspective we don't care. Let our optimizer handle such details. As the query stands, we're looking to cover the entire table anyway. Given that our partitions are based on date, if we restrict to one year, we touch only one partition - but that is not something we need to explicitly state in a hint or some such construct, surely.
Unfortunately, at this time, the window functions in SQL Server are limited and won't give you the answers that you need to answer the running total question. The row_number() function can be used in SQL Server 2005/2008 to handle the ranking problem, but that still leaves running balanaces. Until Microsoft fully implements the windowing functions in SQL Server, and you are looking to implement proformant code to meet this need, you will need to look at methods like this that utilize the database engine to its fullest.
In addition, this is but an example that happens to use date for the partitioning. What if the partioning were done based on account ranges, for example, because the application design merited such a design. Please remember, the purpose of the article was to demonstrate a technique that could be used to help solve a problem.
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #644087
Lynn Pettis
Lynn Pettis
Posted Tuesday, January 27, 2009 6:49 AM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 21,612,
Visits: 27,443
peter (1/27/2009)
Not having access to a developer or Enterprise version of SQL Server I find this topic hard to go into myself. Some of the SQL features used are only available on those versions for those wondering why I bring it up. Still I did quickly read trough the essential parts that make up the solution and the path followed to get there and got a bit of a bad feeling after reading it.
Searching for working methods is one thing, but what bugs me in this case is that the final output depends on query hints like OPTION (MAXDOP 1) that technically should not affect the output at all. The solution is relying on SQL Server implementation details that are unspecified by microsoft and are likely to change over time.
Even when implementation details do not change, maintainability is at risk because the technique breaks with the very useful concept of SQL itself. You ask a question and specify
what information
you want as output,
in what order
. It is then up to the database to deliver according to your specifications as best as it can, isolating you from the details on
how
it does it.
Imagine using the described technique and 4 years from now you move to a 4 processor, 12 core system and solid state disks. There turns out to be unexpected low performance and an expert is hired to see what is wrong. First thing that would get the boot would be the OPTION (MAXDOP 1). It would speed the result significantly and since such a hint SHOULD NEVER affect output it is unlikely to be noticed right away.
You see where I am going with this....it feels like playing dice with your data!
I understand your concern, however, part of it can be solved by one simple thing and that is documentation. If this technique is used in a production system, the use of the OPTION (MAXDOP 1) should be well commented so that someone else coming along later would understand why it was being used and would not remove it, thereby breaking the process.
I wish I had a true single processor system (no hyper threading or multiple cores) to have tested this on to see how it worked. Would I have thought that this worked without MAXDOP 1 in that environment? If it did, what would my reaction have been when I moved it to multi-processor, multi-core environment and it suddenly failed (remember, I did a final test in such an environment)?
Maybe someone out there has a true single processor system and access to SQL Server 2005 Developer Edition and they will take the code from this article and run these tests and let us know the results of that testing.
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #644094
Smendle
Smendle
Posted Tuesday, January 27, 2009 7:10 AM
SSC Rookie
Group: General Forum Members
Last Login: Monday, April 29, 2013 2:59 PM
Points: 39,
Visits: 255
I wish I had a true single processor system (no hyper threading or multiple cores) to have tested this on to see how it worked. Would I have thought that this worked without MAXDOP 1 in that environment? If it did, what would my reaction have been when I moved it to multi-processor, multi-core environment and it suddenly failed (remember, I did a final test in such an environment)?
Oh but you do! Go into your BIOS and turn that stuff off!
Post #644109
Lynn Pettis
Lynn Pettis
Posted Tuesday, January 27, 2009 7:15 AM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 21,612,
Visits: 27,443
Smendle (1/27/2009)
I wish I had a true single processor system (no hyper threading or multiple cores) to have tested this on to see how it worked. Would I have thought that this worked without MAXDOP 1 in that environment? If it did, what would my reaction have been when I moved it to multi-processor, multi-core environment and it suddenly failed (remember, I did a final test in such an environment)?
Oh but you do! Go into your BIOS and turn that stuff off!
Maybe, but I'm a software geek, not a hardware geek. I'm not exactly sure how to go about doing that, and I'd be worried that I'd break something (and I'm not joking on that).
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #644113
noeld
noeld
Posted Tuesday, January 27, 2009 8:37 AM
SSCertifiable
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 4:39 PM
Points: 6,260,
Visits: 1,977
I liked your article ( a lot ) and it seems that you put a lot of effort into it.
My view of running totals is simple. It is a CLIENT-SIDE responsibility.
Although it may provide you with an alternative to deliver to the developers what they may want out of the DB
IMHO looping through an ordered result set in the client is by far more scalable and "faster" end-to-end from the User point of view.
Just my $0.02
* Noel
Post #644203
Lynn Pettis
Lynn Pettis
Posted Tuesday, January 27, 2009 8:59 AM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 21,612,
Visits: 27,443
noeld (1/27/2009)
I liked your article ( a lot ) and it seems that you put a lot of effort into it.
My view of running totals is simple. It is a CLIENT-SIDE responsibility.
Although it may provide you with an alternative to deliver to the developers what they may want out of the DB
IMHO looping through an ordered result set in the client is by far more scalable and "faster" end-to-end from the User point of view.
Just my $0.02
I'll agree, which is why I had that in the disclaimer at the beginning of the article. What I was really looking at was a process to solve a problem. As I have found many times, prinicples are more important than details. The running total problem with partitioned tables offered a problem and hopefully this article provides a principle that can be applied to similiar problems that may not be a client-side responsibility.
An accounting professor once told me he rather teach 10 principles rather than 10 detailed tasks. With 10 principles you complete 100 tasks.
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #644238
Lynn Pettis
Lynn Pettis
Posted Tuesday, January 27, 2009 9:01 AM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 21,612,
Visits: 27,443
I received a private message with a question regarding my article that I can't answer, so I thought I'd put it here with the hopes that some of the gurus (Jeff, Gail, Grant, Gus, et al) out there may be able to help out.
Why is the CPU time greater than the elapsed time
CPU time = 5063 ms, elapsed time = 3309 ms.
Lynn Pettis
For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here
or
when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here
and
here
Managing Transaction Logs
SQL Musings from the Desert
Fountain Valley SQL
(My Mirror Blog)
Post #644240
« Prev Topic
|
Next Topic »
41 posts, Page 1 of 5
1
2
3
4
5
»
»»
Permissions
You
cannot
post new topics.
You
cannot
post topic replies.
You
cannot
post new polls.
You
cannot
post replies to polls.
You
cannot
edit your own topics.
You
cannot
delete your own topics.
You
cannot
edit other topics.
You
cannot
delete other topics.
You
cannot
edit your own posts.
You
cannot
edit other posts.
You
cannot
delete your own posts.
You
cannot
delete other posts.
You
cannot
post events.
You
cannot
edit your own events.
You
cannot
edit other events.
You
cannot
delete your own events.
You
cannot
delete other events.
You
cannot
send private messages.
You
cannot
send emails.
You
may
read topics.
You
cannot
rate topics.
You
cannot
vote within polls.
You
cannot
upload attachments.
You
may
download attachments.
You
cannot
post HTML code.
You
cannot
edit HTML code.
You
cannot
post IFCode.
You
cannot
post JavaScript.
You
cannot
post EmotIcons.
You
cannot
post or upload images.
Copyright © 2002-2013 Simple Talk Publishing. All Rights Reserved.
Privacy Policy.
Terms of Use.
Report Abuse.