Cursor-Killing: Accessing Data in the Next Row

  • Comments posted to this topic are about the item Cursor-Killing: Accessing Data in the Next Row

  • Yikes, it would never even occur to me to use a cursor in that situation (but I know there are those who do). I have always used a self-join. Pre-2005, in situations where the previous-period key was not a simple arithmetic calculation, I would use a correlated subquery to find the most recent period before the current one.

  • It's good to know about LAG and LEAD, which I didn't know about, but I use the approach below when faced with that kind of problem and don't have performance issues:

    SELECT
        T2.InvID,
        T2.Date,
        T2.NewLocationID AS LocationID,
        COUNT(*) AS ItemIdent
    INTO #Temp
    FROM Table AS T1
        INNER JOIN Table AS T2
            ON T1.InvID = T2.InvID
           AND T2.Date >= T1.Date
    GROUP BY T2.InvID, T2.Date, T2.NewLocationID;

    SELECT
        T1.LocationID,
        DATEDIFF(d, T1.Date, T2.Date) AS QtyDaysInLocation
    FROM #Temp AS T1
        LEFT JOIN #Temp AS T2
            ON T1.InvID = T2.InvID
           AND T1.ItemIdent = T2.ItemIdent - 1;
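
    On SQL Server 2012 and later, the same "days in location" result can be sketched with LEAD instead of the staged self-join, assuming the same hypothetical Table/InvID/Date/NewLocationID names used above:

    ```sql
    -- Sketch only: same hypothetical names as the query above (SQL Server 2012+).
    -- LEAD fetches the next row's Date per InvID ordered by Date, so no
    -- self-join or #Temp staging table is needed.
    SELECT
        NewLocationID AS LocationID,
        DATEDIFF(d, Date, LEAD(Date) OVER (PARTITION BY InvID ORDER BY Date))
            AS QtyDaysInLocation
    FROM [Table];
    ```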

  • I don't normally direct others to sites outside the one the forum is on, but this is another good explanation and illustration of LEAD and LAG: http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/

  • Pedro Gomes-379488 (5/27/2014)


    It's good to know about LAG and LEAD, which I didn't know about, but I use the approach below when faced with that kind of problem and don't have performance issues:

    SELECT
        T2.InvID,
        T2.Date,
        T2.NewLocationID AS LocationID,
        COUNT(*) AS ItemIdent
    INTO #Temp
    FROM Table AS T1
        INNER JOIN Table AS T2
            ON T1.InvID = T2.InvID
           AND T2.Date >= T1.Date
    GROUP BY T2.InvID, T2.Date, T2.NewLocationID;

    SELECT
        T1.LocationID,
        DATEDIFF(d, T1.Date, T2.Date) AS QtyDaysInLocation
    FROM #Temp AS T1
        LEFT JOIN #Temp AS T2
            ON T1.InvID = T2.InvID
           AND T1.ItemIdent = T2.ItemIdent - 1;

    Be careful with this as it's considered hidden RBAR and can be even worse than a cursor. Read the following article on it. http://www.sqlservercentral.com/articles/T-SQL/61539/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • There are certainly many ways to accomplish this and end up with the same results---but performance is the key in deciding what to do. LEAD and LAG are very, very efficient---and their value increases if you need other window functions over the same window, as the window itself can be shared across each.

    Definitely test each case to determine how many seeks/scans are being performed and how IO looks---especially if you are upgrading to SQL Server 2012 and have new tools at your disposal that can be applied to older code.

    A correlated subquery or cursor/WHILE solution may be adequate for small data sets, but data can change over time, so caution should be exercised before making changes now that could end up being a hindrance in the future.
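
    As a rough illustration of sharing a window, both functions in the sketch below use an identical OVER clause, so the optimizer can typically satisfy them with a single sort or index scan (table and column names are hypothetical):

    ```sql
    -- Sketch with hypothetical names: LAG and LEAD share the same
    -- PARTITION BY / ORDER BY, so one ordering of the data can feed both.
    SELECT
        InvID,
        Date,
        LAG(LocationID)  OVER (PARTITION BY InvID ORDER BY Date) AS PrevLocation,
        LEAD(LocationID) OVER (PARTITION BY InvID ORDER BY Date) AS NextLocation
    FROM Inventory;
    ```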

  • I forgot to mention: thank you for the article, Ed. It's nice; I wish I could use it more, but we haven't migrated to 2012 (or 2014). I have to stay with the ROW_NUMBER() version, as it seems to be the best option after the LEAD and LAG option you presented.

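
    For readers in the same pre-2012 position, the ROW_NUMBER() approach mentioned here can be sketched roughly like this (table and column names are hypothetical):

    ```sql
    -- Sketch for SQL Server 2005-2008 R2, hypothetical names: number the
    -- rows per InvID, then join each row to row number + 1 to reach the
    -- "next" row without a cursor.
    WITH Numbered AS (
        SELECT
            InvID,
            Date,
            LocationID,
            ROW_NUMBER() OVER (PARTITION BY InvID ORDER BY Date) AS rn
        FROM Inventory
    )
    SELECT
        cur.LocationID,
        DATEDIFF(d, cur.Date, nxt.Date) AS QtyDaysInLocation
    FROM Numbered AS cur
        LEFT JOIN Numbered AS nxt
            ON nxt.InvID = cur.InvID
           AND nxt.rn = cur.rn + 1;
    ```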
  • Thanks for the advice. My query is a little more controlled than the cases exposed in the article.

    I use an equality join on the ID and only apply >= within the same ID, which reduces the size of the product.

    Besides, I rarely use subqueries in the SELECT clause, only in the FROM clause. I know it doesn't make any difference, but the former seems more prone to getting out of control than the latter. By using subqueries in the FROM clause you stay more conscious of the cardinalities involved in your joins.

  • Don't be so sure about the control you get from limiting by ID. The rows are still being generated, and you can easily verify it with the actual execution plan. With just a few thousand rows, the actual number of rows generated can be huge.

  • CELKO (5/27/2014)


    I would use a different data model that has a correct temporal interval (start_date, end_date).

    I agree completely Joe. However I find myself using this technique a lot in order to get someone else's data into that correct model.

    You always have to start from where you are rather than where you'd like to be.

  • I look forward to articles like this displaying the new technology. We're on SQL Server 2008 and have no plans to change any time soon, so I don't have a chance to play with the newer stuff.

    I would have built a temp table with indexes, populated it from the parent table, and then queried the parent table against the temp table using "index + 1" or some such logic. As one poster mentioned, I have also used the self-join. However, as was mentioned, the parent table could use a makeover: a "lastupdate" field maintained by an update trigger sounds like a good idea to me.

    Thanx!

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • Window functions are great and work well when you want to access records related in some way to every record in your result set. The only limitation of LEAD, LAG, and window functions in general is that they can only access records within your result set. When you need to query only certain records but want access to many other records somehow related to them, I like OUTER APPLY. The performance difference you see between joins and OUTER APPLY is usually due to proper indexing.

  • Did you also compile a table of results based on

    10,000

    1,000,000

    and other counts of data?


  • These specific examples were on smaller data sets---specifically taken from a question asked online prior to this article's original publication. I've tested window functions on other data sets (anywhere from a few rows to billions), and performance should be tailored to the expected data growth in the future, whatever that happens to be.

    Window functions are very convenient, but I find it important to test thoroughly as they can be performance hogs if the range being windowed is large enough (as that will incur a scan across whatever range is being queried for, or a seek if an index happens to cover everything involved).

    robert.sterbal 56890 (12/18/2015)


    Did you also compile a table of results based on

    10,000

    1,000,000

    and other counts of data?

  • Window functions are great and work well when you want to access records related in some way to the current record. The only limitation of LEAD, LAG, and window functions in general is that they can only access records within your result set. When you need data from only a few select records, or want access to many other records somehow related to those records, or those other records are spread throughout what would be a large result set, I favor using derived tables correlated with OUTER APPLY. The performance difference you see between the join and OUTER APPLY methods here is usually due to proper indexing.
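
    A minimal sketch of the OUTER APPLY pattern described above, with hypothetical table and column names:

    ```sql
    -- Sketch only, hypothetical names: for each order, OUTER APPLY runs a
    -- correlated derived table that fetches the next order for the same
    -- customer. With an index on (CustomerID, OrderDate) this becomes a
    -- cheap seek per outer row, and rows with no next order still appear
    -- (with NULL), unlike CROSS APPLY.
    SELECT
        o.CustomerID,
        o.OrderDate,
        nxt.OrderDate AS NextOrderDate
    FROM Orders AS o
        OUTER APPLY (
            SELECT TOP (1) n.OrderDate
            FROM Orders AS n
            WHERE n.CustomerID = o.CustomerID
              AND n.OrderDate > o.OrderDate
            ORDER BY n.OrderDate
        ) AS nxt;
    ```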
