Cursor-Killing: Accessing Data in the Next Row

  • Ed Pollack

    Hall of Fame

    Points: 3107

    Comments posted to this topic are about the item Cursor-Killing: Accessing Data in the Next Row

  • Megan Brooks

    Old Hand

    Points: 378

    Yikes, it would never even occur to me to use a cursor in that situation (but I know there are those that do). I have always used a self-join. Pre-2005, in situations where the previous-period key was not a simple arithmetic calculation I would do something like use a correlated subquery to find the most recent period before the current one.

  • Pedro Gomes-379488

    Valued Member

    Points: 54

    It's good to know about LAG and LEAD, that I didn't know of, but I use the approach below when faced with that kind of problem and don't have performance issues:

    SELECT

    T2.InvID, T2.Date, T2.NewLocationID AS LocationID, COUNT(*) AS ItemIdent INTO #Temp

    FROM

    Table AS T1

    INNER JOIN Table AS T2 ON T1.InvID = T2.InvID AND T2.Date >= T1.Date

    GROUP BY T2.InvID, T2.Date, T2.NewLocationID

    SELECT T1.LocationID, DATEDIFF(d, T1.Date, T2.Date) AS QtyDaysInLocation

    FROM #Temp AS T1 LEFT JOIN #Temp AS T2 ON T1.InvID = T2.InvID AND T1.ItemIdent = T2.ItemIdent - 1

  • danielk1

    SSC-Addicted

    Points: 462

    Do not normally direct others to sites from the one the forum is on, this is another good explanation and illustration of LEAD and LAG: http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/

  • Luis Cazares

    SSC Guru

    Points: 183517

    Pedro Gomes-379488 (5/27/2014)


    It's good to know about LAG and LEAD, that I didn't know of, but I use the approach below when faced with that kind of problem and don't have performance issues:

    SELECT

    T2.InvID, T2.Date, T2.NewLocationID AS LocationID, COUNT(*) AS ItemIdent INTO #Temp

    FROM

    Table AS T1

    INNER JOIN Table AS T2 ON T1.InvID = T2.InvID AND T2.Date >= T1.Date

    GROUP BY T2.InvID, T2.Date, T2.NewLocationID

    SELECT T1.LocationID, DATEDIFF(d, T1.Date, T2.Date) AS QtyDaysInLocation

    FROM #Temp AS T1 LEFT JOIN #Temp AS T2 ON T1.InvID = T2.InvID AND T1.ItemIdent = T2.ItemIdent - 1

    Be careful with this as it's considered hidden RBAR and can be even worse than a cursor. Read the following article on it. http://www.sqlservercentral.com/articles/T-SQL/61539/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Ed Pollack

    Hall of Fame

    Points: 3107

    There are certainly many ways to accomplish this and end up with the same results---but performance is the key in deciding what to do. LEAD and LAG are very, very efficient---and their value increases if you need other window functions over the same window, as the window itself can be shared across each.

    Definitely test each case to determine how many seeks/scans are being performed and how IO looks---especially if you are upgrading to SQL Server 2012 and have new tools at your disposal that can be applied to older code.

    A correlated subquery or cursor/WHILE solution may be adequate for small data sets, but data can change over time, so caution should be exercised before making changes now that could end up being a hindrance in the future.

  • Luis Cazares

    SSC Guru

    Points: 183517

    I forgot to mention. Thank you for the article Ed. It's nice,I wish that I could use it more, but we haven't migrated to 2012 (or 2014). I have to stay with the ROW_NUMBER() version as it seems to be the best option after the LEAD and LAG option you presented.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Pedro Gomes-379488

    Valued Member

    Points: 54

    Thanks for the advice. My query is a little more controlled than the cases exposed in the article.

    I use an equal join and only >= inside the same ID, which reduces the product.

    Besides, I rarely use subqueries in the SELECT clause, only the in the FROM clause. I know it doesn't make any difference but the former seems to be more keen to getting out of control than the latter. By using subqueries in the FROM clause you get more concious of the cardinalities involved in your joins.

  • Luis Cazares

    SSC Guru

    Points: 183517

    Don't be so sure about the control that you expect by the limit on the ID. The rows are still being generated and you can easily check it with the actual execution plan. With a few thousand rows, the actual number of rows generated can be huge.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • BarneyL

    Ten Centuries

    Points: 1044

    CELKO (5/27/2014)


    I would use a different data model that has a correct temporal interval (start_date, end_date).

    I agree completely Joe. However I find myself using this technique a lot in order to get someone else's data into that correct model.

    You always have to start from where you are rather than where you'd like to be.

  • Caruncles

    Hall of Fame

    Points: 3239

    I look forward to articles like this displaying the new technology. We're on SS 2008 and have no plans to change any time soon. So I don't have a chance to play with the newer stuff.

    I would either have built a temp table with indexes and populated from the parent table, then query the parent table with the temp table using "index +1" or some such logic. As one poster mentioned, I have also used the self-join. However, as was mentioned, the parent table could use a makeover. A "lastupdate" field being fired from an update trigger sounds like a good idea to me.

    Thanx!

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • asol002

    Grasshopper

    Points: 20

    Window functions are great and work well when you want to access records related in some way to every record. The only problem with LEAD and LAG and window functions is they can only access records within your result set. When you need to query only certain records but want access many different other records somehow related to those records. I like Outer Apply. The performance difference you see with Joins and Outer Applies is due to proper indexing.

  • Robert Sterbal

    SSChampion

    Points: 10951

    Did you also compile a table of results based on

    10,000

    1,000,000

    and other counts of data?

    412-977-3526 call/text

  • Ed Pollack

    Hall of Fame

    Points: 3107

    These specific examples were on smaller data sets---specifically taken from a question asked online prior to this article's original publication. I've tested window functions on other data sets (anywhere from a few rows to billions), and performance should be tailored to the expected data growth in the future, whatever that happens to be.

    Window functions are very convenient, but I find it important to test thoroughly as they can be performance hogs if the range being windowed is large enough (as that will incur a scan across whatever range is being queried for, or a seek if an index happens to cover everything involved).

    robert.sterbal 56890 (12/18/2015)


    Did you also compile a table of results based on

    10,000

    1,000,000

    and other counts of data?

  • asol002

    Grasshopper

    Points: 20

    Window functions are great and work well when you want to access records related in some way to current records. The only problem with LEAD and LAG and window functions is they can only access records within your result set. When you need data from only a few select Records or want access many different other records somehow related to those records or these other records are spread out through what would be a large result set. I favor using derived tables correlated with an Outer Apply. The performance difference you see with the Joins and Outer Apply methods here is usually due to proper indexing.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply