Blog Post

Analytic Functions – LAG / LEAD

,

This post is part of the series discussing the new Analytic functions in SQL Server “Denali”, CTP3.

Analytic Functions in SQL Server “Denali”

All of the new Analytic functions require the use of the OVER clause. To see what changes to the OVER clause that “Denali” brings, please view my article The OVER Clause enhancements in SQL Server code named “Denali”, CTP3 on SQLServerCentral.com.

Introducing the LAG / LEAD functions

Starting the Analytic Functions series off are the LAG and LEAD functions. The LAG/LEAD functions return a column from a previous/following row in the partition, with respect to the current row, as specified by the row offset in the function, without the use of a self-join. The syntax for these functions is:

LAG|LEAD (scalar_expression [,offset] [,default])
    OVER ( [ partition_by_clause ] order_by_clause )

Note that the partition by clause is optional, the order by clause is required, and the windowing clause (ROWS|RANGE) is not supported.

The scalar_expression is the value to be returned – this will normally be a column, but it can also be a subquery or any other expression that results in a single value.

The offset is the number of previous (LAG) or following (LEAD) rows from the current row from which to obtain a value, and if not specified it will default to 1.

The default is the value to use if the value at the offset is NULL, and if not specified it will default to NULL.

Using the LEAD function

An example for using LEAD would be to determine the gaps of a numeric column. For the following table, we want to determine the starting and ending range of values for each gap:

DECLARE @Gaps TABLE (col1 int PRIMARY KEY CLUSTERED);
 
INSERT INTO @Gaps (col1)
VALUES (1), (2), (3),
       (50), (51), (52), (53), (54), (55),
       (100), (101), (102),
       (500),
       (950), (951), (952);

The gaps are determined by comparing col1 in the current row to col1 in the next row, and if the difference is > 1 then there is a gap. For each row that has a gap, the next row’s value -1 is the end of the gap, and the current row’s value +1 is the start of the gap. Since we want to compare the current row to the next row, we’ll use the LEAD function. The code for this query is:

WITH cte AS
(
SELECT curr = col1,
       nxt = LEAD(col1, 1, NULL)
             OVER (ORDER BY col1)
  FROM @Gaps
)
SELECT [Start of Gap] = cte.curr + 1,
       [End of Gap] = cte.nxt - 1
  FROM cte
 WHERE cte.nxt- cte.curr > 1;

and this query produces the following results:

Start of Gap End of Gap
------------ -----------
4            49
56           99
103          499
501          949

Okay, that’s one example. Let’s look at another.

Using the LAG function

Let’s say that we have RFID tags attached to different items. Sensors periodically determine the location of each tag, and store the item, the location where it was picked up at, and a timestamp of when the sensor picked up the item. The following code creates a sample table, and then builds 100 rows of data – 10 items for being at each of 10 locations, with a random timestamp. The code then replaces all occurrences of rows 3, 6 and 9 for each item (as ordered by the timestamp) with the previous rows location value – giving us for each item some rows that are changing location, and some rows that are in the same location.

DECLARE @Locations TABLE (
  ItemId INT,
  LocationId INT,
  TrackDateTime DATETIME);
 
-- build 100 rows of data. 10 items * 10 locations,
-- and each row has a random date.
WITH
Tens (N) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL
             SELECT 3 UNION ALL SELECT 4 UNION ALL
             SELECT 5 UNION ALL SELECT 6 UNION ALL
             SELECT 7 UNION ALL SELECT 8 UNION ALL
             SELECT 9 UNION ALL SELECT 10 )
INSERT INTO @Locations (ItemId, LocationId, TrackDateTime)
SELECT t1.N,
       t2.N,
       CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)
  FROM Tens t1
       CROSS JOIN Tens t2;
 
WITH CTE AS
(
SELECT ItemId, LocationId, TrackDateTime,
       PriorLocation = LAG(LocationId)
                       OVER (PARTITION BY ItemId
                                 ORDER BY TrackDateTime),
       RN = ROW_NUMBER()
            OVER (PARTITION BY ItemId
                      ORDER BY TrackDateTime)
  FROM @Locations
)
UPDATE CTE
   SET LocationId = PriorLocation
 WHERE RN%3=0;

Here the LAG function was utilized to get the previous row’s Location within each Item, where the previous row is determined by its timestamp value. It’s already been a helpful function, and I still have the rest of the code to show!

Using both the LAG and LEAD functions

Now let’s see a lists of items that did not move – or where the current location and the previous location are the same. And for good measure, let’s look at where the item next moved to.

WITH CTE AS
(
SELECT *,
       PriorLocation = LAG(LocationId)
                       OVER (PARTITION BY ItemId
                                 ORDER BY TrackDateTime),
       NextLocation  = LEAD(LocationId)
                       OVER (PARTITION BY ItemId
                                 ORDER BY TrackDateTime)
  FROM @Locations
)
SELECT *
  FROM CTE
 WHERE LocationId = PriorLocation;

Here the LAG function was used to get the previous value; the LEAD function was used to get the next value, and then return just the rows where the current location and prior location are the same. The results are:

ItemId      LocationId  TrackDateTime           PriorLocation NextLocation
----------- ----------- ----------------------- ------------- ------------
1           10          2004-10-27 00:00:00.000 10            5
1           3           2008-04-07 00:00:00.000 3             7
1           6           2009-01-17 00:00:00.000 6             2
2           2           2004-01-02 00:00:00.000 2             4
2           1           2007-06-12 00:00:00.000 1             6
2           3           2008-10-12 00:00:00.000 3             5
3           8           2002-12-16 00:00:00.000 8             9
3           6           2005-08-21 00:00:00.000 6             10
3           2           2007-07-16 00:00:00.000 2             3
4           1           2003-07-03 00:00:00.000 1             6
4           7           2004-11-09 00:00:00.000 7             3
4           2           2008-08-26 00:00:00.000 2             4
5           8           2002-09-30 00:00:00.000 8             5
5           4           2004-04-03 00:00:00.000 4             10
5           6           2008-05-19 00:00:00.000 6             2
6           5           2003-07-28 00:00:00.000 5             8
6           4           2006-01-11 00:00:00.000 4             9
6           1           2008-11-20 00:00:00.000 1             2
7           6           2002-02-08 00:00:00.000 6             3
7           4           2003-03-18 00:00:00.000 4             2
7           1           2006-08-05 00:00:00.000 1             10
8           4           2002-10-03 00:00:00.000 4             5
8           3           2005-08-04 00:00:00.000 3             8
8           1           2006-07-05 00:00:00.000 1             10
9           4           2001-11-17 00:00:00.000 4             6
9           3           2005-04-18 00:00:00.000 3             10
9           5           2006-08-02 00:00:00.000 5             2
10          6           2002-03-15 00:00:00.000 6             2
10          8           2004-01-30 00:00:00.000 8             1
10          4           2008-06-01 00:00:00.000 4             5

I think that these two functions offer some incredible possibilities, by taking a peek at the rows adjacent to the current row. Hopefully this post gives you some ideas of how you can use these functions.

References:

LAG

LEAD

Analytic Function Series:

Overview

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating