Find Latest Date from Multiple Columns

  • I have ten user defined date fields, but for simplicity let's focus on three. These are only dates, not datetime. They're labeled L1, L2, and L3 and contain a date. They are manual entry so the date can be manually changed by the user. I need to find the latest date and the column header it belongs to.

    CREATE TABLE datetest (id varchar(8), L1 date, L2 date, L3 date)

    INSERT INTO datetest (id, L1, L2, L3)

    VALUES ('1005', '1-1-16', '1-17-16', '1-10-16')

    What I want to see is three columns: id, MAX date, Column it comes from. In the case above, I'd hope for:

    1005, 1-17-16, L2

  • You can use a CASE statement. It'll be messy, but SQL doesn't have a 'max across columns' function.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I assume you mean like this? I appreciate the guidance.

    SELECT id

    , CASE

    WHEN L1 > L2 AND L1 > L3 THEN L1

    WHEN L2 > L1 AND L2 > L3 THEN L2

    ELSE L3

    END AS 'Date'

    , CASE

    WHEN L1 > L2 AND L1 > L3 THEN 'L1'

    WHEN L2 > L1 AND L2 > L3 THEN 'L2'

    ELSE 'L3'

    END AS 'L'

    FROM datetest

  • You can also use APPLY

    SELECT MA.MaxDate

    FROM <mytable> AS MT

    CROSS APPLY (

    SELECT MAX(VA.LDate)

    FROM (VALUES(MT.L1),(MT.L2),(MT.L3)....) VA(LDate)

    ) AS MA(MaxDate)

    Excuse any slight syntax errors, I have typed this away from SQL.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Just noticed "and the column header it belongs to", so:

    SELECT MA.MaxDate, MA.ColName

    FROM <mytable> AS MT

    CROSS APPLY (

    SELECT TOP(1) VA.LName, VA.LDate

    FROM (VALUES('L1',MT.L1),('L2',MT.L2),('L3',MT.L3)....) VA(LName, LDate)

    ORDER BY VA.LDate DESC

    ) AS MA(ColName, MaxDate)

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • @mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.

    Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:

    SELECT TOP 1

    dates.id,

    dates.L,

    dates.Val

    FROM (

    SELECT id,

    L1,

    L2,

    L3

    FROM dbo.datetest

    ) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates

    ORDER BY dates.Val DESC;

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Just for the fun of it... Here's a solution that uses a cast to binary & contaminate method...

    Test data...

    IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL

    DROP TABLE #temp;

    CREATE TABLE #temp (

    id INT PRIMARY KEY,

    c1 DATE,

    c2 DATE,

    c3 DATE,

    c4 DATE,

    c5 DATE

    );

    INSERT #temp (id,c1,c2,c3,c4,c5)VALUES

    (1, '2014-01-05','2014-06-03','2015-12-31','2015-06-01','2015-01-08'),

    (2, '2015-01-05','2014-07-08','2014-12-31','2016-06-07','2015-02-02'),

    (3, '2016-01-15','2014-04-03','2014-12-21','2015-07-01','2014-01-05'),

    (4, '2014-01-05','2014-06-13','2045-12-30','2014-06-11','2015-11-28'),

    (5, '2015-11-05','2015-06-03','2014-12-22','2014-11-01','2015-02-18');

    And the solution...

    SELECT

    t.id,

    ColumnName = CAST(SUBSTRING(MAX(x.DateValue), 5, 4) AS VARCHAR(2)),

    DateValue = CAST(SUBSTRING(MAX(x.DateValue), 1, 4) AS DATE)

    FROM

    #temp t

    CROSS APPLY (VALUES

    (CAST(t.c1 AS BINARY(4)) + CAST('c1' AS BINARY(4))),

    (CAST(t.c2 AS BINARY(4)) + CAST('c2' AS BINARY(4))),

    (CAST(t.c3 AS BINARY(4)) + CAST('c3' AS BINARY(4))),

    (CAST(t.c4 AS BINARY(4)) + CAST('c4' AS BINARY(4))),

    (CAST(t.c5 AS BINARY(4)) + CAST('c5' AS BINARY(4))))

    x (DateValue)

    GROUP BY

    t.id;

  • Orlando Colamatteo (1/18/2016)


    @mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.

    Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:

    SELECT TOP 1

    dates.id,

    dates.L,

    dates.Val

    FROM (

    SELECT id,

    L1,

    L2,

    L3

    FROM dbo.datetest

    ) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates

    ORDER BY dates.Val DESC;

    This doesn't work as the top 1 limits the output to a single row regardless of how many rows are in the table.

    😎

  • Jason A. Long (1/18/2016)


    Just for the fun of it... Here's a solution that uses a cast to binary & contaminate method...

    Test data...

    IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL

    DROP TABLE #temp;

    CREATE TABLE #temp (

    id INT PRIMARY KEY,

    c1 DATE,

    c2 DATE,

    c3 DATE,

    c4 DATE,

    c5 DATE

    );

    INSERT #temp (id,c1,c2,c3,c4,c5)VALUES

    (1, '2014-01-05','2014-06-03','2015-12-31','2015-06-01','2015-01-08'),

    (2, '2015-01-05','2014-07-08','2014-12-31','2016-06-07','2015-02-02'),

    (3, '2016-01-15','2014-04-03','2014-12-21','2015-07-01','2014-01-05'),

    (4, '2014-01-05','2014-06-13','2045-12-30','2014-06-11','2015-11-28'),

    (5, '2015-11-05','2015-06-03','2014-12-22','2014-11-01','2015-02-18');

    And the solution...

    SELECT

    t.id,

    ColumnName = CAST(SUBSTRING(MAX(x.DateValue), 5, 4) AS VARCHAR(2)),

    DateValue = CAST(SUBSTRING(MAX(x.DateValue), 1, 4) AS DATE)

    FROM

    #temp t

    CROSS APPLY (VALUES

    (CAST(t.c1 AS BINARY(4)) + CAST('c1' AS BINARY(4))),

    (CAST(t.c2 AS BINARY(4)) + CAST('c2' AS BINARY(4))),

    (CAST(t.c3 AS BINARY(4)) + CAST('c3' AS BINARY(4))),

    (CAST(t.c4 AS BINARY(4)) + CAST('c4' AS BINARY(4))),

    (CAST(t.c5 AS BINARY(4)) + CAST('c5' AS BINARY(4))))

    x (DateValue)

    GROUP BY

    t.id;

    Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all, the difference is that the former most likely goes parallel and the latter never goes parallel.

    😎

    Edit: Correction

  • Eirikur Eiriksson (1/18/2016)


    Orlando Colamatteo (1/18/2016)


    @mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.

    Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:

    SELECT TOP 1

    dates.id,

    dates.L,

    dates.Val

    FROM (

    SELECT id,

    L1,

    L2,

    L3

    FROM dbo.datetest

    ) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates

    ORDER BY dates.Val DESC;

    This doesn't work as the top 1 limits the output to a single row regardless of how many rows are in the table.

    😎

    My query works for a single row, i.e. on the OP's test data. It wants a WHERE-clause to limit what goes in to a single row.

    If you needed something to work on a set you could partition and order the rows:

    SELECT up.id,

    up.L,

    up.Val

    FROM (

    SELECT dates.id,

    dates.L,

    dates.Val,

    ROW_NUMBER() OVER (PARTITION BY dates.id ORDER BY dates.Val DESC) AS uprank

    FROM (

    SELECT id,

    L1,

    L2,

    L3

    FROM dbo.datetest

    ) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates

    ) up

    WHERE up.uprank = 1;

    I tried both of these on 1MM rows and as I suspected the CROSS APPLY method was quite a bit faster and more efficient on CPU time too.

    CROSS APPLY:

    Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1891 ms, elapsed time = 1343 ms.

    (1000000 row(s) affected)

    UNPIVOT+ROW_NUMBER:

    Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 10154 ms, elapsed time = 2970 ms.

    (1000000 row(s) affected)

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Eirikur Eiriksson (1/18/2016)


    Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all, the difference is that the former most likely goes parallel and the latter never goes parallel.

    😎

    Edit: Correction

    I can't comment on the above without testing on a larger data set. That said, the "top 1 / order by" method has a sort operator on an expression and the "binary concatenation" method puts the sort on the ID column... Which, in turn, means the sort operator can be eliminated by putting a clustered index on the ID column.

    My thought (not proven w/o testing) is that the elimination sort operation should give the "binary concatenation" method the edge.

  • Jason A. Long (1/19/2016)


    Eirikur Eiriksson (1/18/2016)


    Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all, the difference is that the former most likely goes parallel and the latter never goes parallel.

    😎

    Edit: Correction

    I can't comment on the above without testing on a larger data set. That said, the "top 1 / order by" method has a sort operator on an expression and the "binary concatenation" method puts the sort on the ID column... Which, in turn, means the sort operator can be eliminated by putting a clustered index on the ID column.

    My thought (not proven w/o testing) is that the elimination sort operation should give the "binary concatenation" method the edge.

    The binary method run my 1MM row test table without a CI is a bit slower than the CROSS APPLY:

    Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 2850, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 5328 ms, elapsed time = 1832 ms.

    (1000000 row(s) affected)

    With a CI, the binary method still ends up a bit slower on elapsed time than the CROSS APPLY with or without the same CI but does better on CPU time:

    Table 'datetest'. Scan count 1, logical reads 3464, physical reads 0, read-ahead reads 17, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1515 ms, elapsed time = 1532 ms.

    The CROSS APPLY results again:

    Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1891 ms, elapsed time = 1343 ms.

    My lowly entry using UNPIVOT+ROW_NUMBER still lags behind the others but it benefited the most from the CI:

    Table 'Worktable'. Scan count 1, logical reads 2000005, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'datetest'. Scan count 1, logical reads 3464, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2562 ms, elapsed time = 2565 ms.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for testing Orlando.

    I wonder how the performance of each method will depend on how many "date columns" are present as well, whether having 20 columns vs 10 columns is an issue?

    Not suggesting anyone bother to test that 🙂

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • The timings I presented were for 3 column 1MM rows. I wondered the same and did 10 to see how it did and the placing came out the same:

    cross apply:

    Table 'datetest'. Scan count 5, logical reads 6271, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 3842 ms, elapsed time = 2719 ms.

    binary:

    Table 'datetest'. Scan count 5, logical reads 6255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 5296 ms, elapsed time = 3524 ms.

    unpivot:

    Table 'Worktable'. Scan count 1, logical reads 2000019, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'datetest'. Scan count 1, logical reads 6197, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 5500 ms, elapsed time = 5515 ms.

    Maybe tonight I will work up a CASE-expression for this just to see.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Just a cautionary note on the CASE version: be aware of how your implementation handles ties. If you're not careful with the CASE, then you can get incorrect results, as in this example.

    SELECT ID=1,

    L1='20151201',

    L2='20151201',

    L3='19000101'

    INTO #temp;

    SELECT id

    , CASE

    WHEN L1 > L2 AND L1 > L3 THEN L1

    WHEN L2 > L1 AND L2 > L3 THEN L2

    ELSE L3

    END AS 'Date'

    , CASE

    WHEN L1 > L2 AND L1 > L3 THEN 'L1'

    WHEN L2 > L1 AND L2 > L3 THEN 'L2'

    ELSE 'L3'

    END AS 'L'

    FROM #temp;

    DROP TABLE #temp;

    A quick fix would be to change > to >= in the checks. Just be aware that if multiple columns are tied, then you'll always get the first of the ties in the order you evaluate them. That might not matter, or you might want to make sure you return a particular one of the ties, in which case you would need additional modifications.

    The CASE statement's branches get evaluated in order, so you can also simplify a bit. If the second branch is being evaluated (after switching to >=), you know L1 isn't even a tie for the highest value, so you don't need to compare L2 to it. If the branch with L3's comparisons is being evaluated, you know that neither L1 nor L2 were even ties for the highest date, so L3 doesn't have to be compared to them, and so on down the branches (so your CASE statements end up being triangular looking affairs).

    Here's what it would look like for Jason's sample data:

    SELECT id,

    CASE WHEN c1>=c2 AND c1>=c3 AND c1>=c4 and c1>=c5 THEN c1

    WHEN c2>=c3 AND c2>=c4 AND c2>=c5 THEN c2

    WHEN c3>=c4 AND c3>=c5 THEN c3

    WHEN c4>=c5 THEN c4

    ELSE c5

    END,

    CASE WHEN c1>=c2 AND c1>=c3 AND c1>=c4 and c1>=c5 THEN 'c1'

    WHEN c2>=c3 AND c2>=c4 AND c2>=c5 THEN 'c2'

    WHEN c3>=c4 AND c3>=c5 THEN 'c3'

    WHEN c4>=c5 THEN 'c4'

    ELSE 'c5'

    END

    FROM #temp

    Cheers!

  • Viewing 15 posts - 1 through 15 (of 43 total)

    You must be logged in to reply to this topic. Login to reply