Need help with writing TSQL

  • Hi,

    I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.

    Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'

    Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?

    Thanks,

    Bhimraj

  • Hi Bhimraj,

    Do you have any other fields in the table to do the grouping on? If not, it makes things trickier. If you could post a SQL statement to create your data table with some sample data in the form of insert statements that would be a big help.

    Cheers, Iain

  • Can there only ever be be one row with a NULL ChangeDate, with this row belonging to the most recent group?

    If this is not the case, then how can you determine which group a row with a NULL date belongs to, since you state that the StepId does not uniquely identify a group?

    Or maybe, if there are multiple rows with a NULL ChangeDate, should each "NULL" row be assigned to the last group with the same StepId? - please clarify.

  • Hi,

    Find the below way i have tried

    declare @Temp table (Changedate datetime, stepid int)

    declare @Temp1 table (Changedate datetime, stepid int,Types bit)

    insert into @Temp

    Select '2007-06-03 20:39:00',1

    union all Select '2007-06-26 12:47:00',1

    union all Select '2007-07-09 15:17:00',2

    union all Select '2007-08-14 14:42:00',2

    union all Select '2007-08-20 10:42:00',2

    union all Select '2007-10-10 15:36:00',3

    union all Select '2007-12-14 09:53:00',3

    union all Select '2008-01-28 14:45:00',2

    union all Select '2008-07-02 15:32:00',2

    union all Select '2008-12-16 13:38:00',2

    union all Select '2009-02-27 14:04:00',2

    union all Select NULL,2 /*I have included one more Null to test*/

    union all Select '2009-03-06 15:14:00',1

    union all Select '2009-03-26 09:41:00',1

    union all Select '2009-12-11 20:27:00',1

    union all Select '2010-04-14 14:44:00',1

    union all Select NULL,1

    -- Considering @temp as your table

    /*Here you are adding one more column with default as 0 */

    insert into @Temp1

    Select *,0 from @Temp

    Select * from @Temp1

    /*Here you are updating the default value to 1 where it is NULL */

    Update @Temp1 set Types=1 where Changedate is null

    /*Final Statement*/

    Select Changedate,stepid,case when Types=1 then 'Current' else 'Copmleted' end as types from @Temp1

    Thanks

    Parthiban .L

    Thanks
    Parthi

  • Edit: supplies solution to different problem *facepalm*

  • declare @t table(id int identity,dt date NULL,stepid int)

    insert @t

    select '2008-03-03',1 union all

    select '2008-03-04',1 union all

    select '2008-03-14',5 union all

    select null,5 union all

    select '2008-03-20',6 union all

    select '2008-03-21',6 union all

    select '2008-04-04',20 union all

    select '2008-04-08',20 union all

    select null,20 union all

    select '2008-04-15',1 union all

    select '2008-04-18',1 union all

    select '2008-06-20',1 union all

    select '2008-06-22',1 union all

    select '2008-06-25',1 union all

    select null,1

    ;with cte

    as

    (

    select id,stepid,(dense_rank()over(order by stepid,id)-id)+stepid rid from @t

    )

    ,cte1

    as

    (

    select c.stepid,t.dt,c.id,c.rid from cte c

    inner join @t t on t.id=c.id

    where t.dt is null

    )

    select t.id,t.dt,t.stepid,

    (case when c.rid IS null then 'complete'else 'current'end) status

    from cte1 c1

    inner join cte c on c.rid=c1.rid

    right join @t t on t.id=c.id

    order by c1.id,c1.stepid

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Parthiban, I'm not sure your solution does what is required. As far as I can tell, the OP wants *all* records in a group to show as 'current' where the last record in the group has a null date.

    The following should do what you need, but comes with a warning that it creates triangular joins, which are a hidden form of RBAR (search for Jeff Moden's article on the subject on this site to understand this if you don't already).

    -- create some test data

    create table #tmp (

    Date_Field datetime

    , Int_Field int

    )

    -- insert some values

    -- assumption1: the null row is always the last in a set

    -- assumption2: dates are sequential, i.e. all dates in the second group

    -- will be after the latest date in the first group, etc

    insert #tmp values ('2010-11-09',1)

    insert #tmp values ('2010-11-14',1)

    insert #tmp values ('2010-11-19',1)

    insert #tmp values ('2010-11-24',1)

    insert #tmp values (NULL,1)

    insert #tmp values ('2010-12-04',2)

    insert #tmp values ('2010-12-09',2)

    insert #tmp values ('2010-12-14',2)

    insert #tmp values (NULL,2)

    insert #tmp values ('2010-12-24',3)

    insert #tmp values ('2010-12-29',3)

    insert #tmp values ('2011-01-03',3)

    insert #tmp values ('2011-01-08',3)

    insert #tmp values ('2011-01-13',1)

    insert #tmp values ('2011-01-18',1)

    insert #tmp values ('2011-01-23',1)

    insert #tmp values ('2011-01-28',1)

    insert #tmp values (NULL,2)

    -- add a sequential column and index

    alter table #tmp add Order_Key int identity

    create clustered index ix_tmp on #tmp(Order_Key)

    -- offset join the table to itself to identify

    -- where the group for each row starts and finishes

    -- warning: these create triangular joins

    ; with cte as (

    select a.Order_Key

    , a.Date_Field

    , a.Int_Field

    , (select isnull(max(Order_Key)+1,1) from #tmp d

    where d.Order_Key <= a.Order_Key

    and d.Int_Field <> a.Int_Field

    ) as Bottom_Range

    , (select isnull(min(Order_Key)-1,(select max(Order_Key) from #tmp)) from #tmp d

    where d.Order_Key >= a.Order_Key

    and d.Int_Field <> a.Int_Field

    ) as Top_Range

    from #tmp a

    left join #tmp b

    on a.Order_Key = b.Order_Key + 1

    )

    -- pull the final extract together

    select *

    , case when exists (select 'Null Value Found' from cte e

    where e.Order_Key between cte.Bottom_Range and cte.Top_Range

    and e.Date_Field is null

    )

    then 'Current'

    else 'Completed'

    end as [Type]

    from cte

    drop table #tmp

    I couldn't come up with a way to avoids this, perhaps someone else might be able to?

    Regards, Iain

  • Nice solution Sachin. I've run some comparative testing, which throws up some interesting results. Using the following test script, your solution runs quicker and has much lower read counts. But, when you generate a comparative execution plan, SQL estimates the rows returned by yours really badly, making it look like my solution is better (by 0% to 100%), even though it isn't. I wonder why this is?

    Also, there is an issue with your solution producing too many rows, not sure why this might be? The row count in the test table is 73728, your solution returns 75828.

    Test script:

    /**** uses

    dbcc freeproccache

    dbcc dropcleanbuffers

    do not run on production server

    *****/

    -- create some test data

    create table #tmp (

    Date_Field datetime

    , Int_Field int

    )

    -- insert some values

    -- assumption: the null row is always the last in a set

    insert #tmp values ('2010-11-09',1)

    insert #tmp values ('2010-11-14',1)

    insert #tmp values ('2010-11-19',1)

    insert #tmp values ('2010-11-24',1)

    insert #tmp values (NULL,1)

    insert #tmp values ('2010-12-04',2)

    insert #tmp values ('2010-12-09',2)

    insert #tmp values ('2010-12-14',2)

    insert #tmp values (NULL,2)

    insert #tmp values ('2010-12-24',3)

    insert #tmp values ('2010-12-29',3)

    insert #tmp values ('2011-01-03',3)

    insert #tmp values ('2011-01-08',3)

    insert #tmp values ('2011-01-13',1)

    insert #tmp values ('2011-01-18',1)

    insert #tmp values ('2011-01-23',1)

    insert #tmp values ('2011-01-28',1)

    insert #tmp values (NULL,2)

    -- bump up the data counts

    declare @i int = 1

    while @i <=12

    begin

    insert #tmp

    select dateadd(dd,90*@i,Date_Field)

    , Int_Field

    from #tmp

    set @i = @i+1

    end

    select count(*) from #tmp

    -- 73728

    -- add a sequential column and index

    alter table #tmp add Order_Key int identity

    create clustered index ix_tmp on #tmp(Order_Key)

    dbcc freeproccache

    dbcc dropcleanbuffers

    set statistics io on

    set statistics time on

    -- offset join the table to itself to identify

    -- where the group for each row starts and finishes

    -- warning: these create triangular joins

    ; with cte as (

    select a.Order_Key

    , a.Date_Field

    , a.Int_Field

    , (select isnull(max(Order_Key)+1,1) from #tmp d

    where d.Order_Key <= a.Order_Key

    and d.Int_Field <> a.Int_Field

    ) as Bottom_Range

    , (select isnull(min(Order_Key)-1,(select max(Order_Key) from #tmp)) from #tmp d

    where d.Order_Key >= a.Order_Key

    and d.Int_Field <> a.Int_Field

    ) as Top_Range

    from #tmp a

    left join #tmp b

    on a.Order_Key = b.Order_Key + 1

    )

    -- pull the final extract together

    select *

    , case when exists (select 'Null Value Found' from cte e

    where e.Order_Key between cte.Bottom_Range and cte.Top_Range

    and e.Date_Field is null

    )

    then 'Current'

    else 'Completed'

    end as [Type]

    from cte

    go

    dbcc freeproccache

    dbcc dropcleanbuffers

    ;with cte

    as

    (

    select Order_Key,Int_Field,(dense_rank()over(order by Int_Field,Order_Key)-Order_Key)+Int_Field rid from #tmp

    )

    ,cte1

    as

    (

    select c.Int_Field,t.Date_Field,c.Order_Key,c.rid from cte c

    inner join #tmp t on t.Order_Key=c.Order_Key

    where t.Date_Field is null

    )

    select t.Order_Key,t.Date_Field,t.Int_Field,

    (case when c.rid IS null then 'complete'else 'current'end) status

    from cte1 c1

    inner join cte c on c.rid=c1.rid

    right join #tmp t on t.Order_Key=c.Order_Key

    order by t.Order_Key,c1.Int_Field

    -- 75828 rows

    drop table #tmp

  • Sachin's solution does rely on the added IDENTITY column to define the order of all the rows in the table, including those rows with a NULL ChangeDate. If there is some other column (that the OP hasn't yet mentioned) that could be used to define this ordering then Sachin's approach could made to work, but without such a column and without further clarification from the OP we are forced to make arbitrary assumptions.

    If there is a single row with a NULL ChangeDate that has the same StepId as the row with the maximum non-null ChangeDate value, then the situation is unambiguous.

    ChangeDate StepId Status

    ----------- ------ --------

    2010-05-01 1 Complete

    2010-05-03 1 Complete

    2010-05-08 1 Complete

    2010-06-02 2 Current

    2010-06-07 2 Current

    NULL 2 Current

    Now suppose we add another row with StepId = 1 and a NULL ChangeDate.

    ChangeDate StepId

    ----------- --------

    NULL 1

    Since the ChangeDate column is NULL, there is no unambiguous ordering for this row. Is it part of the original group with StepId = 1 like this...?

    ChangeDate StepId Status

    ----------- ------ --------

    2010-05-01 1 Current

    2010-05-03 1 Current

    2010-05-08 1 Current

    NULL 1 Current

    2010-06-02 2 Current

    2010-06-07 2 Current

    NULL 2 Current

    Or is it the first row of a new group, like this...?

    ChangeDate StepId Status

    ----------- ------ --------

    2010-05-01 1 Complete

    2010-05-03 1 Complete

    2010-05-08 1 Complete

    2010-06-02 2 Current

    2010-06-07 2 Current

    NULL 2 Current

    NULL 1 Current

  • Sorry for delay in reply and THANK YOU all for the solutions you provided here. Your help is very much appreciated.

    Solutions provided by parthi-1705, sachin-355820, irobertson - is what I was looking for. I have some other columns in this table such as identity column and few other. But none of these column can be used for grouping.

    @andrewd.smith : There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information. But the solution provided here works for me.

    I am going to check performance with actual data. Thank you again !! This is why I like this forum - people are ready with differnt solutions much before you clarify the problem.

  • rajg (11/9/2010)


    There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information.

    This is one of the reasons it's best to post DDL and sample data that looks as much like your problem as possible. Often, it's the least obvious thing that is key to solving the problem 😉

    Glad we could be of help, I'd be interested in how the solutions work out in your environment. Please let us know...

    Regards, Iain

    Edit: typo...

  • rajg (11/9/2010)


    Hi,

    I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.

    Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'

    Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?

    Thanks,

    Bhimraj

    Do you have anything besides the StepID to put this data into "groups"? As it is, the StepID can repeat, so the definition of a group is that the StepID changes from it's previous value. If there are some other fields that will provide this grouping, I'm sure that there is an extremely efficient, fast method to solve this issue.

    Did you see how Iain provided sample data in a readily consumable format? If you would do this, it would make it A LOT easier on all of the volunteers on this site. Please read the first link in my signature for how to do this. You should have (at the minimum), the CREATE TABLE statement(s) for the applicable table(s), and INSERT statement(s) for each table to put in some sample data, and expected results based upon the sample data provided. Indexes and constraints frequently prove to be very useful also!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • irobertson (11/9/2010)


    rajg (11/9/2010)


    There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information.

    This is one of the reasons it's best to post DDL and sample data that looks as much like your problem as possible. Often, it's the least obvious thing that is key to solving the problem 😉

    Glad we could be of help, I'd be interested in how the solutions work out in your environment. Please let us know...

    Regards, Iain

    Edit: typo...

    I'd be interetested in knowing, as well. One "out of place" date (ie: parallel steps) and BOOM!... all the solutions shatter like glass because they rely only on a sequential date or the supposed implicit order of data. Not good. Now, if that's all there is, then all is well. But, like has been asked many times, I'd really be interested if there are any other columns of data associated with this problem so we can make the solutions a bit more shatter-proof.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rajg (11/9/2010)


    Hi,

    I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.

    Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'

    Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?

    Thanks,

    Bhimraj

    Pictures are pretty but they don't do a whole lot for helping people help you. Please take a look at the article located in my signature line below. THAT's the way to post data to get the best help.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I take that back. Sachin's looks like it'll work just fine (classic double rownumber) but is guaranteed only if the original insert were to have an ORDER BY in this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply