Cursor

  • rbarryyoung (8/28/2008)


    Jeff Moden (8/28/2008)


    It's also part of the reason I wrote that forum ettiquette article on how to provide actual test data so that good folks like yourself don't have to spend so much time creating data and can get right to trying to solve the problem, instead.

    Man, I think that I linked to that article like 20 times today. Now I know why you and Jack put it into your signature!

    Heh... yeah... I got tired of looking it up. Sure wish Steve could find a way to make it a "sticky" in some of the forums.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GSquared (8/28/2008)


    The first query breaks them up into sets, based on the SetStart and SetEnd columns and the Date column.

    The second query then takes each set and gives it a sequence number within the set.

    Is that what you're looking for? Is it clear enough?

    Unfortunately this type of problem is not so clear to most users even in its recognition let alone a solution. Sql server doesn't help matters by offering no construct(s) to directly solve the problem. And what is the clearest, simplest way to characterize a problem like this so it becomes easy to communicate about it? This problem, along with relational division, appears to be sqls version of porn. You can't nail down the definition but you know it when you see it:)

    These are two pretty good threads about approaching this type of problem. They do contain spoons but forks as well. FWIW I'm dassin, steve.

    'searching for the longest subsequence of ones'

    http://tinyurl.com/5g2dgq

    'Roll Up IP-Location Database'

    http://tinyurl.com/6evlbr

    www.beyondsql.blogspot.com

  • I finally had time to test this, and I found something I hadn't noticed before. Your test data (first post) has duplicate IDs and all the dates and times are the same. Is that an accident, or does your real data also have duplicate ID values? Also, the test table has no primary key. Is that true of the real table or just an oversight on the test?

    In the hope that your data doesn't actually have those problems, I created the following test and code:

    CREATE TABLE #mytable

    ( ID INT IDENTITY(1,1) ,

    DateValue DATETIME, Prd nvarchar(250), Statusid INT )

    INSERT INTO #mytable

    (DateValue,Prd, Statusid)

    SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','B','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','B','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','B','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','B','2' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','B','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','B','2' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL

    SELECT 'Oct 17 2007 12:00AM','A','2'

    update #MyTable

    set DateValue = dateadd(minute, ID, DateValue)

    alter table #MyTable

    add SetFirst bit not null default(0), SetLast bit not null default(0)

    ;with CTE (Row, Date, Prd, StatusID, SetFirst, SetLast) as

    (select row_number() over (partition by Prd order by DateValue),

    DateValue, Prd, StatusID, SetFirst, SetLast

    from #MyTable)

    update C1

    set

    SetFirst =

    case

    when C2.row is not null then 1

    when not exists

    (select *

    from #MyTable mt

    where mt.prd = c1.prd

    and mt.datevalue < c1.date) then 1

    else 0

    end,

    SetLast =

    case

    when C3.row is not null then 1

    when not exists

    (select *

    from #MyTable mt

    where mt.prd = c1.prd

    and mt.datevalue > c1.date) then 1

    else 0

    end

    from CTE C1

    left outer join CTE C2 -- The prior row is a 2

    on C1.Prd = C2.Prd

    and C1.Row = C2.Row+1

    and C1.StatusID = 1

    and C2.StatusID = 2

    left outer join CTE C3 -- Status 2 and next row is a 1

    on C1.Prd = C3.Prd

    and C1.Row = C3.Row-1

    and C1.StatusID = 2

    and C3.StatusID = 1

    select *

    from #MyTable

    Because of the Not Exists tests, it has at least that much row-by-row. There's almost certainly a better way to do that particular test, but I don't have time to mess around with that right now. This way works, at least on this data.

    Another thing I noticed about your test data is that you had two StatusID 2 in a row for Prd "B". That seems to indicate a close without an open. I got rid of that for my test, but if it really exists in your database, the code will have to correct for that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi

    GSquared

    After runnig your both the query, for set and sequece, this is what i got

    ID DateValue Prd statusid setfirst setlast setid setsequence

    12007-10-17 00:01:00.000A11011

    22007-10-17 00:02:00.000A10021

    32007-10-17 00:03:00.000A10001

    42007-10-17 00:04:00.000A10002

    52007-10-17 00:05:00.000B11003

    62007-10-17 00:06:00.000B10004

    72007-10-17 00:07:00.000B10005

    82007-10-17 00:08:00.000B20106

    92007-10-17 00:09:00.000B11007

    102007-10-17 00:10:00.000B20108

    112007-10-17 00:11:00.000A10009

    122007-10-17 00:12:00.000A100010

    132007-10-17 00:13:00.000A100011

    142007-10-17 00:14:00.000A100012

    152007-10-17 00:15:00.000A201013

    but i was looking for something like this! and setfirst was good escept for the last Prd A

    ID DateValue Prd statusid setfirst setlast setid setsequence

    12007-10-17 00:01:00.000A11011

    22007-10-17 00:02:00.000A10021

    32007-10-17 00:03:00.000A10001

    42007-10-17 00:04:00.000A10001

    52007-10-17 00:05:00.000B11001

    62007-10-17 00:06:00.000B10001

    72007-10-17 00:07:00.000B10001

    82007-10-17 00:08:00.000B20101

    92007-10-17 00:09:00.000B11002

    102007-10-17 00:10:00.000B20102

    112007-10-17 00:11:00.000A11003

    122007-10-17 00:12:00.000A10003

    132007-10-17 00:13:00.000A10003

    142007-10-17 00:14:00.000A10003

    152007-10-17 00:15:00.000A20103

    So sequence 1 is one block , Sequence two is another block and Sequence 3 is one more block

    and after acheiving this i can trap the first and last record of each sequence which is my ultimate goal

    I guess i am not messing up your labor day weekend!

    I am really excited and thrilled from your unconditional help for new guy in SQL

    Thanks

    Simon

  • Hi

    Gsquared

    i copied the previous data from sql pane and they are not aligining to each other, i think you can

    estimate its order. Please let me know if its not readable

    Thanks

    Simon

  • The common problem of dividing data into groups boils down to creating a dense rank. Here we want a dense rank on Statusid but ordered by ID (or DateValue). This problem should be easily solved using the Dense_Rank() function in S2005. Unfortunately all the ranking functions assume the target of the rank is also its order. The order cannot be independent of the target. This dependency was intended to be solved by the full implementation of the OVER statement in sql-99. This is known as the sql window. With the full OVER the target of a rank and its order can be independently specified. The Dense_Rank() function is actually a short hand for a generalized dense rank (using OVER). This rank is based on a 'running sum' over the target column (or more generally an expression involving the column) and can be ordered by another column.The running sum is

    equivalent to a dense rank.

    To get this type of dense rank you simply have to state the rule that increments the run. This is the same as deciding what rule bounds each rank (group). In this problem each rank is bounded by a value of 1 followed by a 2. The run is incremented whenever there is a Statusid change from a some value other than 1 (2,null) to 1. Here all you have to do is check the prior row value of Statusid. If the prior row (based in the order of ID) is a 2 the current row starts a new group (the running sum is incremented by 1).

    Sample data.

    CREATE TABLE #mytable

    (ID INT PRIMARY KEY,

    DateValue DATETIME, Prd VARCHAR(10), Statusid INT )

    INSERT INTO #mytable

    (ID, DateValue, Prd, Statusid)

    SELECT 1, '2007-10-17 00:01:00.000','A',1 UNION ALL

    SELECT 2, '2007-10-17 00:02:00.000','A',1 UNION ALL

    SELECT 3, '2007-10-17 00:03:00.000','A',1 UNION ALL

    SELECT 4, '2007-10-17 00:04:00.000','A',1 UNION ALL

    SELECT 5, '2007-10-17 00:05:00.000','B',1 UNION ALL

    SELECT 6, '2007-10-17 00:06:00.000','B',1 UNION ALL

    SELECT 7, '2007-10-17 00:07:00.000','B',1 UNION ALL

    SELECT 8, '2007-10-17 00:08:00.000','B',2 UNION ALL

    SELECT 9, '2007-10-17 00:09:00.000','B',1 UNION ALL

    SELECT 10,'2007-10-17 00:10:00.000','B',2 UNION ALL

    SELECT 11,'2007-10-17 00:11:00.000','A',1 UNION ALL

    SELECT 12,'2007-10-17 00:12:00.000','A',1 UNION ALL

    SELECT 13,'2007-10-17 00:13:00.000','A',1 UNION ALL

    SELECT 14,'2007-10-17 00:14:00.000','A',1 UNION ALL

    SELECT 15,'2007-10-17 00:15:00.000','A',2 UNION ALL

    An outer join would make it easy to check the prior Statusid.

    SELECT A.ID,A.DateValue,A.Prd,A.Statusid,B.ID,B.Statusid

    FROM #mytable AS A LEFT JOIN #mytable AS B

    ON A.ID=B.ID+1

    ORDER BY A.ID

    Using the full implemention of OVER (sql-99 window) and the Sum aggregate

    you could form the groups, get a dense rank, with one pass thru the join:

    SELECT A.ID,A.DateValue,A.Prd,A.Statusid,

    Sum(CASE WHEN B.Statusid=2 THEN 1 ELSE 0 END)

    OVER(ORDER BY A.ID

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    AS Setsequence

    FROM #mytable AS A LEFT JOIN #mytable AS B

    ON A.ID=B.ID+1

    Even the join could be eliminated by using a LAG function of OVER (gets the

    previous ID value)

    SELECT ID,DateValue,Prd,Statusid,

    Sum(CASE WHEN LAG(ID,1)=2 THEN 1 ELSE 0 END)

    OVER(ORDER BY ID

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    AS Setsequence

    FROM #mytable

    Well all this is pie in the sky for sql server because MS for some other worldy

    reason(s) decided the full implementation of OVER could wait. The are various ways to simulate OVER such as combinations of ranking functions/counts, updates, subqueries etc. None are as simple and straightforward as OVER. Of course for sql server I prefer to use Rac:)

    Exec Rac

    @TRANSFORM='_dummy_', -- Plays no part in computations.

    @ROWS='ID & DateValue(d) & Prd & Statusid',

    @PVTCOL='report mode',

    @FROM='#mytable',

    @DEFAULTS1='y',@RACHECK='y',@SHELL='n',

    -- @ROWRUNS is a running sum that does not involve a @TRANSFORM, hense Dumy:)

    @ROWRUNS='Dumy^CASE WHEN PRIOR.Statusid=2 THEN 1 ELSE 0 END^(Dumy)',

    @ROWRUNSLABEL='DenseRank',

    -- Add 1 to the rank to start at 1 (instead of 0).

    @SELECT='SELECT _ROWS_,(1*DenseRank)+1 AS Setsequence

    FROM rac

    ORDER BY rd'

    ID DateValue Prd Statusid Setsequence

    ------ ------------------- ---- -------- -----------

    1 Oct 17 2007 12:01AM A 1 1

    2 Oct 17 2007 12:02AM A 1 1

    3 Oct 17 2007 12:03AM A 1 1

    4 Oct 17 2007 12:04AM A 1 1

    5 Oct 17 2007 12:05AM B 1 1

    6 Oct 17 2007 12:06AM B 1 1

    7 Oct 17 2007 12:07AM B 1 1

    8 Oct 17 2007 12:08AM B 2 1

    9 Oct 17 2007 12:09AM B 1 2

    10 Oct 17 2007 12:10AM B 2 2

    11 Oct 17 2007 12:11AM A 1 3

    12 Oct 17 2007 12:12AM A 1 3

    13 Oct 17 2007 12:13AM A 1 3

    14 Oct 17 2007 12:14AM A 1 3

    15 Oct 17 2007 12:15AM A 2 3

    You could flag 1st and last rows of each group or whatever. But I think I'll stop here:)

    www.rac4sql.net

    www.beyondsql.blogspot.com

  • Hi

    Rog

    Thanks for your input on this!

    Is there any other way to get this result without using RAC? thats the exact result that i was looking

    for . I am using sql2k5 , that is also a big problem to use RAC

    Thanks

    simon

  • Ummm... I thought Rog gave a solution that didn't include RAC... take a look at his post again. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi

    Rog, Jeff

    I have problem running this statement

    SELECT A.ID,A.DateValue,A.Prd,A.Statusid,

    Sum(CASE WHEN B.Statusid=2 THEN 1 ELSE 0 END)

    OVER(ORDER BY A.ID

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)-- this line throws error

    AS Setsequence

    FROM #mytable AS A LEFT JOIN #mytable AS B

    ON A.ID=B.ID+1

    Msg 102, Level 15, State 1, Line 4

    Incorrect syntax near 'ROWS'.

    Thanks

    Simon

  • simon phoenix (8/30/2008)


    Hi

    Rog, Jeff

    I have problem running this statement

    SELECT A.ID,A.DateValue,A.Prd,A.Statusid,

    Sum(CASE WHEN B.Statusid=2 THEN 1 ELSE 0 END)

    OVER(ORDER BY A.ID

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)-- this line throws error

    AS Setsequence

    FROM #mytable AS A LEFT JOIN #mytable AS B

    ON A.ID=B.ID+1

    Msg 102, Level 15, State 1, Line 4

    Incorrect syntax near 'ROWS'.

    Thanks

    Simon

    Sorry for any confusion. The above form of OVER is NOT supported in S2k5. But RAC does run in S2k5 🙂 Anyway here is a t-sql solution that will simulate the running sum of OVER. Actually it's not as bad as I perhaps made it out to be but its no substitute for the real thing. I'm using the same sample data used in RAC.

    WITH C AS

    (

    SELECT A.ID,A.DateValue,A.Prd,A.Statusid,B.Statusid AS PriorStatusid

    FROM #mytable AS A LEFT JOIN #mytable AS B

    ON A.ID=B.ID+1

    )

    select D.ID,D.DateValue,D.Prd,D.Statusid,

    Sum(CASE WHEN E.PriorStatusid=2 THEN 1 ELSE 0 END)+1 AS Setsequence

    FROM C AS D JOIN C AS E

    ON E.ID BETWEEN 0 AND D.ID

    GROUP BY D.ID,D.DateValue,D.Prd,D.Statusid

    ORDER BY D.ID

    Note there are 2 joins, one to access the prior row Statusid and one to form the rows for grouping. It's these joins that OVER (and RAC) eliminate. For large data sets performance could be a problem here. But that's another issue 🙂

    www.rac4sql.net

    www.beyondsql.blogspot.com

  • Sorry... My bad... Roger's too. That syntax, starting at ROWS, isn't part of T-SQL in SQL Server 2005.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi

    Rog

    You are the SQl Guru, Thanks a lot , i really appreciat it, it was just bugging my head since last week. Now only job is to trap the begining and end of those distinct sequence and i think i can do it.

    If i ran into trouble , i will post it again, please look into this thread for few more days, i am really close in getting my task done with the help of great guys like you, Gsquared and jeff. i can't explain

    how greatful i am to find good people like you guys exist

    God bless you Rog, Jeff and Gsquared

    Thanks

    Simon

    ----------Happy Labor Day----------

  • Heh... ok... my turn. 🙂 And sorry it took me so long to get to this problem...

    First, in order to be fast and short, this fun problem does actually require some procedureal code, but we won't be using a Cursor or a While loop. And, we don't need extra columns nor do we need to buy a special product to get it done. AND, we don't need to self join to the same table 4 times to do it either. All we need is what appears to be the correct logical clustered index which you probably already have anyway.

    First, the test data... do read the comments please... this little trick requires a clustered index on the ID column...

    drop table #yourtable

    go

    --===== Setup the demonstration table...

    -- I'm assuming that the ID column is the CLUSTERED PRIMARY KEY

    -- and that it's named PK_YourTable_ID. Please adjust as necessary.

    -- I'm also assuming that it has a "SetSequence" column.

    CREATE TABLE #YourTable

    (

    ID INT IDENTITY(1,1),

    DateValue DATETIME,

    Prd CHAR(1),

    StatusID TINYINT,

    SetSequence INT,

    CONSTRAINT PK_YourTable_ID PRIMARY KEY CLUSTERED (ID)

    )

    INSERT INTO #YourTable

    (DateValue,Prd,StatusID)

    SELECT '2007-10-17 00:01:00.000','A','1' UNION ALL

    SELECT '2007-10-17 00:02:00.000','A','1' UNION ALL

    SELECT '2007-10-17 00:03:00.000','A','1' UNION ALL

    SELECT '2007-10-17 00:04:00.000','A','1' UNION ALL

    SELECT '2007-10-17 00:05:00.000','B','1' UNION ALL

    SELECT '2007-10-17 00:06:00.000','B','1' UNION ALL

    SELECT '2007-10-17 00:07:00.000','B','1' UNION ALL

    SELECT '2007-10-17 00:08:00.000','B','2' UNION ALL

    SELECT '2007-10-17 00:09:00.000','B','1' UNION ALL

    SELECT '2007-10-17 00:10:00.000','B','2' UNION ALL

    SELECT '2007-10-17 00:11:00.000','A','1' UNION ALL

    SELECT '2007-10-17 00:12:00.000','A','1' UNION ALL

    SELECT '2007-10-17 00:13:00.000','A','1' UNION ALL

    SELECT '2007-10-17 00:14:00.000','A','1' UNION ALL

    SELECT '2007-10-17 00:15:00.000','A','2'

    GO

    --===== Show the content of the test table

    SELECT * FROM #YourTable

    The key to this problem is very much like "trailing edge digital logic". We don't actually care where the 1's and 2's are... we only care when the StatusID drops in value. To translate into pseudo code, when the StatusID of the previous row is LESS THAN OR EQUAL TO the current row, the SetSequence will remain the same, otherwise, we need to add 1 to the SetSequence.

    Now, the clustered index makes it easy using the very same technique in the following article...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    ... here's the code that does it... at a rate of a million rows every 7 seconds... 😛

    --===== Declare some necessary variables

    DECLARE @Sequence INT, --Remembers the current sequence

    @PrevStatusID TINYINT --Remembers the StatusID from the previous row

    SELECT @Sequence = 1,

    @PrevStatusID = 1

    --===== Do a procedural update using SQL Server's proprietary update

    UPDATE #YourTable

    SET @Sequence = SetSequence = CASE WHEN @PrevStatusID <= StatusID THEN @Sequence ELSE @Sequence+1 END,

    @PrevStatusID = StatusID

    FROM #YourTable WITH(INDEX(PK_YourTable_ID))

    --===== Show the results

    SELECT * FROM #YourTable

    Questions? 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • simon phoenix (8/30/2008)


    Hi

    Rog

    You are the SQl Guru, Thanks a lot , i really appreciat it, it was just bugging my head since last week. Now only job is to trap the begining and end of those distinct sequence and i think i can do it.

    If i ran into trouble , i will post it again, please look into this thread for few more days, i am really close in getting my task done with the help of great guys like you, Gsquared and jeff. i can't explain

    how greatful i am to find good people like you guys exist

    God bless you Rog, Jeff and Gsquared

    Thanks

    Simon

    ----------Happy Labor Day----------

    Simon, "you da man!" Thanks for the awesome thought. Happy Labor Day!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • simon phoenix (8/30/2008)


    Now only job is to trap the begining and end of those distinct sequence and i think i can do it.

    Missed that... sorry. Easy to do now that the SetSequence is done...

    --===== Show first and last date of each SetSequence

    SELECT SetSequence, MIN(DateValue) AS StartDateTime, MAX(DateValue) AS EndDateTime

    FROM #YourTable

    GROUP BY SetSequence

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 48 total)

You must be logged in to reply to this topic. Login to reply