count of consecutive value

  • Not easy to explain, I hope I will make it clear.

    I have a table with a column containing a result code.

    I want to know what was the latest result code and the count of consecutive result code until result code differ

    Here some examples, assume there is an order by showing rows by their insert date (most recent first)

    id result code

    --- -----------

    7 A

    6 A

    5 A

    4 C

    3 A

    2 B

    1 A

    the query should then return A and 3 (because A is the most recent result code and there is 3 consecutive A

    id result code

    --- -----------

    7 B

    6 A

    5 B

    4 B

    3 B

    2 B

    1 A

    The query should return B and 1

    id result code

    --- -----------

    7 B

    6 B

    5 B

    4 B

    3 C

    2 C

    1 C

    The query should return B and 4

    and there a way to do this??

    thank you

    Martin

  • I'll trade ya... 😉 Tell me the business reason behind this and I'll give you some code to do it. The reason for the "trade" is, I'm writting an article on things like this and I think the business reasons are important for the article. My problem has been that everytime I give someone the code first, I never hear from them afterwards.

    Things like what the table actually holds an why, from a business rule standpoint, something like this needs to be done would be very helpful.

    Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sure

    I have a Student table in relation with an exam table. So each student has done n exams.

    Each exam has a final cote (the result code in my initial post)

    I want to measure the performance of each student, following the rules mentionned in my initial post.

    Thank you

    Martin

  • Thanks, Martin... if someone doesn't beat me to it, I'll give it a whirl after work.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dubem1 (1/30/2009)


    Sure

    I have a Student table in relation with an exam table. So each student has done n exams.

    Each exam has a final cote (the result code in my initial post)

    I want to measure the performance of each student, following the rules mentionned in my initial post.

    Thank you

    Martin

    One more question... I can understand why you need the latest ID for each, but why do you need to know how many rows were in the sequence that ends with that ID?

    Anyway, here's the code... details are in the comments...

    USE TempDb --Just a safe place to run things

    GO

    --DROP TABLE #MyHead, dbo.YourTable

    GO

    --===== Populate the test table (note the clustered PK)

    -- (Not part of the solution)

    CREATE TABLE dbo.YourTable

    (

    ID INT,

    ResultCode CHAR(1),

    CONSTRAINT PK_YourTable_ID PRIMARY KEY CLUSTERED(ID)

    )

    --===== Populate the test table

    -- (Not part of the solution)

    INSERT INTO dbo.YourTable

    (ID,ResultCode)

    SELECT '7','A' UNION ALL

    SELECT '6','A' UNION ALL

    SELECT '5','A' UNION ALL

    SELECT '4','C' UNION ALL

    SELECT '3','A' UNION ALL

    SELECT '2','B' UNION ALL

    SELECT '1','A'

    --===== Display the content of the test table

    SELECT * FROM dbo.YourTable

    ------------------------------------------------------------

    -- Solution

    ------------------------------------------------------------

    --===== Copy the test table to a temporary table where we

    -- can easily add a column...

    SELECT ISNULL(ID,0) AS ID, --ISNULL give the column a NOT NULL constraint

    ResultCode,

    CAST(0 AS INT) AS RunningCount

    INTO #MyHead

    FROM dbo.YourTable

    --===== Add a clustered index (PK in this case)

    ALTER TABLE #MyHead

    ADD PRIMARY KEY CLUSTERED (ID)

    --===== Declare some obviously named variables

    DECLARE @RunningCount INT,

    @PrevResultCode CHAR(1)

    --===== Do a little magic update to "rank" the rows in a way that

    -- ROW_NUMBER() and RANK() can't do.

    UPDATE #MyHead

    SET @RunningCount = RunningCount = CASE WHEN @PrevResultCode = ResultCode THEN @RunningCount+1 ELSE 1 END,

    @PrevResultCode = ResultCode

    FROM #MyHead WITH (INDEX(0))

    --===== Display the results after the update

    SELECT * FROM #MyHead ORDER BY ID

    --===== Display the desired results

    SELECT t.ID,

    t.ResultCode,

    t.RunningCount

    FROM #MyHead t

    INNER JOIN

    (--==== Find MAX ID for each ResultCode

    SELECT MAX(ID) AS MaxID, ResultCode

    FROM #MyHead

    GROUP BY ResultCode

    )d

    ON t.ID = d.MaxID

    AND t.ResultCode = d.ResultCode

    ORDER BY t.ResultCode

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's for statistics purpose to analyse the tendencies of students.

    Thanks for the solution, it's not simple, I will try it.

    Yesterday I browse the web to find by myself a solution and I found an article with a similar problem than mine.

    They use something like this :

    ROW_NUMBER() OVER(PARTITION BY student

    ORDER BY dt, slot) -

    I don't know if it can solve my problem. If you are curious,here is the link to the article :

    http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html

    thanks again

    Martin

  • You should try both, evaluate execution plans, and performance and choose the one that performs best. In my experience Jeff's solutions usually out-perform others.

    The only way to become a Jedi like Jeff you need to test the solutions yourself.

  • dubem1 (1/31/2009)


    It's for statistics purpose to analyse the tendencies of students.

    Thanks for the solution, it's not simple, I will try it.

    Yesterday I browse the web to find by myself a solution and I found an article with a similar problem than mine.

    They use something like this :

    ROW_NUMBER() OVER(PARTITION BY student

    ORDER BY dt, slot) -

    I don't know if it can solve my problem. If you are curious,here is the link to the article :

    http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html

    thanks again

    Martin

    Thanks, again... I guess my real question would be, why do you need a count of the last sequence for each student and not the other sequences for the same student? What is that count called and what will it be used for? You say its for statistics... what is that statistic called and what is it's purpose? Thanks for being patient with my questions, Martin.

    Shirfting gerars... Yes, Row_Number is quite handy... but I believe you'll find that there's nothing to restart the counts on the Result Code in your table which makes it just about impossible to use the simplicity of Row_Number or Rank for this particular problem.

    So far as the complexity of the solution goes, about a 3rd of it is setting up YOUR test data so you can play with the code. Folks like me prefer that you provide the code for a test table and the insert commands to populate it... I did that for you, this time. See the link in my signature for more details on how to provide data during your posts. It's got some other goodies in it, too.

    What my solution does (other than setting up your test data and displaying interim results) is very simple... I copy the data from "yourtable" to a temp table that has an extra column for the count you want and so I can ensure the code has total control over the data and indexes. I add a clustered index on the ID and it has to be clustered to control the order of the update. Then, I do a simple update on the new column that does just like you would manually... if the result code changes, the count restarts at 1... if it doesn't, it counts up by one.

    Then, I find the last group (highest ID, actually) for each result code and use that to display the highest count for that result code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jack Corbett (1/31/2009)


    You should try both, evaluate execution plans, and performance and choose the one that performs best. In my experience Jeff's solutions usually out-perform others.

    The only way to become a Jedi like Jeff you need to test the solutions yourself.

    Heh... thanks for the confidence, Jack...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I took a look at the winning solution for that contest... pretty clever.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.

    Before I studied the winners code, I did a "data smear" using the "running total quirky update" to form the groups. The advantage with the quirky update is that it'll work in both SQL Server 2k and 2k5. But, for 2k5, the winner's solution beats the quirky update.

    Here's the results...

    7936 ms - Quirky update

    7390 ms - Winner's code

    I'll take that 6.9% defeat as a substantial butt whipping... further, the winner's code also has the advantage of being able to be put into a view, if you need to do such a thing. My hat's off to the winner... I didn't believe such a thing was possible.

    Lordy, I love it when I learn something new! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • By the way... if anyone else wants to study the code, the original "puzzle" question and two sets of data are available at the following URL...

    http://www.sqlmag.com/article/articleid/93462/sql_server_blog_93462.html

    ... and like I've done in the past, they use a Numbers or Tally table to gen the larger data set.

    Some of the solutions, including the winning solution I'm so amazed at (actually, there were two that used the same technique), are located at the following URL...

    http://www.sqlmag.com/Articles/ArticleID/93606/93606.html?Ad=1

    It's good stuff, folks... give it a read...

    For those that may need an SQL Server 2000 solution that performs (heh) almost :Pas well, here's what the code I wrote looked like...

    --===== Move the data to where we can work on it and add

    -- A row number in the process.

    SELECT IDENTITY(INT,1,1) AS RowNum,

    Student, Dt, Slot, Attend,

    CAST(0 AS INT) AS Span

    INTO #MyHead

    FROM dbo.Attendance

    ORDER BY Student, Dt, Slot

    --===== Add the clustered index for the quirky update to use (PK in this case)

    ALTER TABLE #MyHead

    ADD PRIMARY KEY CLUSTERED (Student, Dt, Slot)

    --===== Declare a couple of variables for the quirky update using

    -- some self evident names.

    DECLARE @PrevStudent VARCHAR(10),

    @PrevAttend BIT,

    @PrevSpan INT

    --===== Do the quirky update to form the "groups" (spans)

    UPDATE #MyHead

    SET @PrevSpan = Span = CASE WHEN @PrevStudent = Student

    AND @PrevAttend = Attend

    THEN @PrevSpan

    ELSE ISNULL(@PrevSpan,0) + 1

    END,

    @PrevAttend = Attend,

    @PrevStudent = Student

    FROM #MyHead WITH (INDEX(0),TABLOCKX)

    OPTION (MAXDOP 1)

    --===== Do the final select using the groups (spans)

    SELECT lo.Student AS Student,

    lo.Dt AS From_Dt, lo.Slot AS From_Slot,

    hi.Dt AS To_Dt , hi.Slot AS To_Slot,

    hi.Attend,

    span.Cnt

    FROM (--==== Derived table "span" get's the start and end mark

    -- for each "span" of data

    SELECT Student,

    MIN(RowNum) AS MinRowNum,

    MAX(RowNum) AS MaxRowNum,

    COUNT(*) AS Cnt

    FROM #MyHead

    GROUP BY Student, Span

    )span

    INNER JOIN #MyHead lo ON span.MinRowNum = lo.RowNum

    INNER JOIN #MyHead hi ON span.MaxRowNum = hi.RowNum

    ORDER BY Student, From_Dt, From_Slot

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... coming full circle back to Martin's original problem...

    Here's the test data...

    USE TempDb --Just a safe place to run things

    GO

    --DROP TABLE dbo.YourTable

    GO

    --===== Populate the test table (note the clustered PK)

    -- (Not part of the solution)

    CREATE TABLE dbo.YourTable

    (

    ID INT,

    ResultCode CHAR(1),

    CONSTRAINT PK_YourTable_ID PRIMARY KEY CLUSTERED(ID)

    )

    --===== Populate the test table

    -- (Not part of the solution)

    INSERT INTO dbo.YourTable

    (ID,ResultCode)

    SELECT '7','A' UNION ALL

    SELECT '6','A' UNION ALL

    SELECT '5','A' UNION ALL

    SELECT '4','C' UNION ALL

    SELECT '3','A' UNION ALL

    SELECT '2','B' UNION ALL

    SELECT '1','A'

    --===== Display the content of the test table

    SELECT * FROM dbo.YourTable

    ... and here's that new fancy-dancy span resolver to solve the original problem. Had to add an extra CTE to get just the last span and I also added the answer to what should be the next question... what are the Min and Max ID's for the last span of each ResultCode...

    ;WITH CteSpan AS

    (--==== Creates groups ordered by ID breaking at ResultCode

    SELECT Resultcode, ID,

    CAST(ResultCode AS BINARY(8)) + CAST(ID AS BINARY(4)) AS ResultCode_ID,

    ROW_NUMBER() OVER (ORDER BY ID) --Just in case there's missing ID's

    - ROW_NUMBER() OVER(PARTITION BY ResultCode ORDER BY ResultCode,ID) AS SpanNum

    FROM dbo.YourTable --order by id

    )

    , cteCnt AS

    (--==== Gets the ID's, the count, the max combined Resultcode/ID for each span

    SELECT Resultcode,

    MIN(ID) AS MinID,

    MAX(ID) AS MaxID,

    MAX(ResultCode_ID) AS MaxResultCode_ID,

    COUNT(*) AS Cnt

    FROM CteSpan

    GROUP BY Resultcode, SpanNum

    )

    , cteMax AS

    (--==== Extra CTE just to get the last span for each Resultcode

    SELECT Resultcode,

    MAX(ResultCode_ID) AS MaxResultCode_ID

    FROM CteSpan

    GROUP BY Resultcode

    )

    SELECT cnt.MinID, cnt.MaxID, cnt.Resultcode, cnt.Cnt

    FROM cteCnt cnt

    INNER JOIN cteMax mx ON cnt.MaxResultCode_ID = mx.MaxResultCode_ID

    ORDER BY cnt.ResultCode

    Heh... see? I told ya I loved it when I learned something new. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/31/2009)


    Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.

    Heh, I agree 100% Jeff. I am stupefied at the cleverness of this mathematical trick, (and PO'd at myself for not having figured it out for myself before now). I am still trying to figure out in my head exactly why it never repeats a group number within the larger partitions.

    Cool stuff. I sure hope that I remember to use this trick in the future...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (1/31/2009)


    ...I am still trying to figure out in my head exactly why it never repeats a group number within the larger partitions...

    Oh, I see. It does repeat with the larger partition, but not for each unique value of Attend within each student. The group numbers for each Student+Attend sequence never repeat because they always increase. Nice...

    Dang this is a nice trick! I so wish I knew about it years ago. 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 15 posts - 1 through 15 (of 44 total)

You must be logged in to reply to this topic. Login to reply