Rank on repeating groups of bit field

  • Lets say I have some data where a bit value changes over time and I want to rank it by the repeating groups, how do I write the SQL so that I get the result below?

    I want to sort by create date, and every time the bit changes value we start the rank over.

    If I partition by the bit field, it will just group the entire data set by the bit field.

    Bit CreateDate Rank

    1 3/15/2014 1

    1 3/14/2014 2

    0 3/9/2014 1

    0 3/8/2014 2

    0 3/6/2014 3

    1 3/4/2014 1

    0 2/28/2014 1

    0 2/20/2014 2

    0 2/15/2014 3

    0 2/10/2014 4

  • Jim Shaffer (1/16/2015)


    Lets say I have some data where a bit value changes over time and I want to rank it by the repeating groups, how do I write the SQL so that I get the result below?

    I want to sort by create date, and every time the bit changes value we start the rank over.

    If I partition by the bit field, it will just group the entire data set by the bit field.

    Bit CreateDate Rank

    1 3/15/2014 1

    1 3/14/2014 2

    0 3/9/2014 1

    0 3/8/2014 2

    0 3/6/2014 3

    1 3/4/2014 1

    0 2/28/2014 1

    0 2/20/2014 2

    0 2/15/2014 3

    0 2/10/2014 4

    Quick suggestion, use a running total of the Bit column in the Date order to mark the groups, here is an example.

    ๐Ÿ˜Ž

    USE tempdb;

    GO

    SET NOCOUNT ON;

    DECLARE @SAMPLE_DATA TABLE

    (

    XBit INT NOT NULL

    ,CreateDate DATE NOT NULL

    );

    INSERT INTO @SAMPLE_DATA (XBit,CreateDate)

    VALUES

    (1, '3/15/2014')

    ,(1, '3/14/2014')

    ,(0, '3/9/2014')

    ,(0, '3/8/2014')

    ,(0, '3/6/2014')

    ,(1, '3/4/2014')

    ,(0, '2/28/2014')

    ,(0, '2/20/2014')

    ,(0, '2/15/2014')

    ,(0, '2/10/2014')

    ;

    ;WITH BASE_DATA AS

    (

    SELECT

    SD.CreateDate

    ,SD.XBit

    ,SUM(SD.XBit) OVER

    (

    PARTITION BY (SELECT NULL)

    ORDER BY SD.CreateDate

    ROWS BETWEEN UNBOUNDED PRECEDING

    AND CURRENT ROW

    ) AS SMBIT

    FROM @SAMPLE_DATA SD

    )

    SELECT

    BD.XBit

    ,BD.CreateDate

    ,BD.SMBIT

    ,RANK() OVER

    (

    PARTITION BY BD.SMBIT

    ORDER BY BD.CreateDate

    ) AS BD_RNK

    FROM BASE_DATA BD;

    Results

    XBit CreateDate SMBIT BD_RNK

    ----- ----------- ------ -------

    0 2014-02-10 0 1

    0 2014-02-15 0 2

    0 2014-02-20 0 3

    0 2014-02-28 0 4

    1 2014-03-04 1 1

    0 2014-03-06 1 2

    0 2014-03-08 1 3

    0 2014-03-09 1 4

    1 2014-03-14 2 1

    1 2014-03-15 3 1

  • Another solution for those inclined to the cryptic:

    SELECT XBit, CreateDate, b

    ,ROW_NUMBER() OVER

    (

    PARTITION BY b

    ORDER BY CreateDate

    )

    FROM

    (

    SELECT XBit, CreateDate

    ,b=MAX(b) OVER

    (

    ORDER BY CreateDate

    ROWS UNBOUNDED PRECEDING

    )

    FROM

    (

    SELECT XBit, CreateDate

    ,b=CAST(NULLIF(XBit, 0) AS BINARY(4)) + CAST(CreateDate AS BINARY(4))

    FROM @SAMPLE_DATA

    ) a

    ) a;

    For an explanation of how this works, I'll lead you to Mr. Itzik Ben-Gan's article, which provides a much better description than I ever could:

    http://sqlmag.com/t-sql/last-non-null-puzzle


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This might not be the most efficient way of solving the puzzle, but it does generate the correct results - which I always put before performance or code sophistication:

    DECLARE @SAMPLE_DATA TABLE

    (

    XBit INT NOT NULL,

    CreateDate DATE NOT NULL,

    DesiredRank INT NOT NULL

    );

    INSERT INTO @SAMPLE_DATA (XBit,CreateDate, DesiredRank)

    VALUES

    (1, '3/15/2014', 1)

    ,(1, '3/14/2014', 2)

    ,(0, '3/9/2014', 1)

    ,(0, '3/8/2014', 2)

    ,(0, '3/6/2014', 3)

    ,(1, '3/4/2014', 1)

    ,(0, '2/28/2014', 1)

    ,(0, '2/20/2014', 2)

    ,(0, '2/15/2014', 3)

    ,(0, '2/10/2014', 4)

    ; WITH CTE AS (

    SELECT *,

    grp = ROW_NUMBER() OVER(ORDER BY CreateDate) - ROW_NUMBER() OVER(PARTITION BY XBit ORDER BY CreateDate)

    FROM @SAMPLE_DATA

    )

    SELECT *,

    NewRank = ROW_NUMBER() OVER(PARTITION BY grp ORDER BY CreateDate DESC)

    FROM CTE

    ORDER BY CreateDate DESC

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply