how to group and rank (with reset)

  • How do I calculate the ranking and grouping column as shown below based on the following:

    acd drv led p ranking and grouping

    -----+-----+-----+---+----------------------

    208 912 199 1 0

    208 912 200 0 1

    208 912 201 0 2

    208 912 202 1 0

    208 912 203 0 1

    208 312 199 1 0

    208 312 200 0 1

    208 312 201 0 2

    208 312 202 1 0

    208 312 203 0 1

    208 912 204 0 2

    208 312 204 0 2

    208 312 205 0 3

    208 312 206 0 4

    208 912 205 0 3

    (acd, drv, led) are the primary keys,

    led is simple ascending integers (1 to 400), and

    p is 1 or 0

    for every acd-drv-led combination, I want ranking and grouping based on p

    value. When P=1, then rank is 0. Subsequent P=zeros until the next P=1

    get ranked incrementally higher for that acd-drv combination. The ranking

    gets reset at P=0 and does not carry over to the next group even for

    the same acd-drv combination.

    This is unlike row_number() over (partition by drv, acd order by p, led), which

    does not reset the ranks at each P=1. It continues increasing the rank values

    through the entire table.

    Are there any other tricks. I'm weary of any cursor solution as I have

    to sift through 13 million records. I also don't have the luxury of

    using any CLRs.

    Thanks,

    S.J.

  • Does your table an ID column to know the order of the rows?? Without that it is highly impossible to code this requirement. With ID column, this can achieved without cursor or CLR..

  • 2 ways to do this,if there is an unique ID column.

    1. Recursive CTE - not a performant one; will be similar to the performance numbers of cursors or while loops.

    2. Quirky Update[/url] - lightning fast; but has a set of rules u MUST adhere to; else ur table will become corrupt.

  • For starters 🙂

    here is the sample data..

    IF OBJECT_ID ('#TempDB..#TempOne') IS NOT NULL

    DROP TABLE #TempOne

    GO

    CREATE TABLE #TempOne

    (

    ID INT PRIMARY KEY CLUSTERED

    , acd INT

    , drv INT

    , led INT

    , p INT

    , RankingAndGrouping INT

    );

    INSERT #TempOne (ID ,acd ,drv ,led ,p,RankingAndGrouping)

    SELECT 1, 208, 912 ,199 ,1 ,0

    UNION ALL SELECT 2, 208, 912 ,200 ,0 ,1

    UNION ALL SELECT 3, 208, 912 ,201 ,0 ,2

    UNION ALL SELECT 4, 208, 912 ,202 ,1 ,0

    UNION ALL SELECT 5, 208, 912 ,203 ,0 ,1

    UNION ALL SELECT 6, 208, 312 ,199 ,1 ,0

    UNION ALL SELECT 7, 208, 312 ,200 ,0 ,1

    UNION ALL SELECT 8, 208, 312 ,201 ,0 ,2

    UNION ALL SELECT 9, 208, 312 ,202 ,1 ,0

    UNION ALL SELECT 10, 208, 312 ,203 ,0 ,1

    UNION ALL SELECT 11, 208, 912 ,204 ,0 ,2

    UNION ALL SELECT 12, 208, 312 ,204 ,0 ,2

    UNION ALL SELECT 13, 208, 312 ,205 ,0 ,3

    UNION ALL SELECT 14, 208, 312 ,206 ,0 ,4

    UNION ALL SELECT 15, 208, 912 ,205 ,0 ,3 ;

    CREATE NONCLUSTERED INDEX NIX_acd_drv_led

    ON #TempOne (acd ,drv ,led) INCLUDE (p);

    here is the partially requirement satisfying code :hehe:

    -- Recursive CTE

    ; WITH RankedCTE AS

    (

    SELECT ID ,acd ,drv ,led ,p , CASE WHEN p = 1 THEN 0 ELSE 1 END AS RankingAndGrouping

    FROM #TempOne

    WHERE ID = 1

    UNION ALL

    SELECT BaseTable.ID , BaseTable.acd , BaseTable.drv , BaseTable.led , BaseTable.p

    , CASE WHEN BaseTable.p = 1 THEN 0 ELSE CTE.RankingAndGrouping + 1

    END AS RankingAndGrouping

    FROM RankedCTE CTE

    INNER JOIN #TempOne BaseTable

    ON CTE.ID + 1 = BaseTable.ID

    )

    SELECT OuterTable.ID , OuterTable.acd , OuterTable.drv , OuterTable.led , OuterTable.p

    , CrsApp.RankingAndGrouping

    FROM RankedCTE OuterTable

    CROSS APPLY

    (

    SELECT TOP 1 RankingAndGrouping

    FROM RankedCTE InnerTable

    WHERE InnerTable.acd = OuterTable.acd AND InnerTable.led = OuterTable.led

    ORDER BY InnerTable.ID ASC

    ) CrsApp

  • i guess a cute written cursor will smoke recursive cte here. But it will be interesting to see how quirky update works. should be stunning fast here...

  • @sjsubscribe,

    Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ColdCoffee (11/20/2011)


    Does your table an ID column to know the order of the rows?? Without that it is highly impossible to code this requirement. With ID column, this can achieved without cursor or CLR..

    The order of the rows is the "led" column. There's another one that has a timestamp but that shows only chronological order.

  • ColdCoffee (11/20/2011)


    For starters 🙂

    <snip>... </snip>

    thanks ColdCoffee, I think the solution is in Cross Apply. I forgot about that. I may have to tweak it a bit to get the order right for led column. Otherwise your solution is a great hint.

  • I guess my simple request for a bit more information is being ignored. 😉

    I'm not asking what I asked to make anyone jump through any type of hoop. I need the information to determine whether I should do an "insitu" Quirky Update or not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/20/2011)


    @sjsubscribe,

    Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.

    Jeff Moden (11/20/2011)


    @sjsubscribe,

    Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.

    Thanks Jeff for the offer. Sorry for the delay in responding as I had to wait until I got the original table with data with the DDL from the production server.

    Instead of putting the whole thing here, I've uploaded a zip file as an attachment to this msg. It has the actual data minus some redundant fields that I removed. I am able to add other indexes if needed but cannot change the primary keys.

    A note about the fields (the data's from remote sensors):

    acd is the location code (INT)

    R is record number for that location (auto incremented from 1 to about 15K, which translates to about 15K records per location)

    drv is the deviceType (about 40 distinct types at each location, also as INT)

    P=1 is when the power reset, P=0 no power interruption

    Led is the loop cycle number. Starts at 1 and may go unto 600 per device. (also INT)

    LoadTime is the timing value I'm most interested in for stats.

    RankingAndGrouping shows values I entered manually for this exercise to show the grouping and ranking.

    Grouping should be by:

    each drv at each acd for each power cycle

    And ranking by:

    Led, where it's zero when P=1 and then increases by 1 for each subsequent Led

    The grouping and rankings are for stats on LoadTime by various RankingAndGrouping values. Best, worst, best avg, worst avg, etc, by the various groups/devices/locations.

    When this query is run on the sample dataset (attached), the correct RankingAndGrouping values are as shown:

    select * from robot.tP

    where RankingAndGrouping is not null

    order by acd, drv, R, Led

    the above query should show this result below:

    acdRdrvPLedLoadTimeRankingAndGrouping

    19918390100:00:53.15400001

    19922590200:00:47.99600002

    19926790300:00:47.87700003

    19930991400:00:47.99900000

    19935090500:00:48.09600001

    19939490600:00:48.58900002

    19943790700:00:48.23700003

    19947990800:00:48.61200004

    19952390900:00:48.37100005

    199565901000:00:48.06600006

    199180120100:00:53.45300001

    199224120200:00:48.03800002

    199268120300:00:48.05900003

    199310120400:00:48.01500004

    199358120500:00:48.49300005

    199405120600:00:48.80000006

    199446120700:00:48.44000007

    199492121800:00:48.49000000

    199535120900:00:48.47400001

    1995781201000:00:48.55500002

    20040890100:00:42.79600001

    20044990200:00:41.05900002

    20049290300:00:41.14500003

    20053491400:00:41.18400000

    20057690500:00:41.12400001

    20061890600:00:41.42700002

    20066090700:00:41.80900003

    20070290800:00:41.45100004

    20074590900:00:41.35200005

    200787901000:00:41.59700006

    200403120100:00:43.26700001

    200445120200:00:40.94100002

    200487121300:00:40.95800000

    200531120400:00:41.09100001

    200574120500:00:41.63400002

    200617120600:00:41.75700003

    200661120700:00:42.04700004

    200704120800:00:41.82500005

    200746120900:00:41.92600006

    2007881201000:00:41.91800007

  • a CTE of the posted data:

    WITH MySampleData(acd,R,drv,P,Led,LoadTime,RankingAndGrouping)

    AS

    (

    SELECT 199,183,9,0,1,00:00:53.1540000,1 UNION ALL

    SELECT 199,225,9,0,2,00:00:47.9960000,2 UNION ALL

    SELECT 199,267,9,0,3,00:00:47.8770000,3 UNION ALL

    SELECT 199,309,9,1,4,00:00:47.9990000,0 UNION ALL

    SELECT 199,350,9,0,5,00:00:48.0960000,1 UNION ALL

    SELECT 199,394,9,0,6,00:00:48.5890000,2 UNION ALL

    SELECT 199,437,9,0,7,00:00:48.2370000,3 UNION ALL

    SELECT 199,479,9,0,8,00:00:48.6120000,4 UNION ALL

    SELECT 199,523,9,0,9,00:00:48.3710000,5 UNION ALL

    SELECT 199,565,9,0,10,00:00:48.0660000,6 UNION ALL

    SELECT 199,180,12,0,1,00:00:53.4530000,1 UNION ALL

    SELECT 199,224,12,0,2,00:00:48.0380000,2 UNION ALL

    SELECT 199,268,12,0,3,00:00:48.0590000,3 UNION ALL

    SELECT 199,310,12,0,4,00:00:48.0150000,4 UNION ALL

    SELECT 199,358,12,0,5,00:00:48.4930000,5 UNION ALL

    SELECT 199,405,12,0,6,00:00:48.8000000,6 UNION ALL

    SELECT 199,446,12,0,7,00:00:48.4400000,7 UNION ALL

    SELECT 199,492,12,1,8,00:00:48.4900000,0 UNION ALL

    SELECT 199,535,12,0,9,00:00:48.4740000,1 UNION ALL

    SELECT 199,578,12,0,10,00:00:48.5550000,2 UNION ALL

    SELECT 200,408,9,0,1,00:00:42.7960000,1 UNION ALL

    SELECT 200,449,9,0,2,00:00:41.0590000,2 UNION ALL

    SELECT 200,492,9,0,3,00:00:41.1450000,3 UNION ALL

    SELECT 200,534,9,1,4,00:00:41.1840000,0 UNION ALL

    SELECT 200,576,9,0,5,00:00:41.1240000,1 UNION ALL

    SELECT 200,618,9,0,6,00:00:41.4270000,2 UNION ALL

    SELECT 200,660,9,0,7,00:00:41.8090000,3 UNION ALL

    SELECT 200,702,9,0,8,00:00:41.4510000,4 UNION ALL

    SELECT 200,745,9,0,9,00:00:41.3520000,5 UNION ALL

    SELECT 200,787,9,0,10,00:00:41.5970000,6 UNION ALL

    SELECT 200,403,12,0,1,00:00:43.2670000,1 UNION ALL

    SELECT 200,445,12,0,2,00:00:40.9410000,2 UNION ALL

    SELECT 200,487,12,1,3,00:00:40.9580000,0 UNION ALL

    SELECT 200,531,12,0,4,00:00:41.0910000,1 UNION ALL

    SELECT 200,574,12,0,5,00:00:41.6340000,2 UNION ALL

    SELECT 200,617,12,0,6,00:00:41.7570000,3 UNION ALL

    SELECT 200,661,12,0,7,00:00:42.0470000,4 UNION ALL

    SELECT 200,704,12,0,8,00:00:41.8250000,5 UNION ALL

    SELECT 200,746,12,0,9,00:00:41.9260000,6 UNION ALL

    SELECT 200,788,12,0,10,00:00:41.9180000,7

    )

    Select * from MySampleData

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • sjsubscribe (11/22/2011)


    Jeff Moden (11/20/2011)


    @sjsubscribe,

    Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.

    Jeff Moden (11/20/2011)


    @sjsubscribe,

    Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.

    Thanks Jeff for the offer. Sorry for the delay in responding as I had to wait until I got the original table with data with the DDL from the production server.

    Instead of putting the whole thing here, I've uploaded a zip file as an attachment to this msg. It has the actual data minus some redundant fields that I removed. I am able to add other indexes if needed but cannot change the primary keys.

    A note about the fields (the data's from remote sensors):

    acd is the location code (INT)

    R is record number for that location (auto incremented from 1 to about 15K, which translates to about 15K records per location)

    drv is the deviceType (about 40 distinct types at each location, also as INT)

    P=1 is when the power reset, P=0 no power interruption

    Led is the loop cycle number. Starts at 1 and may go unto 600 per device. (also INT)

    LoadTime is the timing value I'm most interested in for stats.

    RankingAndGrouping shows values I entered manually for this exercise to show the grouping and ranking.

    Grouping should be by:

    each drv at each acd for each power cycle

    And ranking by:

    Led, where it's zero when P=1 and then increases by 1 for each subsequent Led

    The grouping and rankings are for stats on LoadTime by various RankingAndGrouping values. Best, worst, best avg, worst avg, etc, by the various groups/devices/locations.

    When this query is run on the sample dataset (attached), the correct RankingAndGrouping values are as shown:

    select * from robot.tP

    where RankingAndGrouping is not null

    order by acd, drv, R, Led

    the above query should show this result below:

    acdRdrvPLedLoadTimeRankingAndGrouping

    19918390100:00:53.15400001

    19922590200:00:47.99600002

    19926790300:00:47.87700003

    19930991400:00:47.99900000

    19935090500:00:48.09600001

    19939490600:00:48.58900002

    19943790700:00:48.23700003

    19947990800:00:48.61200004

    19952390900:00:48.37100005

    199565901000:00:48.06600006

    199180120100:00:53.45300001

    199224120200:00:48.03800002

    199268120300:00:48.05900003

    199310120400:00:48.01500004

    199358120500:00:48.49300005

    199405120600:00:48.80000006

    199446120700:00:48.44000007

    199492121800:00:48.49000000

    199535120900:00:48.47400001

    1995781201000:00:48.55500002

    20040890100:00:42.79600001

    20044990200:00:41.05900002

    20049290300:00:41.14500003

    20053491400:00:41.18400000

    20057690500:00:41.12400001

    20061890600:00:41.42700002

    20066090700:00:41.80900003

    20070290800:00:41.45100004

    20074590900:00:41.35200005

    200787901000:00:41.59700006

    200403120100:00:43.26700001

    200445120200:00:40.94100002

    200487121300:00:40.95800000

    200531120400:00:41.09100001

    200574120500:00:41.63400002

    200617120600:00:41.75700003

    200661120700:00:42.04700004

    200704120800:00:41.82500005

    200746120900:00:41.92600006

    2007881201000:00:41.91800007

    Appologies for the delay... I was trying to salvage your zip file. The table in your zip file doesn't have the same primary key as what you described in your original post. The data in the file violates at least one not null constraint and the data for the LED column isn't incremental in the file.

    If you could provide the correct data, I could give this a whirl. I'd use Lowell's data because it looks correct but I'd rather use the data you actually have.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply