How to get start and end value from a sequence of numbers - Grouped by a different field.

  • Hi All,

    I have a requirement to obtain the start value and end value of a sequence based on a particular column.

    Ex:

    Serial Number - Reseller Name

    ===============================

    1001 RA01

    1002 RA01

    1003 RA01

    1004 RA01

    1005 RA02 --> This value changes in middle of sequence.

    1006 RA02

    1007 RA02

    1008 RA02

    1009 RA01 --> Again this value changes

    1010 RA01

    1011 RA01

    The requirement is to obtain the output for the reseller RA01 as:

    Start Val - End Val - Reseller Name

    ===========================

    1001 1004 RA01

    1009 1011 RA01

  • Here is a version that works but I am not sure a cursor solution might be faster.

    It also misses the special case of one row for a Name entry (no max value for the range)

    Assume table test1 has columns id, name (similar to your example data)

    select d.minid, min(d1.maxid) as maxid, d.name

    from (select c.id as minid, c.name from test1 c

    where c.id not in (SELECT a.id

    from test1 a inner join test1 b

    on a.name = b.name

    where a.id > b.id and (a.id - b.id) = 1)) d

    inner join

    (select c1.id as maxid, c1.name from test1 c1

    where c1.id not in (SELECT a1.id

    from test1 a1 inner join test1 b1

    on a1.name = b1.name

    where a1.id < b1.id and (a1.id - b1.id) = -1)) d1

    on d1.name = d.name where d.minid < d1.maxid

    group by d.minid, d.name

    mike

  • Hi Mike,

    Seems the query is taking a while to execute. I tried it for 20minutes.. not have seen any output.

    Does this have any performance dependancy on the size of the table ?

    My table is substantially big for the query to execute.

    Anyways thanks for the query.

    do u have a better solution ?

  • yes, because of the joins and the use of "not in" it should get much worse with a large data table.

    you might try it looking at one name (add to where clause) rather than the version I gave which will generate a table for all names.

    Again, I think a cursor solution might be faster on a large table as it would just take one pass through the table and the proc could be set up to catch single line entries as well.

  • gvsriramakrishna (1/7/2009)


    Hi Mike,

    Seems the query is taking a while to execute. I tried it for 20minutes.. not have seen any output.

    Does this have any performance dependancy on the size of the table ?

    My table is substantially big for the query to execute.

    Anyways thanks for the query.

    do u have a better solution ?

    How many rows in the real table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 480002 rows

  • gvsriramakrishna (1/8/2009)


    480002 rows

    I'll be right back...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Mike,

    Your solution doesn't seem to be working.

    It ran for 1 1/2 hrs.

    Seems too much as a solution.

    Thanks.

  • You'll like this... takes just seconds to run on a half million rows... of course, you'll need to change #YourTable to your actual table name. For future posts, please see the following article... you'll get answers much more quickly...

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    --===== Create and populate a test table.

    -- This is NOT part of the solution.

    CREATE TABLE #YourTable

    (

    [Serial Number] INT,

    [Reseller Name] VARCHAR(100)

    )

    INSERT INTO #YourTable

    ([Serial Number],[Reseller Name])

    SELECT '1001','RA01' UNION ALL

    SELECT '1002','RA01' UNION ALL

    SELECT '1003','RA01' UNION ALL

    SELECT '1004','RA01' UNION ALL

    SELECT '1005','RA02' UNION ALL

    SELECT '1006','RA02' UNION ALL

    SELECT '1007','RA02' UNION ALL

    SELECT '1008','RA02' UNION ALL

    SELECT '1009','RA01' UNION ALL

    SELECT '1010','RA01' UNION ALL

    SELECT '1011','RA01'

    --================================================================

    -- Solution starts here

    --================================================================

    --===== Copy the data from your table to a place we can work on it

    SELECT ISNULL([Serial Number],0) AS [Serial Number],

    [Reseller Name],

    CAST(0 AS INT) AS Section

    INTO #WorkArea

    FROM #YourTable

    --===== This clustered index is absolutely necessary to maintain the

    -- order of the "pseudo cursor" or "quirky" update

    ALTER TABLE #WorkArea

    ADD PRIMARY KEY CLUSTERED ([Serial Number]) WITH FILLFACTOR = 100

    --===== Declare a couple of obviously named variables...

    DECLARE @PrevSN INT,

    @PrevRN VARCHAR(100),

    @PrevSection INT

    --===== ... and preset them to known values.

    SELECT @PrevSN = 0,

    @PrevRN = '',

    @PrevSection = 0

    --===== Do the "pseudo cursor" update using the index scan to force

    -- the correct order of the update (NOT GUARANTEED FOR SELECTS).

    -- This should only take about 3 seconds to run on 480K rows.

    UPDATE #WorkArea

    SET @PrevSection = Section = CASE WHEN @PrevRN = [Reseller Name]

    THEN @PrevSection

    ELSE @PrevSection + 1

    END,

    @PrevRN = [Reseller Name],

    @PrevSN = [Serial Number] --"anchor"

    FROM #WorkArea WITH(INDEX(0),TABLOCKX)

    --===== Display the desired results.

    SELECT MIN([Serial Number]) AS [Start Val],

    MAX([Serial Number]) AS [End Val],

    [Reseller Name]

    FROM #WorkArea

    GROUP BY Section,[Reseller Name]

    ORDER BY [Reseller Name], Section

    I strongly recommend that you carefully read the following article so you know how that code works...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    This type of "data smear" is very similar to a running total. 😉

    Be very careful when using this method... leave ANYTHING out and the data will be bad. Do it right and it's up to hundreds of times faster than a cursor and thousands of times faster than some improperly rendered set based code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • So, gvsriramakrishna... any feedback on this?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/10/2009)


    So, gvsriramakrishna... any feedback on this?

    I think you deserve some:)

    I guess Update is ok:) It's conceptually similar to using the rowindicator parameter in Rac which computes a rank over ResellerName in the order of SerialNumber. This is a dense rank problem but unfortuneatly it's never referred to by that description. How the problem is described (differently by all who post it) is itself part of the problem:)

    Exec Rac

    @transform='_dummy_',

    @rows='ResellerName & SerialNumber',

    @rowsort='SerialNumber',

    @pvtcol='Report Mode',

    @from='Sellers',

    @rowindicators='ResellerName{Grp}',

    @rowbreak='n',@defaults1='y',@racheck='y',@shell='n',

    @select='select Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End],ResellerName

    from rac

    where ResellerName=~RA01~

    group by ResellerName,Grp

    order by Grp'

    Start Val End ResellerName

    --------- ---- ------------

    1001 1004 RA01

    1009 1011 RA01

    The similarity with what you've done is even clearer if you use a running sum (@rowruns) with the built in prior row value(s).

    Exec Rac

    @transform='_dummy_',

    @rows='SerialNumber & ResellerName ',

    @rowsort='SerialNumber',

    @pvtcol='Report Mode',

    @from='Sellers',

    @rowbreak='n',@defaults1='y',@racheck='y',@shell='n',

    @rowruns='^case when ResellerName=prior.ResellerName then 0 else 1 end^(dumy)',

    @rowrunslabel='Grp',

    @select='select Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End Val],ResellerName

    from rac

    where ResellerName=~RA01~

    group by ResellerName,Grp

    order by Grp'

    Or use a running sumwith a join to compare prior and current ResellerName values.

    Exec Rac

    @transform='_dummy_',

    @rows='A.SerialNumber as ASerial & A.ResellerName as AReseller &

    B.SerialNumber as BSerial & B.ResellerName as BReseller ',

    @rowsort='A.SerialNumber',

    @pvtcol='Report Mode',

    @from='Sellers as A left join Sellers as B on A.SerialNumber-1=B.SerialNumber',

    @where='1=1',

    @rowbreak='n',@defaults1='y',@racheck='y',@shell='n',

    @rowruns='^case when AReseller=BReseller then 0 else 1 end^(dumy)',

    @rowrunslabel='Grp',

    @select='select Min(ASerial) as [Start Val],Max(ASerial) as [End Val],AReseller as ResellerName

    from rac

    where AReseller=~RA01~

    group by AReseller,Grp

    order by Grp'

    Finally here's the sql-99 standard OLAP solution using an window function. This is possible to do in DB2, Oracle and Teradata databases. MS obviously thinks it's not significant enough to add to sql server. They're had over 5 years to mull it OVER:)

    SELECT Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End Val],ResellerName

    FROM

    (SELECT A.SerialNumber,A.ResellerName,

    -- This is a running sum, it amounts to a dense rank over A.ResellerName in the

    -- direction of A.SerialNumber. It's an sql window defined in the standard.

    Sum(CASE WHEN A.ResellerName=B.ResellerName THEN 0 ELSE 1 END) OVER

    (ORDER BY A.SerialNumber

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp

    FROM Sellers AS A LEFT JOIN Sellers AS B ON A.SerialNumber-1=B.SerialNumber

    WHERE ResellerName='RA01') AS C

    GROUP BY ResellerName,Grp

    ORDER BY Grp

    Is this a better solution than writing code for the Update statement? The whole point of OLAP in the standards was to make problems like this 'easier' to solve than existing sql permits/permitted. I guess Rac is still the easiest solution in sql server:) When problems get more involved and/ or more columns are involved, writing code for Update becomes messy:) I suppose some sql mavens embrace it but there's a lot of developers out there that will scratch their head, or lower, over it:)

    I suspect that using Update makes some users feel uncomfortable as it brings into question some fundamental ideas about what a relational database can and cannot do. I would suggest that instead of knocking the technique they would simply give up the ghost that sql server is, in fact, a relational database. It's no such thing. So dig around for things like this with a clear conscience:)

    www.rac4sql.net

    www.beyondsql.blogspot.com

  • Oops.. there seems to be an error in th SQL query at the over keyword.

    Incorrect syntax near the keyword 'OVER'.

  • gvsriramakrishna (1/11/2009)


    Oops.. there seems to be an error in th SQL query at the over keyword.

    Incorrect syntax near the keyword 'OVER'.

    You should read the words, not just sql code:) I listed the systems that it will execute on. MS has not implemented enough of OLAP for the query to work.

  • Here is another way of getting the beginning and end of a sequence. The downside of this way is that it assumes that there are no gaps in the numbers inside the sequence. If there is one, it will consider it as a different sequence. It also runs fast on a table with half a million rows.

    ;with MyCTE as (

    select [Serial Number] as SR, [Reseller Name] as RN, row_number() over (order by [Reseller Name], [Serial number]) as RowNumber

    from #YourTable

    )

    select min(SR), max(SR), RN

    from MyCTE

    where RN = 'RA01'

    group by RN, SR-RowNumber

    order by min(sr)

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • gvsriramakrishna (1/11/2009)


    Oops.. there seems to be an error in th SQL query at the over keyword.

    Incorrect syntax near the keyword 'OVER'.

    No... I wanted to know how the UPDATE solution works. The OVER solution that Rog Pike posted won't work in SQL Server and he said so.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply