Select And Count Consecutive Numbers

  • 1) How can I select only the first record in a series of certain consecutive numbers?

    2) How can I obtain the count for the number of times a set of those certain consecutive numbers appear?

    Example: (this assumes an un-altered Northwind database)

    USE Northwind

    GO

    WITH Peter AS

    (SELECT EmployeeID, Freight,

    ROW_NUMBER() OVER(PARTITION BY EmployeeID ORDER BY Freight) AS [RecordNumber]

    FROM Orders)

    SELECT * FROM Peter

    Partial Result Set

    EmployeeID   Freight   RecordNumber

    --------------   --------   ------------------

    1                   0.21       1

    1                   0.45       2

    1                   0.93       3

    1                   1.27       4

    1                   1.35       5

    1                   1.36       6

    1                   1.51       7

    1                   1.66       8

    1                   2.50       9

    1                   3.94       10

    1                   4.03       11

    1                   4.27       12

    1                   4.41       13

    1                   4.98       14

    1                   4.99       15

    1                   4.99       16

    1                   7.46       17

    From the list, records 5 and 6 show consecutive Freight figures of 1.35 and 1.36 for EmployeeID 1.

    From the list, records 14 and 15 show consecutive Freight figures of 4.98 and 4.99 for EmployeeID 1.

    In this example how could I get a result set that only showed records, 1.35 and 4.98 - that is the first record in a series of consecutive Freight figures?

    How could I get a count of the number of consecutive records for each set?

    A partial result set should look like this.

    EmployeeID   1stFreightRecord   Count

    1                   1.35                     2

    1                   4.98                     2

  • David,

    Is it always going to be incrementals steps of 0.01?

    The DISTICT was necessary because there where 2 values of 4.99, namely 16 and 17.

    I hope this is what you needed?

    WITH Peter AS

    (SELECT EmployeeID, Freight,

    ROW_NUMBER() OVER(PARTITION BY EmployeeID ORDER BY Freight) AS [RecordNumber]

    FROM Orders)

    SELECT DISTINCT p2.RecordNumber FROM Peter p1

    INNER JOIN Peter P2

    ON p1.Freight = (p2.Freight + 0.01)

    Jan

  • Jan,

    Thank you for your efforts. In this example, yes, the next value would be in an increment of .01 because that would be the next consecutive number. The RecordNumber field is not necessary. It was only used to make it convenient for the reader to easily identify rows 5 and 14 without having to count down the rows on the screen.

    Your response only gives the RecordNumber. I swapped out p2.RecordNumber for p2.Freight and received numbers that were not the first number in a series. The first output was .15. When running the original query to then check, I did not see .16 for any EmployeeID. I also need the result set grouped by the EmployeedID for each series and the count of each consecutive numbers in the series. Since there may be more than one series per EmployeeID, I have not had success using GROUP BY since that groups the entire result set for each EmployeeID. That's one of the challenges I am facing. The other challenges are to just find numbers in a sequence and return the first record in the set. Lastly to get a count of the consecutive numbers in each set.

    David

  • use

    tempdb;

    CREATE

    TABLE test( EmployeeId int, Freight decimal(9,2))

    -------------- -------- ------------------

    INSERT

    test(EmployeeID, Freight) VALUES (1,0.21)

    INSERT

    test(EmployeeID, Freight) VALUES (1,0.45)

    INSERT

    test(EmployeeID, Freight) VALUES (1,0.93)

    INSERT

    test(EmployeeID, Freight) VALUES (1,1.27)

    INSERT

    test(EmployeeID, Freight) VALUES (1,1.35)

    INSERT

    test(EmployeeID, Freight) VALUES (1,1.36)

    INSERT

    test(EmployeeID, Freight) VALUES (1,1.51)

    INSERT

    test(EmployeeID, Freight) VALUES (1,1.66)

    INSERT

    test(EmployeeID, Freight) VALUES (1,2.50)

    INSERT

    test(EmployeeID, Freight) VALUES (1,3.94)

    INSERT

    test(EmployeeID, Freight) VALUES (1,4.03)

    INSERT

    test(EmployeeID, Freight) VALUES (1,4.27)

    INSERT

    test(EmployeeID, Freight) VALUES (1,4.41)

    INSERT

    test(EmployeeID, Freight) VALUES (1,4.98)

    INSERT

    test(EmployeeID, Freight) VALUES (1,4.99)

    INSERT

    test(EmployeeID, Freight) VALUES (1,4.99)

    INSERT

    test(EmployeeID, Freight) VALUES (1,7.46)

    -- Add an employee 2

    INSERT

    test(EmployeeID, Freight) VALUES (2,4.41)

    INSERT

    test(EmployeeID, Freight) VALUES (2,4.98)

    INSERT

    test(EmployeeID, Freight) VALUES (2,4.99)

    INSERT

    test(EmployeeID, Freight) VALUES (2,4.99)

    INSERT

    test(EmployeeID, Freight) VALUES (2,7.46)

    -- Query

    SELECT

    DISTINCT p2.EmployeeID, p2.Freight FROM Test p1

    INNER JOIN Test P2

    ON

    p1.Freight = (p2.Freight + 0.01)

    ORDER BY 1

    -- Result: the result seems to be ok to me:

    EmployeeID Freight

    1 1.35

    1 4.98

    2 4.98

    I would only use a group by if you use an aggregate function. Surely you're not summing or adding id's or anthing.

    Jan

  • Jan,

    Thanks again for another suggestion. However, this does not provide the count of the number of freight records contained in a sequence. It also shows a duplicate. I could easily create a test table with a SELECT INTO statement; however, it would be to cumbersome to handpick records to employee2. There are 830 records in the Orders table.

    The data from Northwind is used only as an example so others would have a data set to test with.

  • David,

    My solution did not show a duplicate. I just added a second employee because you were mentioning that you needed a group by employeeId; but in your test data there was only one EmployeeId. I needed it to test if the query is valid.

    Indeed, I overlooked the Count, which is also the reason why I wondered in my last post why you needed the GROUP BY.

    Anyway, it is easily added to the solution:

    SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) as CountFreight

    FROM

     (SELECT p2.EmployeeID, p2.Freight

      FROM test p1

       INNER JOIN test P2

        ON p1.EmployeeID = p2.EmployeeID

         AND p1.Freight = (p2.Freight + 0.01)) r

    GROUP BY r.EmployeeID, r.Freight

    Mind that in the testcase you provided there where doubles in the data: freight 4.99. So thisis the count of 2 successors that are provided in the query. I hope I understood this time correctly what you want.

    I don't immedeately see where the SELECT INTO would help. Anyway this would involve more IO; the query here doesn't explicitely create temporary data.

    Hope we are getting nearer.

    Jan

  • SOLUTION

    SELECT

    EmployeeID

    ,LowFreight

    ,(

    SELECT count(*) FROM Orders PS

    WHERE PS.EmployeeID = T.EmployeeID AND (PS.Freight BETWEEN T.LowFreight AND T.HighFreight)

    ) as [SeriesRowCount]

    FROM

    (

    SELECT EmployeeID, Freight as LowFreight

    ,(

    SELECT min(Freight) FROM Orders PS3

    WHERE EXISTS

    (

    SELECT * FROM Orders PS2

    WHERE ( PS2.EmployeeID = PS3.EmployeeID AND PS2.Freight = (PS3.Freight - .01) )

    )

    AND NOT EXISTS

    (

    SELECT * FROM Orders PS2

    WHERE ( PS2.EmployeeID = PS3.EmployeeID AND PS2.Freight = (PS3.Freight + .01) )

    )

    AND PS3.EmployeeID = PS1.EmployeeID

    AND PS3.Freight > PS1.Freight

    ) as HighFreight

    FROM Orders PS1

    WHERE EXISTS

    (

    SELECT * FROM Orders PS2

    WHERE ( PS2.EmployeeID = PS1.EmployeeID AND PS2.Freight = (PS1.Freight + .01) )

    )

    AND NOT EXISTS

    (

    SELECT * FROM Orders PS2

    WHERE ( PS2.EmployeeID = PS1.EmployeeID AND PS2.Freight = (PS1.Freight - .01) )

    )

    ) T

    ORDER BY EmployeeID, LowFreight

  • David,

    Seems a bit complex in light of Jan's second solution.  What was wrong with it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Jan's solution did not work.  I appreciate her effort.

  • Hi Jeff,

    Sure my code works! I never post anything without testing it first. It is of course possible that we misunderstand what is needed, but even that is not the case here. I was unaware that these were samples of the Northwind database; that's why I created my test table. In the following query I only changed the table name from "test" into "Orders" in order to run it in Northwind and added the ORDER BY EmployeeID, Freight in order to compare it with Davids query; the rest is exactly as I posted it before:

    SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) as CountFreight

    FROM

     (SELECT p2.EmployeeID, p2.Freight

      FROM Orders p1

       INNER JOIN Orders P2

        ON p1.EmployeeID = p2.EmployeeID

         AND p1.Freight = (p2.Freight + 0.01)) r

    GROUP BY r.EmployeeID, r.Freight

    ORDER BY EmployeeID, Freight

    Running this, we see that the only difference in result I have with David is that my CountFreight is always one less then Davids. This is how I understood the question: I just counted the "followers". If David wants to count one more, fine, all we need to add is "+1" and we get exactly the same result as David, but with much simpler code and a much better query plan than his. I invite everyone who has a SQL and Northwind running to compare both solutions and their query plan.

    SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) + 1 as CountFreight

    FROM

     (SELECT p2.EmployeeID, p2.Freight

      FROM Orders p1

       INNER JOIN Orders P2

        ON p1.EmployeeID = p2.EmployeeID

         AND p1.Freight = (p2.Freight + 0.01)) r

    GROUP BY r.EmployeeID, r.Freight

    ORDER BY EmployeeID, Freight

    And I still don't think it is a good idea to post exactly the same question in 2 threads.

    Regards,

    Jan

  • Jan,

    I appreciate your effort and glowing positive attitude; however, your suggestion does not work. This is apparent without the need of examining the data in Northwind. Here is your result set:

    EmployeeID    Freight    CountFreight

    1                    1.35       1

    1                    4.98       3

    4                    1.26       1

    4                    3.04       1

    4                    29.6       1

    4                    59.13     1

    7                    41.89     1

    8                    4.41       1

    1) The result is supposed to, in part, obtain the count for the number of times a set of consecutive numbers appear. If the count is 1, then there are no consecutive numbers.

    2) You state you did not know this question was posted with data from the Northwind database. "Northwind" was stated twice in the first posting and repeated on my 2nd reply.

    3) Your suggestion involved adding non-existent data.

    4) The question is not posted in two threads. It is posted in two different forums.

    I welcome a shorter suggestion; but need a solution.

  • Jeff and all others,

    I have helped a lot of people here, but this game is becoming too anoying. Why should I continue this child play?

    1) Would you be so kind to READ what I said? I fist understood that only the followers needed counting; so that's why I gave you the version with +1 which gives EXACTLY the same result as yours. Is putting this +1 hard to imagine?

    SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) + 1 as CountFreight

    FROM

     (SELECT p2.EmployeeID, p2.Freight

      FROM Orders p1

       INNER JOIN Orders P2

        ON p1.EmployeeID = p2.EmployeeID

         AND p1.Freight = (p2.Freight + 0.01)) r

    GROUP BY r.EmployeeID, r.Freight

    ORDER BY EmployeeID, Freight

    Result:

    1      1,35      2

    1      4,98      3

    4      1,26      2

    4      3,04      2

    4      29,60    2

    4      59,13    2

    7      41,89    2

    8      4,41      2

    2) Ok, I overlooked this and I used your "partial result set" as data in a test table. My test table took exactly the same structure so it didn't even matter. Was it so hard to imangine that the name "test" needed replacing by "orders"?

    3) My suggestion did not involve adding non-existing data. It was a TEST. I wanted to add duplicates and employees to group by, exactly in the way that they exist in Northwind. Of course you don't NEED this data, it works with any data. Proof of this: run this query in Northwind and you get EXACTLY the same as yours.

    4) Clearly in 2 forums means in 2 threads as well. The point is that we, who are trying to help others, see the same question over and over again. And we don't see the suggestions already offered before.

    I have helped a lot of people here. I like problems because they are intelectually stimulating to look into. I do very much welcome somebody improving my code and can accept that I make errors (you can check my posting history on that). But this is an objective thing, it can be run by everyone. I see 272 views, so lots of people are spending time reading this nonsence. I invite everyone to run Davids query and mine (and yes, please, the version with the "+1" as posted here) the will get the EXACT SAME result with less code and a SIMPLER query plan.

    Jan

  • Jan,

    As angry as you are I still appreciate your efforts; however, your suggestion still does not work.

    The data used for this question was Northwind because it is a database we all have access to. A solution must work fundamentally, not just for Northwind. The solution must work whether there are two, three, or a hundred consecutive numbers. It turns out that the Orders table has at most three consecutive numbers according to the criteria in the original posting. Although, your suggestion with a +1 to the count works for Orders, it fails to work fundamentally. To prove this, simply insert more consecutive numbers:

    INSERT INTO Orders (EmployeeID, Freight) VALUES (1, 1.37)

    INSERT INTO Orders (EmployeeID, Freight) VALUES (1, 1.38)

    Now look at your result set.

  • Ok, I was wrong.

    Jan

  • Someone could probably do this in SQL Server 2005 using row numbers and correlated subqueries... but I don't have access to 2005 and I hate correlated subqueries just 'cause...

    First... Here's the test data I used... notice that I've added a dupe (4.99 like in the original test data and another "multi-dupe at 9.01) and a couple of places where there are more than 2 items that qualify for David's request... I also did like Jan did... I included more than 1 EmployeeID so that we can see that this works across multiple employee's without additional code... to keep everyone safe, I did this in a temp table... please read all comments in all code that follows...

    --=======================================================================================

    --      Create some test data.  This is NOT part of the solution... it's just to give us

    --      all something common to work with for testing.

    --=======================================================================================

    --===== If the table that holds the test data already exists drop it

         IF OBJECT_ID('TempDB..#MyHead') IS NOT NULL

            DROP TABLE #MyHead

    --===== Recreate the test table and populate it with test data

     CREATE TABLE #MyHead (EmployeeID INT,Freight DECIMAL(6,2))

     INSERT INTO #MyHead (EmployeeID,Freight)

     SELECT 1,0.21 UNION ALL

     SELECT 1,0.45 UNION ALL

     SELECT 1,0.93 UNION ALL

     SELECT 1,1.27 UNION ALL

     SELECT 1,1.35 UNION ALL --should be a "hit" with count of "2"

     SELECT 1,1.36 UNION ALL

     SELECT 1,1.51 UNION ALL

     SELECT 1,1.66 UNION ALL

     SELECT 1,2.50 UNION ALL

     SELECT 1,3.94 UNION ALL

     SELECT 1,4.03 UNION ALL

     SELECT 1,4.27 UNION ALL

     SELECT 1,4.41 UNION ALL

     SELECT 1,4.98 UNION ALL --should be a "hit" with count of "2"

     SELECT 1,4.99 UNION ALL

     SELECT 1,4.99 UNION ALL --dupe (not included according to David's specs)

     SELECT 1,7.46 UNION ALL

     SELECT 1,8.00 UNION ALL --should be a "hit" with count of "2"

     SELECT 1,8.01 UNION ALL

     SELECT 1,9.01 UNION ALL --should be a "hit" with count of "4"

     SELECT 1,9.02 UNION ALL

     SELECT 1,9.03 UNION ALL

     SELECT 1,9.04 UNION ALL

     SELECT 2,4.27 UNION ALL --different employee id

     SELECT 2,4.41 UNION ALL

     SELECT 2,4.98 UNION ALL --should be a "hit" with count of "3"

     SELECT 2,4.99 UNION ALL

     SELECT 2,4.99 UNION ALL --dupe (not included according to David's specs)

     SELECT 2,5.00 UNION ALL

     SELECT 2,7.46 UNION ALL

     SELECT 2,8.00 UNION ALL --should be a "hit" with count of "3"

     SELECT 2,8.01 UNION ALL

     SELECT 2,8.02 UNION ALL

     SELECT 2,9.01 UNION ALL --should be a "hit" with count of "4"

     SELECT 2,9.02 UNION ALL

     SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)

     SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)

     SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)

     SELECT 2,9.03 UNION ALL

     SELECT 2,9.04 UNION ALL

     SELECT 2,9.07 UNION ALL

     SELECT 3,9.07 UNION ALL --just to show one EmployeeID won't bleed into the next

     SELECT 3,9.08

    Ok... because I find it handy to "peel on potato at a time" (also known as "Divide and Conquer"), let's first isolate the data that qualifies for David's problem so we don't have to worry about rows that don't ...

    --=======================================================================================

    --      First, let's start by isolating only the rows in the data the qualify for David's

    --      problem... those are rows that are .01 apart in the Freight column for any given

    --      customer... to "play" this against the Orders table (or any other table), just

    --      change #MyHead to the correct table name.

    --=======================================================================================

    --===== If the table that holds the isolated data already exists drop it

         IF OBJECT_ID('TempDB..#Work') IS NOT NULL

            DROP TABLE #Work

    --===== Create and populate the work table with a clustered primary key and a special

         -- column to support some "trick" code later on

     CREATE TABLE #Work

            (

            EmployeeID INT NOT NULL,

            Freight    DECIMAL(6,2) NOT NULL,

            MyGroup    INT,  --This is the "special" column

            PRIMARY KEY CLUSTERED (EmployeeID,Freight)

            )

     INSERT INTO #Work (EmployeeID,Freight)

     SELECT DISTINCT

            t1.EmployeeID,t1.Freight

       FROM #MyHead t1,

            #MyHead t2

      WHERE (t1.Freight+.01 = t2.Freight)

        AND t1.EmployeeID   = t2.EmployeeID

      UNION ----------------------------------------------

     SELECT DISTINCT

            t1.EmployeeID,t1.Freight

       FROM #MyHead t1,

            #MyHead t2

      WHERE (t1.Freight-.01 = t2.Freight)

        AND t1.EmployeeID   = t2.EmployeeID

    --===== Here's what's in the work table right now

     SELECT * FROM #WORK

    ... A bit more potato peeling... let's break contiguous/sequential rows (.01 apart) into groups ...

    --=======================================================================================

    --      This bit of "trick" code relies on the order of the data which is

    --      forced by the Clustered Primary Key

    =======================================================================================

    --===== Declare a couple of local variables to support the "trick" code

    DECLARE @MyGroup INT

        SET @MyGroup = 0

    DECLARE @LastFreight DECIMAL(6,2)

        SET @LastFreight = -1

    --===== Here's the "trick"... this "groups" rows that are .01 apart in Freight using the

         -- proprietary UPDATE clause of SQL Server... magic... and blows cursor speed away!!!

     UPDATE w

        SET @MyGroup = MyGroup = CASE

                                   WHEN Freight = @LastFreight + .01

                                   THEN @MyGroup

                                   ELSE @MyGroup+1

                                 END,

            @LastFreight = Freight

       FROM #Work w --Order of processing is by the Clustered Primary key

    --===== Here's what's in the work table right now... see where we're going with this????

     SELECT * FROM #WORK

    ... and, finally, let's dump the peeled potato's into a basket...

    --=======================================================================================

    --      So, with that having been done, it's now easy to get the results you want.

    --=======================================================================================

    SELECT EmployeeID, MIN(Freight) AS Freight,COUNT(*) AS Items

      FROM #Work

     GROUP BY EmployeeID,MyGroup

    Like I said, there's lot's of folks that could probably do this with a nice correlated sub-query and a handful of self joins... and in 2k5, undoubtably some form of CTE with an "Over Partion" would probably play nicely... but I've found that most correlated sub-queries of the nature necessary to support this solution end up having slower triangular joins and I just don't have 2k5 to experiment with.

    David... let me know if that does it for you, please...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 45 total)

You must be logged in to reply to this topic. Login to reply