Add row number column based on id and stages

  • Hello Everyone

    I have a table with customer ids and different stages a customer can be in starting with A , B , C and D each starting at different dates

    Now a customer can be in multiple stages and I want to group them as below example (column rownum)

    Basically this determines the batch a customer will be in - each batch will have these four or less stages

    it is okay to assume that there will be no overlap between the two batches

    Thanks

    Please see the example

    DROP TABLE #t 

    CREATE TABLE #t
    ( customerid INT ,
    stage VARCHAR(10),
    startdate DATE,
    rownum int)

    INSERT INTO #t
    SELECT 1 ,'A' , '20170101' , 1 UNION ALL
    SELECT 1 ,'A' , '20180101' ,2 UNION ALL
    SELECT 1 ,'B' , '20180201' , 2 UNION ALL
    SELECT 1 ,'C ' , '20190101', 2 UNION ALL
    SELECT 2 ,'A ' , '20170101' ,1 UNION ALL
    SELECT 2 ,'B' , '20170201' , 1 UNION ALL
    SELECT 3 ,'A' , '20170101', 1 UNION ALL
    SELECT 3 ,'B' , '20170201' ,1 UNION ALL
    SELECT 3 ,'A' , '20180101' , 2 UNION ALL
    SELECT 3 ,'B' , '20180201', 2 UNION ALL
    SELECT 3 ,'C' , '20180301' , 2

    SELECT * FROM #t

    • This topic was modified 1 year, 3 months ago by  @Taps.
  • Be good to see what you mean by grouping. Show some results. FWIW, this is why testing is a good way to get set up for queries: https://www.sqlservercentral.com/articles/using-tsqlt-tests-to-practice-queries

  • I don't understand the logic for setting rownum. Can you explain it, please?

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • I did post a reply, but it didn't save it

    i'm guessing that you want something like this

    SELECT *,ROW_NUMBER() OVER(PARTITION BY customerid ORDER BY stage) AS rnk FROM #t

    but you have to this at query time, as you cannot use the windowing functions in a computed column

    MVDBA

  • Very sorry guys  I just realized I made a mistake in the output - I have edited the example in my question above

    The 'rownum' column is just to show the result that I want

    Yes MVDBA I am looking for something that will use row number function but not sure  how  it will give the desired output

    So the logic will be

    A customer progresses to four stages starting from A to D - that makes one group

    Now same customer can  just stop at stage A or B and later can restart from the stage A - this will make the second group

    So for customer 1 - he started stage A at 20170101 that will have a row number value 1 and then we have a another row with stage A  - this means its a restart of the process - so that will have the rownum value of 2. Stage B , C that followed A with startdate of 2018-01-01 will also have the value 2 - because they belong to same group

    Hopefully Phil this will make it clear  - that I have now amended the example

    Hopefully Phil now that I have amended the example it will be clear .

    Steve I haven't looked at the link yet but I will have a look at it

    Thanks everyone!

  • I would urge you to product a real result set, which can be with inserts into another table. This will help you be sure you've correctly identified things and make it easy for any of us to test a query against. Trying to describe all these cases can get confusing and leave things open to mistakes.

  • ;WITH cte AS
    (
    SELECT x.customerid,
    x.stage,
    x.startdate,
    CASE WHEN Stage <= lag(stage) OVER (PARTITION BY customerid ORDER BY startdate, stage) THEN 1
    ELSE 0
    END IncrementRowNum
    FROM #t x
    )
    SELECT x.customerid,
    x.stage,
    x.startdate,
    SUM(x.IncrementRowNum) OVER (PARTITION BY x.CustomerId ORDER BY x.startDate, x.stage)+1 RowNum
    FROM cte x
  • >> I have a table with customer ids and different something_stages a customer can be in starting with A, B, C and D each starting at different dates. <<

    Then we need some constraints to enforce this business rule don't we?

    >> Now a customer can be in multiple something_stages and I want to group them as below example (column rownum) <<

    Since tables have no ordering by definition, what do you mean by "grouping" them? It looks like you want a three-part primary key for the table. It also sounds like a customer might be in a stage for a duration, so we would need start and stop timestamps for that duration. Your sample data doesn't support this

    I'm afraid your sample data is a mess. We have no constraints, no key, and the datatypes don't make any sense. I'm also trying to figure out exactly what a row number is in terms of a logical data model; it sounds like a physical description of the storage used for the table on some hard disk on some machine.

    Identifiers should never be numeric because we don't do calculations on them. They use what is called a nominal scale. I'm also curious why you are using the old Sybase style table constructor instead of the ANSI standard that Microsoft is had for quite a few years now.

    CREATE TABLE Foobar --- needs a real name

    (customer_id CHAR(10) NOT NULL, -- nominal scale

    something_stage_name CHAR(1) NOT NULL

    CHECK(something_stage_name IN ('A', 'B', 'C', 'D')),

    something_stage_attempt_nbr INTEGER NOT NULL

    CHECK(something_stage_attempt_nbr > 0),

    something_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,

    --- something_end_date DATE,

    PRIMARY KEY (customer_id, something_stage_name, something_stage_attempt_nbr)));

    INSERT INTO Foobar

    VALUES

    ('0001', 'A', '2017-01-01', 1),

    ('0001', 'A', '2018-01-01', 2),

    ('0001', 'B', '2018-02-01', 2),

    ('0001', 'C', '2019-01-01', 2),

    ('0002', 'A ', '2017-01-01', 1),

    ('0002', 'B', '2017-02-01', 1),

    ('0003', 'A', '2017-01-01', 1),

    ('0003', 'B', '2017-02-01', 1),

    ('0003', 'A', '2018-01-01', 2),

    ('0003', 'B', '2018-02-01', 2),

    ('0003', 'C', '2018-03-01', 2);

    This sounds like you might want to progress the customer from stage A, to B, to C and finish at D. Is it also true that the only attempt that matters is the last one? You might want to look at

    https://www.red-gate.com/simple-talk/sql/t-sql-programming/state-transition-constraints/

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Thanks Jonathan - your script solves my problem!

    Joe - appreciate your inputs and your link about transition looks like something I can use in future

    The example I produced was just for this very specific purpose  - this is certainly not how the table is designed or the data looks like

    Thanks

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply