Identifying NEW/RETAIN/LOST unique_id

  • Hello,

    I have a data set of unique_id by eomonth of everymonth since jul 2015 with a field aggregating whether a product existed in the respective month.  In other words, client ABC that has product DEF for 12 months between 07/31/2017-6/30/2018 will be marked as 1 in those specific months and 0 in all other.  With your help, I am looking to take this further.  I am hoping to ID the first time a product was acquired with NEW and each of the months the account was under contract as RETAIN and the first month it was out of contract as LOST.  I've attached a data sample.  Your help is greatly appreciated.

    Thank you

    -Alex

  • Most people are hesitant to open up documents posted by random strangers.  Please provide sample data as outlined in the first link in my signature.  You also haven't mentioned how you want to handle clients that were lost, but then started back up again.  You'll probably end up using LEAD/LAG for your analysis.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • valid point.  Sorry about that.  Please see below.  

    CREATE TABLE #mytable
       (
       ID integer,
       "2017-06-30" varchar,
       "2017-07-31" varchar,
       "2017-08-31" varchar,
       "2017-09-30" varchar
       )

    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (100, 0, 1, 1, 0)
    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (101, 0, 0, 0, 0)
    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (102, 1, 1, 1, 0)
    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (103, 1, 1, 1, 1)
    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (104, 0, 1, 1, 1)

  • alexander.lummer - Thursday, September 13, 2018 2:52 PM

    valid point.  Sorry about that.  Please see below.  

    CREATE TABLE #mytable
       (
       ID integer,
       "2017-06-30" varchar,
       "2017-07-31" varchar,
       "2017-08-31" varchar,
       "2017-09-30" varchar
       )

    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (100, 0, 1, 1, 0)
    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (101, 0, 0, 0, 0)
    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (102, 1, 1, 1, 0)
    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (103, 1, 1, 1, 1)
    INSERT INTO #mytable (ID,"2017-06-30","2017-07-31","2017-08-31","2017-09-30") VALUES (104, 0, 1, 1, 1)

    Oh boy....   Hate to have to burst anyone's bubble, but this kind of table design is a serious problem.   Any time you want to represent a specific month of data as a column in a table, you effectively tie yourself to having to add more columns in perpetuity, because time never stops passing.   Eventually, you have to start deleting columns, and things at that point are surely already messy, and you have one heck of a "hot mess" on your hands.  This table also has absolutely nothing in it to tell you what that data represents.  Total obfuscation is also usually a bad idea, if for no other reason than "what if you get hit by a bus and no one else knows this table?"   Querying this might be easy, but only for a limited time frame.   Business processes should not have to change with the mere passage of time, and you are ensuring that your company will continue to have to "cost itself" time and money to fix that table every so often.   How about doing it right out of the gate?   Try this:
    CREATE TABLE #mytable (
        CustomerID int NOT NULL,
        MonthDate date NOT NULL,
        CustomerStatus bit NOT NULL,
        UNIQUE CLUSTERED
            (
            CustomerID ASC,
            MonthDate ASC
            )
    );

    This way, you never have to add columns.   You only ever add rows.  New data.   Just the way it's supposed to be.   Queries that span months but need horizontal presentation can either be matrix driven in SSRS or you can  use a CrossTab query.   Let the presentation layer handle that kind of difficulty.   Believe me, this would be a far better design in the long run.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I have a dynamic process that adds months and assess whether the account/product combination unique id is NEW, RETAINED from the previous month, or lost.  The table i have does not distinguish between the three.  The current process just simple identifies whether it exists in the month.  With your help, i will be able to take it further and identify when the product was added as a new product for that specific account and it was considered retained throughout its contractual life cycle and when it was lost.

  • alexander.lummer - Friday, September 14, 2018 9:36 AM

    I have a dynamic process that adds months and assess whether the account/product combination unique id is NEW, RETAINED from the previous month, or lost.  The table i have does not distinguish between the three.  The current process just simple identifies whether it exists in the month.  With your help, i will be able to take it further and identify when the product was added as a new product for that specific account and it was considered retained throughout its contractual life cycle and when it was lost.

    I figured that's what you were doing, but I suspect that you want to evaluate this earlier in your process than the data you have presented here.  Specifically, it looks like you have pivoted the data, and this calculation is much easier to do before you pivot rather than after.  Can you provide us sample data from the beginning of the process rather than at some midpoint?  Also, you should provide expected results.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Can i attach an excel file? will make it easier

    When its all said and done, I need to be able to indentify when a client added an account (ie NEW) when they decided not to renew (ie LOST) along with information on the specific client and the dollar amount of the deal.  NOTE: renewed accounts are not taken in consideration.  In other words, I want to isolate new added products for either new or existing clients.

    The process i shared with you is the mid-point as you mentioned.  I thought my approach would make it easier to isolate the months products were added and lost.  An suggestion is greatly appreciated.

  • alexander.lummer - Friday, September 14, 2018 12:11 PM

    Can i attach an excel file? will make it easier

    Yes, you can attach an Excel file.  It will not make it easier.  While it makes it easier for you, it makes it harder for anyone else.  It's much easier for one person to put in the effort to format it in such a way that the data is consumable than it is for everybody else to format that same data in the same way.  Furthermore, Excel is known for mangling data types, so we have to guess what the appropriate data types are when you provide an Excel file.  Also, I will not open an Excel file posted by some random stranger.  If you don't care enough about the problem to put in the effort to make it easier for other folks, then I don't see why I should care enough about YOUR problem to help you.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Drew I appreciate your feedback.  Please let me know if I articulated my goal.  if so, can you please suggest a better way of getting to the end result.  

    How would you prefer I provide you with a sample data?

    Thanks

    Alex

  • alexander.lummer - Friday, September 14, 2018 1:52 PM

    How would you prefer I provide you with a sample data?

    Please see my very first response in this thread.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Try...

    if OBJECT_ID('tempdb..#temp') is not null drop table #temp;

    if OBJECT_ID('tempdb..#cte') is not null drop table #cte;

    create table #temp
    (
    [unique_id] varchar(50) primary key,
    [06/30/2015] bit,
    [07/31/2015] bit,
    [08/31/2015] bit,
    [09/30/2015] bit,
    [10/31/2015] bit,
    [11/30/2015] bit,
    [12/31/2015] bit,
    [01/31/2016] bit,
    [02/29/2016] bit,
    [03/31/2016] bit,
    [04/30/2016] bit,
    [05/31/2016] bit,
    [06/30/2016] bit,
    [07/31/2016] bit,
    [08/31/2016] bit,
    [09/30/2016] bit,
    [10/31/2016] bit,
    [11/30/2016] bit,
    [12/31/2016] bit,
    [01/31/2017] bit,
    [02/28/2017] bit,
    [03/31/2017] bit,
    [04/30/2017] bit,
    [05/31/2017] bit,
    [06/30/2017] bit,
    [07/31/2017] bit,
    [08/31/2017] bit,
    [09/30/2017] bit,
    [10/31/2017] bit,
    [11/30/2017] bit,
    [12/31/2017] bit,
    [01/31/2018] bit,
    [02/28/2018] bit,
    [03/31/2018] bit,
    [04/30/2018] bit,
    [05/31/2018] bit,
    [06/30/2018] bit,
    [07/31/2018] bit,
    [08/31/2018] bit,
    [09/30/2018] bit
    );

    insert into #temp values ('1000149-4011201-AD','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1000149-4011202-C','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','2','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1000189-4011201-AD','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1000204-4011201-AD','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1000204-4011203-I','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1000204-4220111','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1000204-4220311-2','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1002573-4220311-2','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002573-4220311-20','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002573-4220311-32','1','2','2','2','2','2','2','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002573-4220311-36','1','2','2','2','2','2','2','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002573-4220311-37','1','2','2','2','2','2','2','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002573-4220311-38','1','2','2','2','2','2','2','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002573-4220312-1','1','2','2','2','2','2','2','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002603-4011202-C','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002647-4011201-AD','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1002647-4055000-1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1002647-4190200-A','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1002647-4220111-88','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0');
    insert into #temp values ('1002715-17-100004','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002715-4011203-I','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002715-4055000-1','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002715-4220311-2','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002715-4220311-20','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002715-4220311-36','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002715-4220311-37','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002715-4220311-38','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002715-4220312-1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1002759-4011503-B','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1','1');
    insert into #temp values ('1024788-4220111-88','1','1','1','1','1','1','1','1','1','1','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0');

    create table #cte
    (
    unique_id varchar(50),
    [rn] int,
    [dates] date,
    [vals] bit
    primary key(unique_id asc, [rn] asc)
    );

    CREATE NONCLUSTERED INDEX #CTE_ID_INCL_VALS ON #cte(unique_id ASC) INCLUDE ([vals]);

    with cte
    as
    (

    select
    p.unique_id
    ,p.dates
    ,p.vals

    from #temp a
    unpivot
        (
         vals
         for dates in
         (    [06/30/2015],    [07/31/2015],    [08/31/2015],    [09/30/2015],    [10/31/2015],    [11/30/2015],    [12/31/2015],    [01/31/2016],    [02/29/2016],    [03/31/2016],    [04/30/2016],    [05/31/2016],    [06/30/2016],    [07/31/2016],    [08/31/2016],    [09/30/2016],    [10/31/2016],    [11/30/2016],    [12/31/2016],    [01/31/2017],    [02/28/2017],    [03/31/2017],    [04/30/2017],    [05/31/2017],    [06/30/2017],    [07/31/2017],    [08/31/2017],    [09/30/2017],    [10/31/2017],    [11/30/2017],    [12/31/2017],    [01/31/2018],    [02/28/2018],    [03/31/2018],    [04/30/2018],    [05/31/2018],    [06/30/2018],    [07/31/2018],    [08/31/2018],    [09/30/2018])
        
         ) p
    )

    insert #cte
    select a.unique_id
    ,ROW_NUMBER() over(PARTITION by a.unique_id order by cast(a.dates as date) asc) [rn]
    ,a.dates
    ,a.vals

    from cte a ;

    go

    ;with cte
    as
    (
    select a.unique_id
    ,a.dates
    ,
    (
    case
    when a.vals = 0 and  isnull(b.vals ,0) = 0
    then cast(a.vals as varchar(6))
    when a.vals = 0 and  isnull(b.vals ,0) = 1
    then 'Loss'
    when a.vals = 1 and  isnull(b.vals ,0) = 0
    then 'NEW'
    when a.vals = 1 and  isnull(b.vals ,0) = 1
    then 'Retain'
    else cast(a.vals as varchar(6))
    end
    ) [status]

    from

    #cte a
    outer apply
        (
        select b.vals
        from #cte b
        where b.unique_id = a.unique_id
        and b.rn = a.rn - 1
        ) b
    )

    select *
    from cte a
    pivot
    (
    max(status)
    for dates in
    (
    [06/30/2015],    [07/31/2015],    [08/31/2015],    [09/30/2015],    [10/31/2015],    [11/30/2015],    [12/31/2015],    [01/31/2016],    [02/29/2016],    [03/31/2016],    [04/30/2016],    [05/31/2016],    [06/30/2016],    [07/31/2016],    [08/31/2016],    [09/30/2016],    [10/31/2016],    [11/30/2016],    [12/31/2016],    [01/31/2017],    [02/28/2017],    [03/31/2017],    [04/30/2017],    [05/31/2017],    [06/30/2017],    [07/31/2017],    [08/31/2017],    [09/30/2017],    [10/31/2017],    [11/30/2017],    [12/31/2017],    [01/31/2018],    [02/28/2018],    [03/31/2018],    [04/30/2018],    [05/31/2018],    [06/30/2018],    [07/31/2018],    [08/31/2018],    [09/30/2018]
    )
    ) p

    go

    drop table #temp;
    drop table #cte;

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply