need help with logic, solution is nearly there, but one more hump

  • I don't know how to solve for Rule 5.

    An example of correctness is: 1Z3V1F0359168993, with one tracking_number sharing the same ship_date. 

    Rule: 
    (1) if > 1 tracking_number has >1 ship_dates, then take the ship_date having earliest Date_submitted_to_carrier .
    (2) if > 1 tracking_number has >1 ship_dates and earliest Date_submitted_to_carrier are same, take the earliest ship_date.  
    (3) if claim type for same tracking number are CCNR and Late, use CCNR.  
    (4) if claim type for same tracking number are all PDMM, use PDMM
    And
    (5)   If ship_dates are more than 15 days apart for same tracking_number then they are different orders (tracking_numbers were reused) eg. 1Z90134200324274.   Only update those tracking_numbers having different ship_dates, if ship_dates are within 15 days of each other ( don't know how to do this )

    sample dataset (DDL below)

    Select 'tracking_Number', 'ship_date', 'claim_type', 'date_submitted_to_carrier' UNION ALL
    select '1Z90124202029501','2018-09-10 00:00:00.000','Late','2018-10-30 00:00:00.000' UNION ALL
    select '1Z90124202029501','2018-09-07 00:00:00.000','CCNR','2018-09-26 00:00:00.000' UNION ALL
    select '1Z90134200323640','2018-09-30 00:00:00.000', 'CCNR', '2018-10-17 00:00:00.000' UNION ALL
    select '1Z90134200324274','2018-01-30 00:00:00.000','CCNR', '2018-02-17 00:00:00.000' UNION ALL
    select '1Z90134200324274','2018-10-01 00:00:00.000','Late','2018-10-30 00:00:00.000' UNION ALL

    select '1Z3V1F0359146828','2017-12-31 00:00:00.000','PDMM','2018-01-15 00:00:00.000' UNION ALL
    select '1Z3V1F0359146828','2018-01-10 00:00:00.000','CCNR','2018-01-16 00:00:00.000' UNION ALL
    select '1Z3V1F0359168993','2018-01-03 00:00:00.000','CCNR','2018-01-13 00:00:00.000' UNION ALL
    select '1Z3V1F0359168993','2018-01-03 00:00:00.000','CCNR','2018-01-13 00:00:00.000' UNION ALL

    select '1Z3V1F0359168055', '2018-01-02 00:00:00.000','PDMM', '2018-01-13 00:00:00.000' UNION ALL
    select '1Z3V1F0359168055', '2018-01-03 00:00:00.000', 'PDMM', '2018-01-13 00:00:00.000' 

    DDL

    use testdb
    go
    create table Claims (tracking_Number varchar(30) not null, ship_date datetime, claim_type varchar(4), date_submitted_to_carrier datetime)



    INSERT INTO [TestDB].[dbo].[Claims] ([tracking_Number],[ship_date],[claim_type],[date_submitted_to_carrier])
    VALUES ('1Z90124202029501','2018-09-10 00:00:00.000','Late','2018-10-30 00:00:00.000'),
    ('1Z90124202029501','2018-09-07 00:00:00.000','CCNR','2018-09-26 00:00:00.000'),
    ('1Z90134200323640','2018-10-01 00:00:00.000','Late','2018-10-30 00:00:00.000'),
    ('1Z90134200323640','2018-09-30 00:00:00.000', 'CCNR', '2018-10-17 00:00:00.000'),
    ('1Z90134200324274','2018-01-30 00:00:00.000','CCNR', '2018-02-17 00:00:00.000'),
    ('1Z90134200324274','2018-10-01 00:00:00.000','Late','2018-10-30 00:00:00.000'),
    ('1Z3V1F0359146828','2017-12-31 00:00:00.000','PDMM','2018-01-15 00:00:00.000'),
    ('1Z3V1F0359146828','2018-01-10 00:00:00.000','CCNR','2018-01-16 00:00:00.000'),
    ('1Z3V1F0359168993','2018-01-03 00:00:00.000','CCNR','2018-01-13 00:00:00.000'),
    ('1Z3V1F0359168993','2018-01-03 00:00:00.000','CCNR','2018-01-13 00:00:00.000'),
    ('1Z3V1F0359168055', '2018-01-02 00:00:00.000','PDMM', '2018-01-13 00:00:00.000'),
    ('1Z3V1F0359168055', '2018-01-03 00:00:00.000', 'PDMM', '2018-01-13 00:00:00.000');

    My long solution

    IF OBJECT_ID('tempdb..#sc') IS NOT NULL DROP TABLE #sc ;
    --original data set put into temp table (using #sc for testing purposes, so don't have to recreate Claims table each time I test this)
    SELECT [tracking_Number]
    ,[ship_date]
    ,[claim_type]
    ,[date_submitted_to_carrier]
    INTO #sc
    FROM [TestDB].[dbo].[Claims]
    ORDER BY tracking_Number;

    IF OBJECT_ID('tempdb..#bad_shipdate') IS NOT NULL DROP TABLE #bad_shipdate ;
    --(1)identify tracking_numbers with more than one ship_date
    SELECT distinct c.tracking_number
    INTO #bad_shipdate
    FROM Claims c JOIN
    (select DISTINCT ship_date, Tracking_Number from Claims) s on c.tracking_number = s.tracking_number
    where c.ship_date <> s.ship_date AND claim_type IN ('late','ccnr')
    order by c.tracking_number;

    IF OBJECT_ID('tempdb..#fixed_shipdate') IS NOT NULL DROP TABLE #fixed_shipdate ;
    --(2) isolate ship_date corresponding to earliest Date_Submitted_To_Carrier
    SELECT DISTINCT c1.Tracking_Number, c1.Ship_Date
    INTO #fixed_shipdate
    FROM Claims c1 JOIN #bad_shipdate bad on c1.Tracking_Number = bad.Tracking_Number
    WHERE c1.Date_Submitted_to_Carrier = (select TOP 1 Date_Submitted_to_Carrier
    FROM Claims c2 where c1.Tracking_Number = c2.Tracking_Number
    ORDER BY Tracking_Number, Date_Submitted_to_Carrier ASC);

    IF OBJECT_ID('tempdb..#final_shipdate') IS NOT NULL DROP TABLE #final_shipdate;

    --(3) in case of matching date_submitted_to_carrier in prior step, take min ship_date

    SELECT tracking_number, min(ship_date) as ship_date
    INTO #final_shipdate
    FROM #fixed_shipdate
    group by tracking_Number;

    --(4) update the original dataset  

    UPDATE sc
    SET sc.claim_type = 'CCNR',
    sc.ship_date = f.ship_date
    FROM #final_shipdate f JOIN
    #sc sc ON f.Tracking_Number = sc.Tracking_Number
    AND sc.Claim_Type IN ('ccnr', 'late') ;

    --repeat steps 1-4 again for PDMM claim_type

    --Quote me

  • Why not use LAG() to get the previous ship date for a tracking number?  This is ugly, but I think it'll work:
    SELECT x.tracking_number
        , x.ship_date
        , x.PrevShipDate
        , DATEDIFF(day,x.PrevShipDate,x.ship_date) AS DaysBetween
    FROM (
            SELECT c.tracking_number
                , c.ship_date
                , LAG(c.ship_date,1) OVER (PARTITION BY tracking_number ORDER BY ship_date) AS PrevShipDate
            FROM Claims c
        ) x
    WHERE x.PrevShipDate IS NOT NULL;

  • pietlinden - Tuesday, December 4, 2018 12:06 AM

    Why not use LAG() to get the previous ship date for a tracking number?  This is ugly, but I think it'll work:
    SELECT x.tracking_number
        , x.ship_date
        , x.PrevShipDate
        , DATEDIFF(day,x.PrevShipDate,x.ship_date) AS DaysBetween
    FROM (
            SELECT c.tracking_number
                , c.ship_date
                , LAG(c.ship_date,1) OVER (PARTITION BY tracking_number ORDER BY ship_date) AS PrevShipDate
            FROM Claims c
        ) x
    WHERE x.PrevShipDate IS NOT NULL;

    no, because the difference between ship_dates for same tracking number can be up to 15 days apart. If more than 15 days apart it is indicator of possibly different order.  I updated DDL so that ship_date differences are not always shown as 1 day apart.

    I thought something like this, but I get error:

    select distinct c.tracking_number
    from Claims c join
    (select DISTINCT ship_date, Tracking_Number from Claims) s on c.tracking_number = s.tracking_number
    where c.ship_date <> s.ship_date 
    AND claim_type in ('late','ccnr')
    AND MAX(c.ship_date) >= DATEADD(day, -15, c.ship_date)
    AND MIN(c.ship_date) <= DATEADD(day, 15, c.ship_date)
    order by c.tracking_number;

    error
    Msg 147, Level 15, State 1, Line 4
    An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.

    --Quote me

  • pietlinden,  I appreciate the help and intro to LAG. It would have worked if ship_dates were never more than one day apart, as I had presented my DDL.

    In the end I found that there was another column that I could use as a way to distinguish between two different use phases of the same tracking number, called order_id (which I did not have in the DDL).   

    I was then able to join to correlated subquery on both tracking_number and order_id to make sure I was identifying the ones with issue and also taking the min(ship_date) for the right set of tracking numbers:

    IF OBJECT_ID('tempdb..#bad_shipdate1') IS NOT NULL DROP TABLE #bad_shipdate1

    select distinct c.tracking_number, c.order_id
    into #bad_shipdate1
    from Claims c join
    (select DISTINCT ship_date, Tracking_Number, order_id from Claims) s on c.tracking_number = s.tracking_number and c.order_id = s.order_id
    where c.ship_date <> s.ship_date
    order by c.tracking_number;
    go

    select * from #bad_shipdate1;

    IF OBJECT_ID('tempdb..#good_shipdate1') IS NOT NULL DROP TABLE #good_shipdate1

    select DISTINCT c1.Tracking_Number, c1.Ship_Date, c1.order_id
    INTO #good_shipdate1
    FROM Claims c1 join #bad_shipdate1 bad on c1.Tracking_Number = bad.Tracking_Number and c1.order_id = bad.order_id
    WHERE c1.Date_Submitted_to_Carrier = (select TOP 1 Date_Submitted_to_Carrier
    FROM Claims c2 where c1.Tracking_Number = c2.Tracking_Number and c1.order_id = c2.order_id
    ORDER BY Tracking_Number, order_id, Date_Submitted_to_Carrier ASC);

    select * from #good_shipdate1

      IF OBJECT_ID('tempdb..#final_shipdate1') IS NOT NULL DROP TABLE #final_shipdate1
    Select tracking_number, order_id, min(ship_date) as ship_date
    into #final_shipdate1
    from #good_shipdate1
    group by tracking_Number, order_id;

    select * from #final_shipdate1

    Update sc
    set sc.ship_date = f.ship_date
    from #final_shipdate1 f join
        #sc sc on f.tracking_Number = sc.tracking_Number and f.order_id = sc.order_id

    select * from #sc


    Thanks for taking time to reply!

    --Quote me

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply