Create a delta from downloaded info

  • You are correct on the PK.

    Thanks.

  • Jeff Moden wrote:

    Bruin wrote:

    With the same Line,subline?

    Look at the PK you've chosen.  IMHO, only the first 3 columns of InvoiceNbr, Line, and SubLine should make up the PK.

    I'm more confused than that and have a more fundamental q.  Invoices and shipments are two different data entities.  Invoices don't have a "ship date", you don't ship invoices, you ship shipments, which can have several different shipment dates.

    To the OP: is your table for invoices or for shipments??

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Jeff Moden wrote:

    Bruin wrote:

    With the same Line,subline?

    Look at the PK you've chosen.  IMHO, only the first 3 columns of InvoiceNbr, Line, and SubLine should make up the PK.

    I'm more confused than that and have a more fundamental q.  Invoices and shipments are two different data entities.  Invoices don't have a "ship date", you don't ship invoices, you ship shipments, which can have several different shipment dates.

    Is that table for invoices or for shipments??

    Invoice line items can have a ship date but I agree that there's a bit of a "fog on the bog" for this one.

    I'm still doing my FTE work at this late hour and won't get to a proposed solution tonight but a little trick with HASHBYTEs for the non-PK columns will make for a nasty fast solution leading to a nice, simple, fast UPSERT with the comparison the OP wants.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Invoice line items can have a ship date

    Only with a bad data model.  Parts from the same invoice line could be shipped on different dates.  You don't split invoice lines based on when they shipped.  At least I've never seen that done in my decades working with ordering / invoice / shipment / credit systems.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Jeff Moden wrote:

    Invoice line items can have a ship date

    Only with a bad data model.  Parts from the same invoice line could be shipped on different dates.  You don't split invoice lines based on when they shipped.  At least I've never seen that done in my decades working with ordering / invoice / shipment / credit systems.

    Yeah... I do agree that there should be a shipment table and the InvoiceDetail table should contain ShipmentIDs for the line items in the invoice detail table.  Might even be able to justify a "bridge" table but a lot of folks would consider that to be some serious overkill.

    Of course, other people that just want to do reporting will disagree with us.  It's the ol' "data mart" argument.  Not saying that's right.  I'm just saying that's where a lot of folks end up.  Heh... long live the duplication of data for reporting, eh?  It helps us justify buying extra hardware. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You where correct with the " It's the ol' "data mart"... It's kind of a hybrid table.

    Will that effect the Delta output?

    THanks.

  • Bruin wrote:

    You where correct with the " It's the ol' "data mart"... It's kind of a hybrid table.

    Will that effect the Delta output? THanks.

    No - but I ask again.  Have you tried using EXCEPT - and if so, what were the results?  You were also asked to provide the expected results from you sample data - still waiting on that information.

    It has not been made clear what you want to accomplish.  Is the result you want a table that shows the previous row - and the new row?  Or do you just need to identify the changes so they can be applied to the final table?  Something else?

    If you want a table that contains the history for a row - then you could setup and use a temporal table.  That would be the simpler method to implement - then all you need to do is update rows in that table where there is a difference and insert new rows.  The system then generates the entries in the history table for you.

    Either way - you can use EXCEPT to identify the rows in the staging table where a value has changed or the row is new.

    Select InvoiceNbr
    , Line
    , SubLine
    , InvoiceDate
    , ShipDate
    , MwAcctType
    , Channel
    From #staging
    Except
    Select InvoiceNbr
    , Line
    , SubLine
    , InvoiceDate
    , ShipDate
    , MwAcctType
    , Channel
    From #daily_invoices;

    You can then use the above with MERGE - or use separate UPDATE/INSERT statements with that data as the source for the update/insert.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks Except seems to work for what I need.

    Thanks Again!!

  • I used to use EXCEPT but usually don't anymore because an old friend showed me long ago just exactly how slow it can actually be and Andy Leonard confirmed that HashBytes is actually faster especially since it allow all 4 conditions to be easily determined in a single pass of the tables.  EXCEPT also doesn't support the idea of doing a Before'n'After check of the 4 possible conditions rows in a staging table can have.  And for my production work, EXCEPT actually did crawl compared to the HashBytes method and that was good enough for me.  It was an "InSitu" test. 😀

    First, just to keep things together, here's a reworked version of the previously provided test data.

    --===== Create and populate the table that contains the permanent data.
    DROP TABLE IF EXISTS #daily_invoices;
    GO
    CREATE TABLE #daily_invoices
    (
    InvoiceNbr VARCHAR(8) NOT NULL
    ,Line INT NOT NULL
    ,SubLine INT NOT NULL
    ,InvoiceDate DATE NOT NULL
    ,ShipDate DATE NOT NULL
    ,MwAcctType VARCHAR(25) NULL
    ,Channel NVARCHAR(12) NULL
    ,HashMatch AS HASHBYTES('SHA1',CONCAT(InvoiceDate,'|',ShipDate,'|',MwAcctType,'|',Channel)) PERSISTED
    ,PRIMARY KEY CLUSTERED (InvoiceNbr,Line,SubLine)
    )
    ;
    INSERT INTO #daily_invoices WITH (TABLOCK)
    VALUES (78123,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ,(78124,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ,(78127,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ,(78128,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ,(78129,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ;
    SELECT *
    FROM #daily_invoices
    ;
    --===== Create and populate the table that contains the permanent data.
    DROP TABLE IF EXISTS #staging;
    GO
    CREATE TABLE #staging
    (
    InvoiceNbr VARCHAR(8) NOT NULL
    ,Line INT NOT NULL
    ,SubLine INT NOT NULL
    ,InvoiceDate DATE NOT NULL
    ,ShipDate DATE NOT NULL
    ,MwAcctType VARCHAR(25) NULL
    ,Channel NVARCHAR(12) NULL
    ,HashMatch AS HASHBYTES('SHA1',CONCAT(InvoiceDate,'|',ShipDate,'|',MwAcctType,'|',Channel)) PERSISTED
    --,RowType CHAR(1)
    ,INDEX NCI_Inv(InvoiceNbr,Line,SubLine)
    )
    ;
    INSERT INTO #staging WITH (TABLOCK)

    VALUES (78123,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ,(78124,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ,(78125,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ,(78126,1,0,'8-13-2021','8-10-2021',1002,'rrb')
    ,(78127,1,0,'8-13-2021','8-10-2021',1001,'rrb')
    ,(78128,1,0,'8-13-2021','8-10-2021',1021,'rrz')
    ;
    SELECT *
    FROM #staging
    ;

    Then, a fully visible return of both tables in a FULL JOIN.  Obviously, we wouldn't display all of this data in real life.  Instead, we'd likely only update the "RowType" byte as a pre-process.  You could also use 3 of the conditions to build the 3 statements for an UPSERT.  Or... if you really insisted on using MERGE, the whole HashMatch method might (I don't know for sure because I flat out refuse to use MERGE) make for a faster merge because it's only comparing one column and the key columns instead of all the columns.

    --===== Display the differences. This could be used to guide and "Upsert" instead of using MERGE,
    -- which I still worry about because of all the insane problems it used to have.
    SELECT *
    ,RowType = CASE
    WHEN tgt.InvoiceNbr IS NULL THEN 'N' --NEW, need to add row to target
    WHEN src.InvoiceNbr IS NULL THEN 'D' --DELETE, need to delete row from target (OPTIONAL??)
    WHEN src.HashMatch = tgt.HashMatch THEN 'S' --Same, No change, No action required
    ELSE 'U' --target needs to be updated from the row in the source.
    END
    FROM #daily_invoices tgt
    FULL JOIN #staging src
    ON tgt.InvoiceNbr = src.InvoiceNbr
    AND tgt.Line = src.Line
    AND tgt.SubLine = src.SubLine
    ;

    The demo code above works with the table creation and population script above and should work with no changes required, like it did for me.

    And, to emphasize the performance, this method is lightning fast compared to EXCEPT and gets comparatively much faster the more columns you need to check.  You'd think that the CONCAT and HASHBYTEs function would slow things down a whole lot compared to EXCEPT but it doesn't.

    Collisions: Any time you use HashBytes, there is a chance for collisions no matter how slim.  If there are two equal hashbytes, they could still be different.  It they're not equal, then they're guaranteed to be different.

    That could leave a small number of potentially different values identified as being the same when they're not.  To prevent that issue (especially on the huge runs that I frequently do), I add another set of HashMatch columns (1 in each table) where I simply move the first column in the CONCATenation to be the last column.  That makes makes it virtually impossible to miss any non-matches and still be a lot faster than EXCEPT especially for a large number of columns. but even with a small number of columns because EXCEPT won't give you all 4 UPSERT/MERGE conditions.  It only gives you 1.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I almost forgot.  One "trick" I do (that I didn't do here) is to add an "_stg" suffix to the column names in the staging table so that when I'm working with both tables, the column names are different, just in case I want to use INTO to send the output to a TempTable and still be able to use * on some really wide tables but the column names will still sort nicely if I have to look at all the column names in a single meta-data query.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks will test the Code out!!

  • When I run the piece for creating #staging:

     

    Column name or number of supplied values does not match table definition.

    Thx,

  • Bruin wrote:

    When I run the piece for creating #staging:

    Column name or number of supplied values does not match table definition.

    Thx,

    I'd gotten interrupted while I was working on the code.  I had two slightly different versions of the tables in code and posted the wrong one.

    I've corrected the code.  Please try again.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff...do you still post your "1 million row" test scripts?

    Would love to see one for this Q.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Worked perfect thanks!!!!

Viewing 15 posts - 31 through 44 (of 44 total)

You must be logged in to reply to this topic. Login to reply