Insert/Update processing on a batch of new records

  • I have an import mechanism where various sites send in new accounts payable data each day via a CSV file. My DTS package concatenates all available CSV files into one big one, then imports them into a temporary table. It's the next step in the process I need some help with.

    Example table layout of my production table tblAPData (primary key defined by * columns):

    DataDate Region* Vendor* Amount

    What I would like to do is insert any new records into this table if the Region/Vendor combination does not already exist. I'd like to update any records where the source data has the same Region/Vendor and the same/newer DataDate. I want to ignore any records that have the same Region/Vendor but an older DataDate.

    Sample tblAPData at the start:

    DataDate Region* Vendor* Amount

    2/1/2006 USA 123 $10.00

    2/1/2006 USA 457 $20.00

    2/4/2006 CAN XYZ $15.00

    2/3/2006 USA 444 $99.00

    Sample tblAPDataTemp that I need to process:

    DataDate Region Vendor Amount

    1/9/2006 USA 123 $15.00

    2/1/2006 USA 123 $15.00

    2/1/2006 USA 123 $15.00

    2/6/2006 USA 999 $21.00

    2/7/2006 USA 999 $13.00

    2/6/2006 CAN XYZ $44.00

    End result I desire in tblAPData:

    DataDate Region* Vendor* Amount

    2/1/2006 USA 123 $15.00

    2/1/2006 USA 457 $20.00

    2/3/2006 USA 444 $99.00

    2/7/2006 USA 999 $13.00

    2/6/2006 CAN XYZ $44.00

    I think I got that all right. I'd want to ignore the 1/9/2006 data for Vendor 123 because it was old. I'd want to update using the 2/1/2006 data for Vendor 123 (managing the duplicate record in the temp table). I'd want to update the Vendor XYZ data because it is newer in the temp table, and I'd want to insert new Vendor 999. I have to deal with both records for Vendor 999 and end up with the newest data in my final table. I'd leave Vendor 444 unchanged.

    I'm guessing I want to do this as a two step process - identify the new records to insert, then process the existing records via updates when appropriate.

    Anyone have some code that should handle the variations illustrated above?

    Thanks.

  • Typically:

    -- Create new records where the key does not exist

    INSERT INTO (column list)

    WHERE NOT EXISTS (

      SELECT * FROM Target

      WHERE Target.Primarykeys = Source.PrimaryKeys

    )

    -- Update existing records where the key already exists

    -- and filtering for any other business rules, like date checks

    UPDATE T

    SET

      Column1 = S.Column1

      ...

      ColumnN = S.ColumnN

    FROM Target As T

    INNER JOIN Source As S

      ON (T.PrimaryKeys = S.PrimaryKeys)

    WHERE {Any condition like your date check}

     

  • Question - on the INSERT INTO statement, do the two records for Vendor 999 get picked up in the subquery? And if so, does this cause a problem when it tries to insert two records for that Vendor since they have the same primary key values for tblAPData? Procedurally, I guess I'd like the 2/6/2006 record for Vendor 999 to be inserted, then the 2/7/2006 record to be used in an update.

    Thanks.

  • Didn't notice that condition. In that case you need to incorporate a derived table (I named it "dtMostRecent") to allow selection of only the latest row per keyset.

    INSERT INTO Target

      (Region, Vendor, DataDate, Amount)

    SELECT

      S.Region, S.Vendor, S.DataDate, S.Amount

    FROM tblAPDataTemp As S

    INNER JOIN

    (

      SELECT Region, Vendor, Max(DataDate) As MostRecentDate

      FROM  tblAPDataTemp

      GROUP BY Region, Vendor

    ) dtMostRecent

      ON (S.Region = dtMostRecent.Region AND

            S.Vendor = dtMostRecent.Vendor AND

            S.DataDate = dtMostRecent.MostRecentDate)

    WHERE NOT EXISTS (

      SELECT *

      FROM Target As T

      WHERE T.Vendor = S.Vendor

      AND     T.Region = S.Region

    )

  • OK, I think I understand your INSERT statement. I've put the code in, but have yet to test. At least it parses correctly.

    As for the UPDATE statement, here it is coded to my actual database fields. There's an extra INNER JOIN in there to pick up a value from a third table. This syntax produces a parse error in SQL Server 2000.

    Error: "Deferred prepare could not be completed. Statement(s) could not be prepared. Line 2: Incorrect syntax near '='."

    (Note that the line number for my query and the code pasted here may not match because of formatting.)

    Query (edited to fix typo noted in post below):

    UPDATE tblAPData

    SET tblAPData.DataDate = T.DataDate, tblAPData.VendorName = T.VendorName, tblAPData.PaymentTerms = T.PaymentTerms, tblAPData.TotalAmount = T.TotalAmount,

    tblAPData.CurrentAmount = T.CurrentAmount, tblAPData.Overdue1_30 = T.Overdue1_30, tblAPData.Overdue31_60 = T.Overdue31_60, tblAPData.Overdue61_90 = T.Overdue61_90,

    tblAPData.Overdue91 = T.Overdue91, tblAPData.CurrencyCode = T.CurrencyCode, tblAPData.ConversionFactor = C.ConversionFactor

    INNER JOIN tblAPDataTemp AS T

    ON tblAPData.Site = T.Site AND tblAPData.Region = T.Region AND tblAPData.VendorNumber = T.VendorNumber

    INNER JOIN tblCurrencyConversion C

    ON T.CurrencyCode = C.CurrencyCode

    WHERE (T.DataDate >= tblAPData.DataDate);

    So the first question is what produces the parse error, and a second question is how to replace "tblAPData" with an alias in this statement to save some typing. Simply using "tblAPData AS D" right after UPDATE produces an error.

    Thanks.

  • UPDATE tblAPData

    SET tblAPData.DataDate = tblAPData.DataDate = T.DataDate

    2 equal signs ?

     

  • Yes, that typo did exist (now fixed), but I get a similar error message with the last line now:

    "Incorrect syntax near the keyword 'INNER'."

  • Compare this to the 'template' in my 1st reply.

    UPDATE AP -- Update the alias name

    SET 

      -- Columns being updated are not prefixed by tablename or alias

      DataDate = T.DataDate, VendorName = T.VendorName,

      PaymentTerms = T.PaymentTerms,  TotalAmount = T.TotalAmount,

      CurrentAmount = T.CurrentAmount, etc etc

    FROM tblAPData AS AP -- alias the table being updated

    INNER JOIN tblAPDataTemp AS T

      ON AP.Site = T.Site AND 

           AP.Region = T.Region AND 

           AP.VendorNumber = T.VendorNumber

    INNER JOIN tblCurrencyConversion C

       ON T.CurrencyCode = C.CurrencyCode

    WHERE (T.DataDate >= AP.DataDate);

  • Thanks. That parsed fine. Just sloppy editing on my part.

  • I actually find that using a LEFT OUTER JOIN performs better than a NOT EXISTS.  The syntax that I normally use to insert new records that don't already exist is:

    INSERT INTO Target (column list)

    SELECT column list

    FROM Source S

    LEFT OUTER JOIN Target T

    ON (T.PrimaryKeys = S.PrimaryKeys)

    WHERE T.ID IS NULL

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply