Insert/Update processing on a batch of new records

Question

Post reply

Insert/Update processing on a batch of new records

Lane Kelly

Old Hand

Points: 307
More actions
February 22, 2006 at 8:27 am

#113410

I have an import mechanism where various sites send in new accounts payable data each day via a CSV file. My DTS package concatenates all available CSV files into one big one, then imports them into a temporary table. It's the next step in the process I need some help with.
Example table layout of my production table tblAPData (primary key defined by * columns):
DataDate Region* Vendor* Amount
What I would like to do is insert any new records into this table if the Region/Vendor combination does not already exist. I'd like to update any records where the source data has the same Region/Vendor and the same/newer DataDate. I want to ignore any records that have the same Region/Vendor but an older DataDate.
Sample tblAPData at the start:
DataDate Region* Vendor* Amount
2/1/2006 USA 123 $10.00
2/1/2006 USA 457 $20.00
2/4/2006 CAN XYZ $15.00
2/3/2006 USA 444 $99.00
Sample tblAPDataTemp that I need to process:
DataDate Region Vendor Amount
1/9/2006 USA 123 $15.00
2/1/2006 USA 123 $15.00
2/1/2006 USA 123 $15.00
2/6/2006 USA 999 $21.00
2/7/2006 USA 999 $13.00
2/6/2006 CAN XYZ $44.00
End result I desire in tblAPData:
DataDate Region* Vendor* Amount
2/1/2006 USA 123 $15.00
2/1/2006 USA 457 $20.00
2/3/2006 USA 444 $99.00
2/7/2006 USA 999 $13.00
2/6/2006 CAN XYZ $44.00
I think I got that all right. I'd want to ignore the 1/9/2006 data for Vendor 123 because it was old. I'd want to update using the 2/1/2006 data for Vendor 123 (managing the duplicate record in the temp table). I'd want to update the Vendor XYZ data because it is newer in the temp table, and I'd want to insert new Vendor 999. I have to deal with both records for Vendor 999 and end up with the newest data in my final table. I'd leave Vendor 444 unchanged.
I'm guessing I want to do this as a two step process - identify the new records to insert, then process the existing records via updates when appropriate.
Anyone have some code that should handle the variations illustrated above?
Thanks.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

PW-201837 SSC-Insane Points: 20805 More actions · Answer 1

Typically:

-- Create new records where the key does not exist

INSERT INTO (column list)

WHERE NOT EXISTS (

SELECT * FROM Target

WHERE Target.Primarykeys = Source.PrimaryKeys

)

-- Update existing records where the key already exists

-- and filtering for any other business rules, like date checks

UPDATE T

SET

Column1 = S.Column1

...

ColumnN = S.ColumnN

FROM Target As T

INNER JOIN Source As S

ON (T.PrimaryKeys = S.PrimaryKeys)

WHERE {Any condition like your date check}

Lane Kelly Old Hand Points: 307 More actions · Answer 2

Question - on the INSERT INTO statement, do the two records for Vendor 999 get picked up in the subquery? And if so, does this cause a problem when it tries to insert two records for that Vendor since they have the same primary key values for tblAPData? Procedurally, I guess I'd like the 2/6/2006 record for Vendor 999 to be inserted, then the 2/7/2006 record to be used in an update.

Thanks.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 3

Didn't notice that condition. In that case you need to incorporate a derived table (I named it "dtMostRecent") to allow selection of only the latest row per keyset.

INSERT INTO Target

(Region, Vendor, DataDate, Amount)

SELECT

S.Region, S.Vendor, S.DataDate, S.Amount

FROM tblAPDataTemp As S

INNER JOIN

(

SELECT Region, Vendor, Max(DataDate) As MostRecentDate

FROM tblAPDataTemp

GROUP BY Region, Vendor

) dtMostRecent

ON (S.Region = dtMostRecent.Region AND

S.Vendor = dtMostRecent.Vendor AND

S.DataDate = dtMostRecent.MostRecentDate)

WHERE NOT EXISTS (

SELECT *

FROM Target As T

WHERE T.Vendor = S.Vendor

AND T.Region = S.Region

)

Lane Kelly Old Hand Points: 307 More actions · Answer 4

OK, I think I understand your INSERT statement. I've put the code in, but have yet to test. At least it parses correctly.

As for the UPDATE statement, here it is coded to my actual database fields. There's an extra INNER JOIN in there to pick up a value from a third table. This syntax produces a parse error in SQL Server 2000.

Error: "Deferred prepare could not be completed. Statement(s) could not be prepared. Line 2: Incorrect syntax near '='."

(Note that the line number for my query and the code pasted here may not match because of formatting.)

Query (edited to fix typo noted in post below):

UPDATE tblAPData

SET tblAPData.DataDate = T.DataDate, tblAPData.VendorName = T.VendorName, tblAPData.PaymentTerms = T.PaymentTerms, tblAPData.TotalAmount = T.TotalAmount,

tblAPData.CurrentAmount = T.CurrentAmount, tblAPData.Overdue1_30 = T.Overdue1_30, tblAPData.Overdue31_60 = T.Overdue31_60, tblAPData.Overdue61_90 = T.Overdue61_90,

tblAPData.Overdue91 = T.Overdue91, tblAPData.CurrencyCode = T.CurrencyCode, tblAPData.ConversionFactor = C.ConversionFactor

INNER JOIN tblAPDataTemp AS T

ON tblAPData.Site = T.Site AND tblAPData.Region = T.Region AND tblAPData.VendorNumber = T.VendorNumber

INNER JOIN tblCurrencyConversion C

ON T.CurrencyCode = C.CurrencyCode

WHERE (T.DataDate >= tblAPData.DataDate);

So the first question is what produces the parse error, and a second question is how to replace "tblAPData" with an alias in this statement to save some typing. Simply using "tblAPData AS D" right after UPDATE produces an error.

Thanks.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 5

UPDATE tblAPData

SET tblAPData.DataDate = tblAPData.DataDate = T.DataDate

2 equal signs ?

Lane Kelly Old Hand Points: 307 More actions · Answer 6

Yes, that typo did exist (now fixed), but I get a similar error message with the last line now:

"Incorrect syntax near the keyword 'INNER'."

PW-201837 SSC-Insane Points: 20805 More actions · Answer 7

Compare this to the 'template' in my 1st reply.

UPDATE AP -- Update the alias name

SET

-- Columns being updated are not prefixed by tablename or alias

DataDate = T.DataDate, VendorName = T.VendorName,

PaymentTerms = T.PaymentTerms, TotalAmount = T.TotalAmount,

CurrentAmount = T.CurrentAmount, etc etc

FROM tblAPData AS AP -- alias the table being updated

INNER JOIN tblAPDataTemp AS T

ON AP.Site = T.Site AND

AP.Region = T.Region AND

AP.VendorNumber = T.VendorNumber

INNER JOIN tblCurrencyConversion C

ON T.CurrencyCode = C.CurrencyCode

WHERE (T.DataDate >= AP.DataDate);

Lane Kelly Old Hand Points: 307 More actions · Answer 8

Thanks. That parsed fine. Just sloppy editing on my part.

JeffB Hall of Fame Points: 3541 More actions · Answer 9

I actually find that using a LEFT OUTER JOIN performs better than a NOT EXISTS. The syntax that I normally use to insert new records that don't already exist is:

INSERT INTO Target (column list)

SELECT column list

FROM Source S

LEFT OUTER JOIN Target T

ON (T.PrimaryKeys = S.PrimaryKeys)

WHERE T.ID IS NULL