February 22, 2006 at 8:27 am
I have an import mechanism where various sites send in new accounts payable data each day via a CSV file. My DTS package concatenates all available CSV files into one big one, then imports them into a temporary table. It's the next step in the process I need some help with.
Example table layout of my production table tblAPData (primary key defined by * columns):
DataDate Region* Vendor* Amount
What I would like to do is insert any new records into this table if the Region/Vendor combination does not already exist. I'd like to update any records where the source data has the same Region/Vendor and the same/newer DataDate. I want to ignore any records that have the same Region/Vendor but an older DataDate.
Sample tblAPData at the start:
DataDate Region* Vendor* Amount
2/1/2006 USA 123 $10.00
2/1/2006 USA 457 $20.00
2/4/2006 CAN XYZ $15.00
2/3/2006 USA 444 $99.00
Sample tblAPDataTemp that I need to process:
DataDate Region Vendor Amount
1/9/2006 USA 123 $15.00
2/1/2006 USA 123 $15.00
2/1/2006 USA 123 $15.00
2/6/2006 USA 999 $21.00
2/7/2006 USA 999 $13.00
2/6/2006 CAN XYZ $44.00
End result I desire in tblAPData:
DataDate Region* Vendor* Amount
2/1/2006 USA 123 $15.00
2/1/2006 USA 457 $20.00
2/3/2006 USA 444 $99.00
2/7/2006 USA 999 $13.00
2/6/2006 CAN XYZ $44.00
I think I got that all right. I'd want to ignore the 1/9/2006 data for Vendor 123 because it was old. I'd want to update using the 2/1/2006 data for Vendor 123 (managing the duplicate record in the temp table). I'd want to update the Vendor XYZ data because it is newer in the temp table, and I'd want to insert new Vendor 999. I have to deal with both records for Vendor 999 and end up with the newest data in my final table. I'd leave Vendor 444 unchanged.
I'm guessing I want to do this as a two step process - identify the new records to insert, then process the existing records via updates when appropriate.
Anyone have some code that should handle the variations illustrated above?
Thanks.
February 22, 2006 at 8:37 am
Typically:
-- Create new records where the key does not exist
INSERT INTO (column list)
WHERE NOT EXISTS (
SELECT * FROM Target
WHERE Target.Primarykeys = Source.PrimaryKeys
)
-- Update existing records where the key already exists
-- and filtering for any other business rules, like date checks
UPDATE T
SET
Column1 = S.Column1
...
ColumnN = S.ColumnN
FROM Target As T
INNER JOIN Source As S
ON (T.PrimaryKeys = S.PrimaryKeys)
WHERE {Any condition like your date check}
February 22, 2006 at 8:48 am
Question - on the INSERT INTO statement, do the two records for Vendor 999 get picked up in the subquery? And if so, does this cause a problem when it tries to insert two records for that Vendor since they have the same primary key values for tblAPData? Procedurally, I guess I'd like the 2/6/2006 record for Vendor 999 to be inserted, then the 2/7/2006 record to be used in an update.
Thanks.
February 22, 2006 at 9:06 am
Didn't notice that condition. In that case you need to incorporate a derived table (I named it "dtMostRecent") to allow selection of only the latest row per keyset.
INSERT INTO Target
(Region, Vendor, DataDate, Amount)
SELECT
S.Region, S.Vendor, S.DataDate, S.Amount
FROM tblAPDataTemp As S
INNER JOIN
(
SELECT Region, Vendor, Max(DataDate) As MostRecentDate
FROM tblAPDataTemp
GROUP BY Region, Vendor
) dtMostRecent
ON (S.Region = dtMostRecent.Region AND
S.Vendor = dtMostRecent.Vendor AND
S.DataDate = dtMostRecent.MostRecentDate)
WHERE NOT EXISTS (
SELECT *
FROM Target As T
WHERE T.Vendor = S.Vendor
AND T.Region = S.Region
)
February 22, 2006 at 10:37 am
OK, I think I understand your INSERT statement. I've put the code in, but have yet to test. At least it parses correctly.
As for the UPDATE statement, here it is coded to my actual database fields. There's an extra INNER JOIN in there to pick up a value from a third table. This syntax produces a parse error in SQL Server 2000.
Error: "Deferred prepare could not be completed. Statement(s) could not be prepared. Line 2: Incorrect syntax near '='."
(Note that the line number for my query and the code pasted here may not match because of formatting.)
Query (edited to fix typo noted in post below):
UPDATE tblAPData
SET tblAPData.DataDate = T.DataDate, tblAPData.VendorName = T.VendorName, tblAPData.PaymentTerms = T.PaymentTerms, tblAPData.TotalAmount = T.TotalAmount,
tblAPData.CurrentAmount = T.CurrentAmount, tblAPData.Overdue1_30 = T.Overdue1_30, tblAPData.Overdue31_60 = T.Overdue31_60, tblAPData.Overdue61_90 = T.Overdue61_90,
tblAPData.Overdue91 = T.Overdue91, tblAPData.CurrencyCode = T.CurrencyCode, tblAPData.ConversionFactor = C.ConversionFactor
INNER JOIN tblAPDataTemp AS T
ON tblAPData.Site = T.Site AND tblAPData.Region = T.Region AND tblAPData.VendorNumber = T.VendorNumber
INNER JOIN tblCurrencyConversion C
ON T.CurrencyCode = C.CurrencyCode
WHERE (T.DataDate >= tblAPData.DataDate);
So the first question is what produces the parse error, and a second question is how to replace "tblAPData" with an alias in this statement to save some typing. Simply using "tblAPData AS D" right after UPDATE produces an error.
Thanks.
February 22, 2006 at 10:56 am
UPDATE tblAPData
SET tblAPData.DataDate = tblAPData.DataDate = T.DataDate
2 equal signs ?
February 22, 2006 at 11:09 am
Yes, that typo did exist (now fixed), but I get a similar error message with the last line now:
"Incorrect syntax near the keyword 'INNER'."
February 22, 2006 at 11:29 am
Compare this to the 'template' in my 1st reply.
UPDATE AP -- Update the alias name
SET
-- Columns being updated are not prefixed by tablename or alias
DataDate = T.DataDate, VendorName = T.VendorName,
PaymentTerms = T.PaymentTerms, TotalAmount = T.TotalAmount,
CurrentAmount = T.CurrentAmount, etc etc
FROM tblAPData AS AP -- alias the table being updated
INNER JOIN tblAPDataTemp AS T
ON AP.Site = T.Site AND
AP.Region = T.Region AND
AP.VendorNumber = T.VendorNumber
INNER JOIN tblCurrencyConversion C
ON T.CurrencyCode = C.CurrencyCode
WHERE (T.DataDate >= AP.DataDate);
February 22, 2006 at 11:36 am
Thanks. That parsed fine. Just sloppy editing on my part.
February 22, 2006 at 1:39 pm
I actually find that using a LEFT OUTER JOIN performs better than a NOT EXISTS. The syntax that I normally use to insert new records that don't already exist is:
INSERT INTO Target (column list)
SELECT column list
FROM Source S
LEFT OUTER JOIN Target T
ON (T.PrimaryKeys = S.PrimaryKeys)
WHERE T.ID IS NULL
Viewing 10 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply