Suggestions for importing data into a multi-table dataset

  • I've written plenty of imports over the years, but I'm trying to find the best and fastest way to update data in one of our OLAP systems which will be updated from another system.

    Basically here's a simplified version of the tables along with the import table

    CREATE TABLE dbo.Contact (ID INTEGER, FullName NVARCHAR(50));

    CREATE TABLE dbo.ContactAddressXRef (ContactID INTEGER, AddressID Integer);

    CREATE TABLE dbo.ContactPhoneXRef (ContactID INTEGER, PhoneID INTEGER, TypeID Integer);

    CREATE TABLE dbo.ContactEmailXRef (ContactID INTEGER, EmailID INTEGER, TypeID Integer);

    CREATE TABLE dbo.[Address] (AddressID INTEGER, FullAddress NVARCHAR(50));

    CREATE TABLE dbo.Phone (PhoneID INTEGER, FullPhone NVARCHAR(50));

    CREATE TABLE dbo.Email (EmailID INTEGER, FullEmail NVARCHAR(50));

    CREATE TABLE dbo.ImportData (FullName NVARCHAR(50),FullAddress NVARCHAR(50),FullEmail NVARCHAR(50), Phone NVARCHAR(50), Fax NVARCHAR(50));

    The imported data will have a one to one with Address and Email plus a one to two on Phone for Voice and Fax. Our current system though is setup to handle one to many which is why we have the lookup tables between the Contact table and Address/Email/Phone tables. But here's a basic data load and with a View to select out the data:

    INSERT INTO dbo.Contact SELECT 1,'George Washington'

    INSERT INTO dbo.[Address] SELECT 1,'Westmoreland, Virginia'

    INSERT INTO dbo.[Email] SELECT 1,'gwashington@wh.gov'

    INSERT INTO dbo.[Phone] SELECT 1,'804-493-0130' -- Phone (Type 1)

    INSERT INTO dbo.[Phone] SELECT 2,'202-456-1111' -- Fax (Type 2)

    INSERT INTO dbo.ContactAddressXRef SELECT 1,1

    INSERT INTO dbo.ContactEmailXRef SELECT 1,1,1

    INSERT INTO dbo.ContactPhoneXRef SELECT 1,1,1 UNION SELECT 1,2,2

    Go

    CREATE VIEW dbo.vwContact As

    SELECTc.ID,

    c.FullName,

    a.FullAddress,

    e.FullEmail,

    p1.FullPhone AS Phone,

    p2.FullPhone AS Fax

    FROMdbo.Contact c

    LEFT OUTER JOIN dbo.ContactAddressXRef ax ON c.ID = ax.ContactID

    LEFT OUTER JOIN dbo.ContactEmailXRef ex ON c.ID = ex.ContactID

    LEFT OUTER JOIN dbo.ContactPhoneXRef px1 ON c.ID = px1.ContactID AND px1.TypeID = 1 -- Phone

    LEFT OUTER JOIN dbo.ContactPhoneXRef px2 ON c.ID = px2.ContactID AND px2.TypeID = 2 -- Fax

    LEFT OUTER JOIN dbo.[Address] a ON a.AddressID = ax.AddressID

    LEFT OUTER JOIN dbo.Email e ON e.EmailID = ex.EmailID

    LEFT OUTER JOIN dbo.Phone p1 ON p1.PhoneID = px1.PhoneID -- Phone

    LEFT OUTER JOIN dbo.Phone p2 ON p2.PhoneID = px2.PhoneID -- Fax

    GO

    Now I can use dbo.vwContact from within the application and all works great.

    SELECT * FROM dbo.vwContact

    IDFullNameFullAddressFullEmailPhoneFax

    1George WashingtonWestmoreland, Virginiagwashington@wh.gov804-493-0130202-456-1111

    Now here's a load of the Import table which would come from a different system:

    INSERT INTO ImportData

    SELECT 'George Washington','Westmoreland, Virginia','gw@wh.gov','804-493-0130','202-456-1111' UNION

    SELECT 'Abraham Lincoln','Hodgenville, Kentucky','al@wh.gov','270-358-9757','202-456-1111'

    Go

    And here's the problem... I need to find some simple way to Update, Insert, or Delete records in the Contact tables behind dbo.vwContact from the data loaded into dbo.ImportData.

    For now I'm working on a convoluted SProc that'll do this, but the ony way I've found to do it is RBAR with a Loop to parse each row of the Import table and analyze whether the data is new, needs to be updated, or if the OLAP data needs to be deleted given it's not in the Import table.

    I know Views can handle Updates as long as you only insert into columns within the same root table so I thought of doing this to handle the Updates, but that still doesn't help me with Inserts and Deletes. I also tried MERGE but it doesn't seem to like views either.

    So any suggestions? I'd rather keep this in TSQL, but if I have to resort to SSIS I can if needed.

    Thanks for any suggestions.

  • Have you considered using MERGE?

    http://technet.microsoft.com/en-us/library/bb510625(v=sql.105).aspx


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (12/19/2012)


    Have you considered using MERGE?

    http://technet.microsoft.com/en-us/library/bb510625(v=sql.105).aspx

    Thanks, but I already looked at MERGE, and I don't think it'll work for me, at least not using a single statement. It might work for dbo.Contect (using my example tables upstream), but I don't see how it'll work for the Address, Email, and Phone data since the Address/Phone/Email/Fax record has to be created first to create the ID, then that ID has to be added to the cross reference table to tie it back to the Contact. MERGE won't do this, as far as I know anyway.

    But I've never used Merge in any practical manner, so I maybe underestimating the magic of how it works. Also doesn't MERGE just loop through the rows RBAR? If so then this might also be another reason to avoid Merge.

  • samalex (12/20/2012)


    dwain.c (12/19/2012)


    Have you considered using MERGE?

    http://technet.microsoft.com/en-us/library/bb510625(v=sql.105).aspx

    Thanks, but I already looked at MERGE, and I don't think it'll work for me, at least not using a single statement. It might work for dbo.Contect (using my example tables upstream), but I don't see how it'll work for the Address, Email, and Phone data since the Address/Phone/Email/Fax record has to be created first to create the ID, then that ID has to be added to the cross reference table to tie it back to the Contact. MERGE won't do this, as far as I know anyway.

    But I've never used Merge in any practical manner, so I maybe underestimating the magic of how it works. Also doesn't MERGE just loop through the rows RBAR? If so then this might also be another reason to avoid Merge.

    MERGE does cause the server to loop through the rows internally, in reality every sql statement that works on multiple rows has to iterate internally. However MERGE doesn't use a T-SQL loop ie., a loop you've composed yourself which in my experience is the source of slowness (assuming everything else is ok, good plan, no triangular joins etc). If the loop is implicit in the statement ie., MERGE operates on many rows per single invocation of the statement and of course the previous caveats hold, MERGE should be pretty good performancewise!

    I'd love to know why T-SQL is bad with loops, I have working theories which hold up in most cases, but this is a terrible forum for discussing the cause of T-SQL slowness at least in my experience.

  • As I said I haven't used MERGE in any practical application so I haven't really read-up on the in's and out's of how it performs. I just assumed given the syntax that it was going row by row from the Source table and applying the changes to the Target table as would occur in a Loop, but again that's just an assumption.

    But honestly I'm just trying to find the simplest way to update the data when it's spread out over multiple tables per the example schema in my first post. Given what I have read about MERGE, regardless of performance, I don't see how it would work.

    Here's what I have to test MERGE:

    ;MERGE dbo.vwContact AS t

    USING dbo.ImportData AS s ON (t.FullName = s.FullName)

    WHEN MATCHED THEN

    UPDATE SET FullAddress = s.FullAddress, FullEmail = s.FullEmail, Phone = s.Phone, Fax = s.Fax

    WHEN NOT MATCHED BY Target THEN

    INSERT (FullName, FullAddress, FullEmail,Phone, Fax)

    VALUES (FullName, FullAddress, FullEmail,Phone, Fax)

    OUTPUT $action, inserted.*, deleted.*;

    But this errors out with this, which is what I'd expect given the Target is a View:

    Msg 4405, Level 16, State 1, Line 2

    View or function 't' is not updatable because the modification affects multiple base tables.

    So I tried splitting the WHEN MATCHED THEN statement into multiple statements to update Address, Email, Phone, and Fax separately, but MERGE doesn't appear to be robust enough to handle that... it only allows one update statement.

    If Merge will work for my scenario I'd love use it, but I just don't see how.

    Thanks --

    Sam

  • I would use multiple MERGE's (I think it can be done with 3) against the base tables. I never update using views, just a personal opinion. Wrap the entire thing in a Transaction and call it a day. I would avoid the RBAR at all costs.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply