• I don't have any links to anything specific but I can tell you a bit from personal experience.  I don't know what format your external sources are in but every time I've seen someone else try to join or merge external sources, it always seems that something eventually goes haywire.  That's not to say that one of the heavy hitters on this and other forums would have such problems.  I'm just saying that I've not personally seen people have much success with the joining of sources externally.

    What I've personally always done is to import the data from a source into a staging table dedicated to receiving the data from the source.  Partial validation is done during this step because I make sure the data types in the table are the data types expected from the source.  The rest of the validation is done after the data is loaded into the staging table.

    I generally write one proc per staging table so that I can change the order of imports and sometimes do intermediate processing as controlled by a master procedure that does all of the tables.  Once the tables are loaded, the you can do joins and merges however you see fit and they'll usually be faster than trying to do external joins of multiple sources.

    All of that is a bit more complex than what most people come up with.  It's also, IMHO, a whole lot more flexible.

    The key for me is that I don't need SSIS to do any of it and I don't need to redeploy a package every time I need to make a change. I usually just need to change one of the sub-procs called by the master proc and I'm done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)