Loading XML table that outputs multiple table

  • I have an XML file that generates about 10 output tables when loaded into an XML source component. The natural solution would be either to:

    i) store them into 10 staging tables and then create another data flow task that queries these tables and then writes the results into the main table

    ii) use an XML Task along with an XSLT script to create either a CSV file or a simplified XML file that can easily be connected with a two component data flow

    Nonetheless the client doesn't want to hear about these two solutions. He doesn't want to create any extra tables nor does he want to create another file since he is "not comfortable with XSLT". He wants everything to be imported "directly from the original XML table". Of course this is possible but the data flow task will be saturated with Merge Join and Sort transformations. My question is: is there any way to do this without having a messy data flow?

  • TheComedian (5/2/2016)


    I have an XML file that generates about 10 output tables when loaded into an XML source component. The natural solution would be either to:

    i) store them into 10 staging tables and then create another data flow task that queries these tables and then writes the results into the main table

    ii) use an XML Task along with an XSLT script to create either a CSV file or a simplified XML file that can easily be connected with a two component data flow

    Nonetheless the client doesn't want to hear about these two solutions. He doesn't want to create any extra tables nor does he want to create another file since he is "not comfortable with XSLT". He wants everything to be imported "directly from the original XML table". Of course this is possible but the data flow task will be saturated with Merge Join and Sort transformations. My question is: is there any way to do this without having a messy data flow?

    Staging tables are a best practice in the world of ETL, I would try to communicate that because it's the way to go. There's a lot of articles and books that will back you up. The second option would suffice but I understand why someone would "not [be] comfortable with XSLT." XSLT is an awesome programming language (perhaps my favorite). The problem is that few people know XSLT and therefore, doing it with XSLT creates a "key man risk" situation.

    Another viable option which get the job done without XSLT or staging tables is to create a SQL Script Task and just use the T-SQL XML Data Type Methods to get the job done. My vote is for staging tables though, that's the best option hands down.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thanks for your response Alan. I actually suggested a third alternative: Create an OLE DB Source and write a T-SQL stored procedure to shred the XML into a SELECT statement (ala the nodes, value and query methods). The solution is relatively clean because you just have one SELECT statement in the stored procedure that's called by the source component and the results are simply placed in the OLE DB Destination component.

    Nonetheless unfortunately the client didn't like this solution either since the code is "hard to understand". Instead a solution involving multiple Sorts and Merge Joins was adapted, which will also work but I just hope that the XML files aren't too big :-D.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply