• Phil Parkin - Wednesday, August 23, 2017 5:42 PM

    Tom_Hogan - Wednesday, August 23, 2017 4:56 PM

    --.  I'm not sure what benefit putting the data into XML vs., say, CSV would be from an SSIS perspective.

    Me neither. I agree with Jeff's comments, at least when it comes to XML !

    The benefit could be in the ability to make the file-transformation process generic such that you never have to modify it, even when new fields are added to a data partner file format. The downside is the XML tag overhead, however in this case I think it might be a good trade-off. I usually prefer a design that allows my data partners the flexibility to add new fields and have my ingestion process require as few changes as possible.

    If we look at two solutions side-by-side and we walk through a scenario where a data partner wants to add a field that I eventually want in my database we can see a clear difference in the level of effort and coordination required to respond to the change. Say Data Partner ABC wants to start sending me date_of_birth as a new field in a file they already send me.

    Solution 1 converts all incoming flat-files into a common flat-file format (e.g. pipe-delimited) which is fed to a common SSIS Package. For this solution to keep up with ABC's changes I would need to modify the file-transformation process on the same day when ABC introduces the date_of_birth field in their file to ensure it is passed through in the common flat-file format the SSIS Package receives. The first part of this is tightly coupled to the data partner's changes and requires me to deliver changes to my process in order to begin processing the new data field, a type of arrangement I like to avoid with data partners.

    Solution 2 converts all incoming flat-files into a generic XML format which is fed to a common SSIS Package. For this to keep up with ABC's changes I will not need to modify the file-transformation process and ABC can roll, or rollback, their change to introduce date_of_birth without affecting anything on my side because date_of_birth will automatically show up in the XML file per the generic transform process. Since my SSIS Package already knows about date_of_birth since other data partners already send it to me, I had no changes to make and now I am getting date_of_birth from ABC.

    This is one scenario where some changes were required in one solution and no changes were required in the other. There are other scenarios where some changes would need to happen in either solution, but for the ones coming to mind still less change to the XML based solution.

    Will byte counts be larger with XML? Without a doubt. In most cases I am willing to trade bytes for flexibility, resilience and the ability to insulate my systems from external changes.

    Hopefully all my assumptions about your situation are close to your real situation. Sorry for the winding and weaving post. Hopefully it makes sense. Let me know how you're thinking about the solution.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato