Home Forums Programming XML Spliting XML file into Parent/Child table using SSIS package RE: Spliting XML file into Parent/Child table using SSIS package

  • Hi Balaji,

    before we go into details a few comments:

    1) You should not post real data as it might get you into trouble for posting stuff that's not within your authority. It is common practise to abstract the problem into "neutral" data.

    2) The "sample data" you provided might get you into serious trouble in terms of SQL vulnerability since everything is included: IP's, database names, table and column specs a.s.o.

    If your systems has any web interface you just opened up the door for BIG trouble. I asked the administrator to remove your attachment to protect your data.

    3) Sample data are supposed to be descriptive and SHORT. The second part is not fulfilled with the unzipped 2.7MB file...

    4) The expected result is supposed to show you the concept of how to work on your solution and not to completely do your job...

    5) If posting in XML forum you should add the SQL version you're using since some of the solutions do not work in older SQL Server version.

    Point #4 brings me to the actual solution (usable from SS2K5 and up):

    --create temp table to hold the temp xml data

    CREATE TABLE #t_xml(

    [col1] [xml] NULL,

    [ID] [int] IDENTITY(1,1) NOT NULL,

    CONSTRAINT [PK_t_xml] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    ))

    -- insert xml raw data into temp table

    INSERT INTO #t_xml SELECT BulkColumn

    FROM OPENROWSET (BULK 'A:\temp\DIS_CRPSCSMSQHI_20090603_030110.xml', SINGLE_BLOB) TempXML -- change folder to your needs

    -- add index to temp table to improve performance

    CREATE PRIMARY XML INDEX ix_t_xml ON #t_xml

    (

    [col1]

    )

    -- select table information (example)

    ;WITH XMLNAMESPACES (

    DEFAULT 'DBASupport.bankofamerica.com'

    )

    SELECT

    c.value('../@NM', 'varchar(100)') AS TBL_NM,

    c.value('@CID', 'varchar(100)') AS TBL_CID,

    c.value('@NM', 'varchar(100)') AS TBL_NM,

    c.value('@TYP', 'varchar(100)') AS TBL_TYP,

    c.value('@LN', 'varchar(100)') AS TBL_LN,

    c.value('@PRC', 'varchar(100)') AS TBL_PRC,

    c.value('@SCL', 'varchar(100)') AS TBL_SCL,

    c.value('@DFT', 'varchar(100)') AS TBL_DFT,

    c.value('@RUL', 'varchar(100)') AS TBL_RUL

    FROM #t_xml

    CROSS APPLY

    col1.nodes('DRT/MAC/INST/DB/TBL/CL') T(c);

    /* result set

    10704 rows with column definition per table

    */

    Note: Even if it's possible to add more levels to the select clause (e.g. database information), I wouldn't recommend it with a file as large as yours, since it badly hurts performance: to get the data as shown above it took approx. 8sec on my machine but when I included the database name from one level above it went up to 5min.

    With that sample code you should be able to get your issue resolved (e.g. by inserting the values into a separate table).

    If you need more information regarding the use of xml data you might want to look into the series "XML workshop" by Jacob Sebastian. It's really helpful! (search for "XML workshop" on this site)

    To get the tables per database just change the select statement to

    ;WITH XMLNAMESPACES (

    DEFAULT 'DBASupport.bankofamerica.com'

    )

    SELECT

    c.value('../@DBNM', 'varchar(100)') AS DB_DBNM,

    c.value('@NM', 'varchar(100)') AS TBL_NM

    FROM #t_xml

    CROSS APPLY

    col1.nodes('DRT/MAC/INST/DB/TBL') T(c)

    /*result set 502 rows*/;



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]