Spliting XML file into Parent/Child table using SSIS package

  • Hi,

    I want to load xml file into table. The xml file has many hierarchy level and relationship. I want to load the xml file and move the data to tables.

    Used the IsSorted property to True to each and every outputs but i couldnt able to achieve the result.

    Attached is the XML file.

    Below is the sample hierarchy.

    MAC_ID (Machine Id) -1

    INST_ID (Instance Id)- 2

    DB_ID (DB ID) -3

    TBL_ID (TABLE ID)- 4

    VW_ID (View ID)

    CNST_ID (CONSTRAINT ID)

    IND_ID (INDEX ID)

    Can someone please help me how to load all the values into a table with proper relationship.

    Thanks for your help.

  • What do you mean by "tables"? Could you please provide the structure you want to insert the data?

    Also, your requirement of "load all the values into a table" seems contradictory to your first statement.

    Please clarify.

    Best way to get further assistance would be to provide target table structure together with a sample of expected results. The approach itself in general sounds doable.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi Lutz,

    I dont have any specific target table structure. The thing is I need to load all the xml values into a table. Please take a look on the xml file it has 1 to many relationship. So I dont think we can directly load all the values into single table? Spliting every output into various tables. We need to specify the relationship to each and every table. How to achieve this?

    Please let me know if you have any solutions/question if u have?

    Thanks,

    Balaji L

  • Hi Balaji,

    before we go into details a few comments:

    1) You should not post real data as it might get you into trouble for posting stuff that's not within your authority. It is common practise to abstract the problem into "neutral" data.

    2) The "sample data" you provided might get you into serious trouble in terms of SQL vulnerability since everything is included: IP's, database names, table and column specs a.s.o.

    If your systems has any web interface you just opened up the door for BIG trouble. I asked the administrator to remove your attachment to protect your data.

    3) Sample data are supposed to be descriptive and SHORT. The second part is not fulfilled with the unzipped 2.7MB file...

    4) The expected result is supposed to show you the concept of how to work on your solution and not to completely do your job...

    5) If posting in XML forum you should add the SQL version you're using since some of the solutions do not work in older SQL Server version.

    Point #4 brings me to the actual solution (usable from SS2K5 and up):

    --create temp table to hold the temp xml data

    CREATE TABLE #t_xml(

    [col1] [xml] NULL,

    [ID] [int] IDENTITY(1,1) NOT NULL,

    CONSTRAINT [PK_t_xml] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    ))

    -- insert xml raw data into temp table

    INSERT INTO #t_xml SELECT BulkColumn

    FROM OPENROWSET (BULK 'A:\temp\DIS_CRPSCSMSQHI_20090603_030110.xml', SINGLE_BLOB) TempXML -- change folder to your needs

    -- add index to temp table to improve performance

    CREATE PRIMARY XML INDEX ix_t_xml ON #t_xml

    (

    [col1]

    )

    -- select table information (example)

    ;WITH XMLNAMESPACES (

    DEFAULT 'DBASupport.bankofamerica.com'

    )

    SELECT

    c.value('../@NM', 'varchar(100)') AS TBL_NM,

    c.value('@CID', 'varchar(100)') AS TBL_CID,

    c.value('@NM', 'varchar(100)') AS TBL_NM,

    c.value('@TYP', 'varchar(100)') AS TBL_TYP,

    c.value('@LN', 'varchar(100)') AS TBL_LN,

    c.value('@PRC', 'varchar(100)') AS TBL_PRC,

    c.value('@SCL', 'varchar(100)') AS TBL_SCL,

    c.value('@DFT', 'varchar(100)') AS TBL_DFT,

    c.value('@RUL', 'varchar(100)') AS TBL_RUL

    FROM #t_xml

    CROSS APPLY

    col1.nodes('DRT/MAC/INST/DB/TBL/CL') T(c);

    /* result set

    10704 rows with column definition per table

    */

    Note: Even if it's possible to add more levels to the select clause (e.g. database information), I wouldn't recommend it with a file as large as yours, since it badly hurts performance: to get the data as shown above it took approx. 8sec on my machine but when I included the database name from one level above it went up to 5min.

    With that sample code you should be able to get your issue resolved (e.g. by inserting the values into a separate table).

    If you need more information regarding the use of xml data you might want to look into the series "XML workshop" by Jacob Sebastian. It's really helpful! (search for "XML workshop" on this site)

    To get the tables per database just change the select statement to

    ;WITH XMLNAMESPACES (

    DEFAULT 'DBASupport.bankofamerica.com'

    )

    SELECT

    c.value('../@DBNM', 'varchar(100)') AS DB_DBNM,

    c.value('@NM', 'varchar(100)') AS TBL_NM

    FROM #t_xml

    CROSS APPLY

    col1.nodes('DRT/MAC/INST/DB/TBL') T(c)

    /*result set 502 rows*/;



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I have removed the attachment.

    Please edit the data to be samples only, and upload in a new post along with an idea of how you want this data to appear in a table(s).

  • Hi Lutz, Thanks a lot for removing the attachment and for your reply.

    I need to develop the SSIS package to import all the XML attributes into tables with parent/child key relationship.

    I got the solution how to handle this scenario. I will let you know if I have any issues.

    Thanks a lot for your reply.

  • Hi Balaji

    I am facing a similar issue......may i know how you split the XML file into tables using SSIS.

    Thanks in advance

    Itz

  • I am facing the same problem.I need to load the data from a xml source.My xml is as follows:-

    <?xml version="1.0"?>

    <Products>

    <Product id="1">

    <ProductName>Pr1</ProductName>

    <Quantity>5</Quantity>

    <ProductType typeid="1">Household</ProductType>

    <Price currency="Rupees">15.00</Price>

    </Product>

    <Product id="2">

    <ProductName>Pr2</ProductName>

    <Quantity>10</Quantity>

    <ProductType typeid="2">Disinfectants</ProductType>

    <Price currency="Yen">25.00</Price>

    </Product>

    <Product id="3">

    <ProductName>Pr3</ProductName>

    <Quantity>15</Quantity>

    <ProductType typeid="3">Wood</ProductType>

    <Price currency="Rupees">56.00</Price>

    </Product>

    <Product id="4">

    <ProductName>Pr4</ProductName>

    <Quantity>52</Quantity>

    <ProductType typeid="1">Household</ProductType>

    <Price currency="Dollars">53.00</Price>

    </Product>

    <Product id="5">

    <ProductName>Pr5</ProductName>

    <Quantity>54</Quantity>

    <ProductType typeid="1">Household</ProductType>

    <Price currency="Dime">5.00</Price>

    </Product>

    <Product id="6">

    <ProductName>Pr6</ProductName>

    <Quantity>9</Quantity>

    <ProductType typeid="2">Disinfectants</ProductType>

    <Price currency="Dollars">5.00</Price>

    </Product>

    <Product id="7">

    <ProductName>Pr7</ProductName>

    <Quantity>95</Quantity>

    <ProductType typeid="3">Wood</ProductType>

    <Price currency="Yen">52.00</Price>

    </Product>

    <Product id="8">

    <ProductName>Pr8</ProductName>

    <Quantity>54</Quantity>

    <ProductType typeid="2">Disinfectants</ProductType>

    <Price currency="Dollar">52.00</Price>

    </Product>

    <Product id="9">

    <ProductName>Pr9</ProductName>

    <Quantity>22</Quantity>

    <ProductType typeid="3">Wood</ProductType>

    <Price currency="Dime">67.00</Price>

    </Product>

    <Product id="10">

    <ProductName>Pr10</ProductName>

    <Quantity>54</Quantity>

    <ProductType typeid="2">Disinfectants</ProductType>

    <Price currency="Dollar">52.00</Price>

    </Product>

    </Products>

    I am getting three tables from teh XML source once placed in SSIS.

    I created a products table having id as teh primary key.

    CREATE TABLE [dbo].[products](

    [id] [tinyint] NOT NULL,

    [ProductName] [nvarchar](100) NULL,

    [Quantity] [tinyint] NULL,

    CONSTRAINT [pk1] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    This is teh script for Priceinfo table

    CREATE TABLE [dbo].[Priceinfo](

    [currency] [nvarchar](255) NULL,

    [text] [decimal](28, 10) NULL,

    [id] [tinyint] NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Priceinfo] WITH CHECK ADD CONSTRAINT [fk1] FOREIGN KEY([id])

    REFERENCES [dbo].[products] ([id])

    GO

    ALTER TABLE [dbo].[Priceinfo] CHECK CONSTRAINT [fk1]

    GO

    This is teh script for Productstype table

    CREATE TABLE [dbo].[Producttype](

    [typeid] [tinyint] NULL,

    [text] [nvarchar](255) NULL,

    [id] [tinyint] NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Producttype] WITH CHECK ADD CONSTRAINT [fk2] FOREIGN KEY([id])

    REFERENCES [dbo].[products] ([id])

    GO

    ALTER TABLE [dbo].[Producttype] CHECK CONSTRAINT [fk2]

    GO

    The ID in teh priceinfo and products type table are not getting loaded.Please help.

  • Could you please elaborate on how you were able to solve this? I have a scenario with XML similar to the following:

    <jobs>

    <job>

    <jobCode>12345</jobCode>

    <title>.NET Developer</title>

    <locations>

    <location>

    <description>New York</description>

    </location>

    <location>

    <description>Philadelphia</description>

    </location>

    </locations>

    </job>

    <job>

    <jobCode>67890</jobCode>

    <title>Database Administrator</title>

    <locations>

    <location>

    <description>New York</description>

    </location>

    <location>

    <description>Denver</description>

    </location>

    </locations>

    </job>

    </jobs>

    Basically I need to get each job into a parent Job table where jobCode is the primary key, with each corresponding location going into a Location table that contains a jobCode foreign key pointing back to the Job table.

  • You can solve this by shreding the data into the two tables by using the nodes functions. I have created this example:

    --create our tables

    CREATE TABLE Jobs (JobCode INT PRIMARY KEY CLUSTERED, Title VARCHAR(100))

    CREATE TABLE Locations (JobCode INT, Description VARCHAR(100))

    ALTER TABLE Locations ADD CONSTRAINT FK_Locations_Jobs FOREIGN KEY(JobCode) REFERENCES Jobs(Jobcode)

    --load data into variable

    DECLARE @xml XML

    SET @xml = '<jobs>

    <job>

    <jobCode>12345</jobCode>

    <title>.NET Developer</title>

    <locations>

    <location>

    <description>New York</description>

    </location>

    <location>

    <description>Philadelphia</description>

    </location>

    </locations>

    </job>

    <job>

    <jobCode>67890</jobCode>

    <title>Database Administrator</title>

    <locations>

    <location>

    <description>New York</description>

    </location>

    <location>

    <description>Denver</description>

    </location>

    </locations>

    </job>

    </jobs>'

    --insert our data into jobs

    INSERT INTO Jobs

    SELECT t.c.value('(jobCode/text())[1]', 'int')

    , t.c.value('(title/text())[1]', 'varchar(100)')

    FROM @xml.nodes('/jobs/job') T(c)

    --insert our data into locations

    INSERT INTO Locations

    SELECT t.c.value('(jobCode/text())[1]', 'int')

    , u.c.value('(description/text())[1]', 'varchar(100)')

    FROM @xml.nodes('/jobs/job') T(c)

    CROSS APPLY T.c.nodes('locations/location') u(c)

    SELECT * FROM Jobs

    SELECT * FROM Locations

    this create two tables Jobs and locations and there is a foreign key from locations -> jobs. The first insert statement shreds the job data into the Jobs table and the second insert statement shreds the location data into the locations table.

  • Thank you very much for taking the time to reply. I apologize for not being more clear, but I was looking for a solution using SSIS and Data Flow tasks - I have figured out how to do this using a Merge/Join Transformation. If anyone needs to do it this way let me know and I will post what I have come up with.

    However, the information that you have provided will be helpful as well - it will serve as a good reference for using the SQL nodes function. I have not done much development with regard to querying XML data this way, and always find myself having to research it again any time that it comes up.

  • Hi All,

    I am facing similar issue,we have XML file with parent\child relationships and now we want to load that to sqlserver tables.Can someone guide me how to perform this.

    Regards,

    Sudha

  • Thanks a lot for sharing this script

    it helped me a lot !

    Mariusz

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply