How to split Xml columns or Json into records

  • I have to import every days from 1000 to 100000 records.

    Bulk insert create one record with the complete xml file. its great an very quick.

    Can we create directly with a bulk insert one record per order, and one column for the xml ot he order.

    Or how can we do after bulkinsert, to split the xml into into as many lines as there are commands

    to have a simple example we will said that we import orders (100000 /days) . an order has 1 to n lines of items

    here an example of file ( of course in the real life, orders and lines have a lot more of columns)

    <Orders>
    <Order>
    <OrderNumber>100</OrderNumber>
    <Customer>100</Customer>
    <Name>Dupont</Name>
    <Lines>
    <Line>
    <NumLine>1</NumLine>
    <Item>Item1</Item>
    <Quantity>1</Quantity>
    </Line>
    <Line>
    <NumLine>2</NumLine>
    <Item>Item2</Item>
    <Quantity>2</Quantity>
    </Line>
    </Lines>
    </Order>
    <Order>
    <OrderNumber>200</OrderNumber>
    <Customer>200</Customer>
    <Name>Durand</Name>
    <Lines>
    <Line>
    <NumLine>1</NumLine>
    <Item>Item1</Item>
    <Quantity>1</Quantity>
    </Line>
    <Line>
    <NumLine>2</NumLine>
    <Item>Item2</Item>
    <Quantity>2</Quantity>
    </Line>
    </Lines>
    </Order>
    </Orders>

    instead of having one record with the full xml, to take my example, i would like to 2 record, one with the XML of the first order, and a second column with the xml of the second orders.

    a kind of :

    SELECT MyXml.OneOrder.query('OrderNumber').value('.','VARCHAR(50)') as OrderNumber, MyXml.OneOrder
    FROM TestImport TI
    CROSS APPLY XmlCol.nodes('/Orders/Order') AS MyXml(OneOrder)

    I have got a lot job to do on each row before importing ( check data, reject orders, add column, etc)

    As the file is huge, response response time are awfull and the xml manipulation become very tricky.

     

     

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I don't see why you can't do this.  You already have it started, just need to expand on it to capture each and every element in the XML.  Something along the lines of:

    SELECT
    [MyXml].[OneOrder].[query]('OrderNumber').[value]( '.'
    , 'VARCHAR(50)'
    ) AS [OrderNumber]
    , [MyXml].[OneOrder].[query]('Customer').[value]('.'
    , 'VARCHAR(50)'
    ) AS [Customer]
    , [MyXml].[OneOrder].[query]('Name').[value]('.'
    , 'VARCHAR(50)'
    ) AS [Name]
    , [MyXml2].[OrderLines].[query]('NumLine').[value]( '.'
    , 'VARCHAR(50)'
    ) AS [NumLine]
    , [MyXml2].[OrderLines].[query]('Item').[value]( '.'
    , 'VARCHAR(50)'
    ) AS [Item]
    , [MyXml2].[OrderLines].[query]('Quantity').[value]( '.'
    , 'VARCHAR(50)'
    ) AS [Quantity]
    FROMTestImport AS [TI]
    CROSS APPLY[XmlCol].[nodes]('/Orders/Order') AS [MyXml]([OneOrder])
    CROSS APPLY[MyXML].[OneOrder].[nodes]('./Lines/Line') AS [MyXml2]([OrderLines]);

    I ran SQL Format on it to give it a format similar to what I am more used to and to make it potentially easier to read.  Not sure that it was successful or not.

    Basically, you had the format to get the data from the root, but you needed another CROSS APPLY to get the inner nested stuff.  If you have more stuff nested, you just need another CROSS APPLY and then apply it to the appropriate level.

     

    EDIT - something to note though, SQL Server isn't always the fastest at working with XML data.  It MAY be better (if possible) to convert it to a more easily consumable format prior to importing it into SQL Server.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks for your reply. but it's not exactly what i try to do to see if response times are better.

    When you use bulk import, you have your xml in one column and one row.

    i would like to have as many rows as i have order.

    So in my example, i would like to have 2 row :

    First column : Ordernumber 100

    Second column : the xml of the order : and XML : <Order><OrderNumber>100</OrderNumber><Customer> ....

    Row 2

    Ordernumber 200

    Second column : the xml of the order : and XML : <Order><OrderNumber>200</OrderNumber><Customer> ....

     

     

  • oh sorry.  I misunderstood.  I think that will be a bit more challenging as you are essentially going to need to rip apart the XML.  I think this may fall outside my level of XML inside SLQ.  to break it out it are that you rae going to need to break it out into a nested select or CTE and join it back in, but maybe someone else has better experience with XML and can be more help.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply