XML Inserts

  • I am doing some investigations to find out if inserts happen in a more efficient way using XML.

    It needs to be seen as to how the XML can be supplied for this type of operation. Can it work based on a file on disk? Can it work with an object that is present in memory?

    • Is it possible to insert multiple rows at the same time?
    • Is it possible to insert rows into multiple tables by supplying XML?
    • It needs to be seen if there can be a control over the granularity of the transactions involved in this type of data insertion.

    The structure of the XML would be

    <root>

      <Customer>

       <Cust_id>101</Cust_id>

       <Cust_fname>"George"</Custf_name>

       <Cust_lname>"Spencer"</Cust_lname>

       <Cust_address>"Some Address"</Cust_address>

       <Cust_dob>

          <day>2</day>

          <month>10</month>

          <year>1963</year>

       </Cust_dob>

       <Cust_Spends>

          <cust_currency>"USD"</cust_currency>

          <cust_spendamount>1000</cust_spendamount>

       </Cust_Spends>

      <Customer>

    </root>

    The table structure is

    Cust_id

    Cust_fname

    Cust_lname

    Cust_address

    Cust_dob

    cust_currency

    cust_spendamount

    I know that the XML is not in the format which it ideally should be in for me to use open XML

    EG:

    INSERT CUSTOMERS

    SELECT *

    FROM OPENXML(@hDoc, N'/ROOT/Customer',2)

         WITH CUSTOMERS

    The Problem is that the app is designed to return that XML in the XML format specified, and I need to insert it in the table. Additionally, I need to know about the transaction support provided, if any, and if I have 10000 customer records along with some records which have to be inserted into another related table, can I do it as a batch job.

    Any help would be appreciated.


    What I hear I forget, what I see I remember, what I do I understand

  • What you just selected is XML

       <Cust_id>101</Cust_id>

       <Cust_fname>"George"</Custf_name>

       <Cust_lname>"Spencer"</Cust_lname>

       <Cust_address>"Some Address"</Cust_address>

       <Cust_dob>

          <day>2</day>

          <month>10</month>

          <year>1963</year>

       </Cust_dob>

       <Cust_Spends>

          <cust_currency>"USD"</cust_currency>

          <cust_spendamount>1000</cust_spendamount>

       </Cust_Spends>

    You will have to create a staging table, inserting CUSTID together with all attributes and afterwards, selecting from the staging table and insert into customers table.

    stage table

    -----------------------------

    custid     int

    attributename   varchar

    attributevalue    varchar

    like

    101 fname george

    101 lname spencer

    101 address some address

    101 dobday 2

    101 dobmonth 10

    101 dobyear 1963

    101 currency usd

    101 spendamount 1000

    and then select from insert table into customer table

    insert customers

    select custid,

           max(case when attributename = 'fname' then fname else null end) fname,

           max(case when attributename = 'lname' then fname else null end) lname,

           max(case when attributename = 'address' then fname else null end) address

    from stagetable

    group by custid


    N 56°04'39.16"
    E 12°55'05.25"

  • Staging table? Peter - do you know we can just do this?

    --data

    declare @doc varchar(8000)

    set @doc = '

    <root>

      <Customer>

       <Cust_id>101</Cust_id>

       <Cust_fname>"George"</Cust_fname>

       <Cust_lname>"Spencer"</Cust_lname>

       <Cust_address>"Some Address"</Cust_address>

       <Cust_dob>

          <day>2</day>

          <month>10</month>

          <year>1963</year>

       </Cust_dob>

       <Cust_Spends>

          <cust_currency>"USD"</cust_currency>

          <cust_spendamount>1000</cust_spendamount>

       </Cust_Spends>

      </Customer>

      <Customer>

       <Cust_id>102</Cust_id>

       <Cust_fname>Fred</Cust_fname>

       <Cust_lname>Bloggs</Cust_lname>

       <Cust_address>Somewhere</Cust_address>

       <Cust_dob>

          <day>3</day>

          <month>12</month>

          <year>1974</year>

       </Cust_dob>

       <Cust_Spends>

          <cust_currency>GBP</cust_currency>

          <cust_spendamount>12345</cust_spendamount>

       </Cust_Spends>

      </Customer>

    </root>'

    --calculation

    declare @CUSTOMERS table (

    Cust_id int,

    Cust_fname varchar(10),

    Cust_lname varchar(10),

    Cust_address varchar(30),

    Cust_dob datetime,

    cust_currency varchar(5),

    cust_spendamount int)

    declare @hDoc int

    EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc

    INSERT @CUSTOMERS

    select Cust_id, Cust_fname, Cust_lname, Cust_address,

      dateadd(day, Cust_dob_day - 1, dateadd(month, (Cust_dob_year - 1900) * 12 + Cust_dob_month - 1, 0)),

      cust_currency, cust_spendamount

    from (

      SELECT * FROM OPENXML(@hDoc, N'/root/Customer', 2)

      WITH (Cust_id           int           'Cust_id',

            Cust_fname        varchar(10)   'Cust_fname',

            Cust_lname        varchar(10)   'Cust_lname',

            Cust_address      varchar(30)   'Cust_address',

            Cust_dob_day      int           'Cust_dob/day',

            Cust_dob_month    int           'Cust_dob/month',

            Cust_dob_year     int           'Cust_dob/year',

            cust_currency     varchar(5)    'Cust_Spends/cust_currency',

            cust_spendamount  int           'Cust_Spends/cust_spendamount')) a

    EXEC sp_xml_removedocument @hdoc

    select * from @CUSTOMERS

    /*results

    Cust_id     Cust_fname Cust_lname Cust_address                   Cust_dob                                               cust_currency cust_spendamount

    ----------- ---------- ---------- ------------------------------ ------------------------------------------------------ ------------- ----------------

    101         "George"   "Spencer"  "Some Address"                 1963-10-02 00:00:00.000                                "USD"         1000

    102         Fred       Bloggs     Somewhere                      1974-12-03 00:00:00.000                                GBP           12345

    */

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • nileshsane - re a few of your questions:

  • Is it possible to insert multiple rows at the same time?

    Yes - see above example.

  • Is it possible to insert rows into multiple tables by supplying XML?

    Aside from multiple openxml statements, I guess you could insert into a view with some instead of triggers to process and route it to your tables as you require.

  • You may well be best using a staging table and processing from there (as Peter suggested) - I just wanted to point out that you can do it without.

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Thanks for responding guys, I am going to try out the solutions which you'll posted and get back here with the results.


    What I hear I forget, what I see I remember, what I do I understand

  • Yes. I did not understand he was knowledgeable about XML otherwise, since he wrote only for customer node.

    And I also know that the technique you are using does not work when having for example two addresses. You end up with just one if writing as above. You must "start" at the address depth and dig yourself down to root level to get all addresses, for example.


    N 56°04'39.16"
    E 12°55'05.25"

  • Peter - Could you please show me an example of what can't be done? I'd like to understand.

    Here's one example of dealing with multiple addresses, but I'm guessing this isn't what you had in mind.

    --data

    declare @doc varchar(8000)

    set @doc = '

    <root>

      <Customer>

       <Cust_id>101</Cust_id>

       <Cust_fname>"George"</Cust_fname>

       <Cust_lname>"Spencer"</Cust_lname>

       <Cust_address>"Some Address"</Cust_address>

       <Cust_address>"Some other Address"</Cust_address>

       <Cust_dob>

          <day>2</day>

          <month>10</month>

          <year>1963</year>

       </Cust_dob>

       <Cust_Spends>

          <cust_currency>"USD"</cust_currency>

          <cust_spendamount>1000</cust_spendamount>

       </Cust_Spends>

      </Customer>

      <Customer>

       <Cust_id>102</Cust_id>

       <Cust_fname>Fred</Cust_fname>

       <Cust_lname>Bloggs</Cust_lname>

       <Cust_address>Somewhere</Cust_address>

       <Cust_dob>

          <day>3</day>

          <month>12</month>

          <year>1974</year>

       </Cust_dob>

       <Cust_Spends>

          <cust_currency>GBP</cust_currency>

          <cust_spendamount>12345</cust_spendamount>

       </Cust_Spends>

      </Customer>

    </root>'

    --calculation

    declare @CUSTOMERS table (

    Cust_id int,

    Cust_fname varchar(10),

    Cust_lname varchar(10),

    Cust_address varchar(30),

    Cust_address2 varchar(30),

    Cust_dob datetime,

    cust_currency varchar(5),

    cust_spendamount int)

    declare @hDoc int

    EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc

    INSERT @CUSTOMERS

    select Cust_id, Cust_fname, Cust_lname, Cust_address, Cust_address2,

      dateadd(day, Cust_dob_day - 1, dateadd(month, (Cust_dob_year - 1900) * 12 + Cust_dob_month - 1, 0)),

      cust_currency, cust_spendamount

    from (

      SELECT * FROM OPENXML(@hDoc, N'/root/Customer', 2)

      WITH (Cust_id           int           'Cust_id',

            Cust_fname        varchar(10)   'Cust_fname',

            Cust_lname        varchar(10)   'Cust_lname',

            Cust_address      varchar(30)   'Cust_address',

            Cust_address2     varchar(30)   'Cust_address[2]',

            Cust_dob_day      int           'Cust_dob/day',

            Cust_dob_month    int           'Cust_dob/month',

            Cust_dob_year     int           'Cust_dob/year',

            cust_currency     varchar(5)    'Cust_Spends/cust_currency',

            cust_spendamount  int           'Cust_Spends/cust_spendamount')) a

    EXEC sp_xml_removedocument @hdoc

    select * from @CUSTOMERS

    /*results

    Cust_id     Cust_fname Cust_lname Cust_address                   Cust_address2                  Cust_dob                                               cust_currency cust_spendamount

    ----------- ---------- ---------- ------------------------------ ------------------------------ ------------------------------------------------------ ------------- ----------------

    101         "George"   "Spencer"  "Some Address"                 "Some other Address"           1963-10-02 00:00:00.000                                "USD"         1000

    102         Fred       Bloggs     Somewhere                      NULL                           1974-12-03 00:00:00.000                                GBP           12345

    */ 

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • You have indexed the addresses right now with [2]. What if there are more addresses? Or a better example, several orders for a customer.

    This is because you set "root" to be OPENXML(@hDoc, N'/root/Customer', 2)

    If you set root to be N'/root/Customer/Cust_address'

    and use

      WITH (Cust_id           int           '..\Cust_id',

            Cust_fname        varchar(10)   '..\Cust_fname',

    for example, you get all addresses without having to index them. it works much like a join with "no-expanding-join behaviour".

    If one cust_id and three addresses, beginning with cust_id and drilling down to addresses return just one row because that is what cust_id is.

    But, starting with addresses and "anti-drilling" up to cust_id gives three rows, since there are three addresses and just one cust_id.

    In original posting, cust_id versus cust_spends is a better example if there were more than one cust_spend.


    N 56°04'39.16"
    E 12°55'05.25"

  • Thanks Peter. I think we both understand what can be done here and the implications. I just wanted to check there weren't any gaps in my (or your) knowledge

    I'm still not sure I like the structure of that stage table though. I don't understand why you would want one with that structure.

     

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Four years of practice

    We (insurance broker company) imported around 400 XML files every day, each of them between 2 and 10 mb.

    The XML files could have as much as 12 levels of tags and everything, I really mean EVERYTHING, had to be searchable and filterable. Below are tables somewhat simplified for proof of concept only.

    I created a lookup table looking like this

    Lookup table

    ---------------

    FieldID INT

    FieldName VARCHAR

    FieldDataType VARCHAR

    CustomersInsurances table

    ----------------------------

    CustInsID INT IDENTITY

    CustomerID INT

    InsuranceNumber VARCHAR

    And I staged everything in a table

    Stage table

    -----------------

    InsuranceNumber VARCHAR

    FieldID INT

    FieldValue VARCHAR

    Because the files we got from the insurance companies did not have SSN or any other id for customers than name, address and telephone number due to legal implications. But they had insurance numbers. CustomersInsurances table were filled from accounting department with their systems when we got our commission. This information is always received two weeks later than the XML-file. The XML-file arrived within 36 hours after customer acceptence and signing but commission arrived about two weeks after that.

    This way I could import the XML-files anyway and all data were stored and ready to be used and bound to customer whenever accounting department inserted the binding information in the CustomersInsurances table.

    This way we could bind everything together! I built a search engine with GUI that allowed the end-user to filter for every field in the XML file, no matter how many value instances for the field, and also allowing any field to be output, columnwise. You could search for families having at least two children, one under 10 and the other over 15. Also they must have had their insurance for a minimum of 2 years and not having bought another insurance the last 45 days. Output should be Office who sold first and last insurance to the family, accumulated commission for insurance and gender of sales agent.


    N 56°04'39.16"
    E 12°55'05.25"

  • Nice to get an insight into someone's life. Many thanks!

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • You're welcome.

    They are still using my design for the XML file import, SEVEN years after implementation... That's solid!

    Do I need to mention that the schema for XML file has been rewritten several times by the insurance companies?

    But my import mechanism doesn't care. If new attribute found, it just add the new one to the lookup table and continue to parse file. It does a near recursive iteration through all file so it finds all attributes no matter what they are called and where they are. InsuranceNumber is practically the only attribute that hasn't changed place. It is still a 'root/insurance' object.

    I love self-learning systems...


    N 56°04'39.16"
    E 12°55'05.25"

  • Maybe they just don't know any better!

     

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Hey! Be nice


    N 56°04'39.16"
    E 12°55'05.25"

  • Alrighty. I have tried the code with the different formats which we have on here, and not surprisingly it was worked like silk.

    My concern now is, the size of @doc varchar(8000). I have XML's which would exceed 30K, and I believe the only option I have is text, but text datatype is not supported for local variables.

    An option which I am considering is passing the entire XML to OPENXML as it is without assigning it to @doc. Dont know how this translates to good practise, but it works for me.

    Thanks for all the inputs guys.


    What I hear I forget, what I see I remember, what I do I understand

  • Viewing 15 posts - 1 through 15 (of 17 total)

    You must be logged in to reply to this topic. Login to reply