SQL 2005 - Shredding XML into 3 related tables?

  • I have been tasked with transferring transaction data from branch sites to the central site for reporting purposes however due to my inexperience of data transfer between servers I'm struggling to determine where to begin.

    1. Each branch site is running MSDE 2000 with the latest service pack installed.

    2. Head Office site is running SQL Server 2005 Standard

    At first thoughts for data transfer I immediately looked into BCP (as I'm from a SQL 2000 background) however I soon came to realise that due to the relationship of my tables this "flat file" approach would be difficult if not impossible.

    Basically we have 3 tables: Transactions, TransactionDetails and TransactionPayments. TransactionDetails and TransactionPayments are n-to-one dependant on Tranactions based on a TransactionID in each table.

    It's not a simple case of replication as various details must be changed mid-transit (Accounts Id's etc.) and therefore it was decided that an export method be created on the branch databases which was to be transferred to Head Office via the Internet and then re-processed into the Head Office database.

    I have managed to export the data from each branch using FOR XML EXPLICIT in SQL Server (MSDE) 2000 in the following format:

    <Transactions>

    <Transaction AccountID="1">

    <TransactionDetail Description="Paypoint" Quantity="1.00" Price="5.00"/>

    <Payment PaymentValue="2.00" ChangeValue="0.00"/>

    </Transaction>

    </Transactions>

    I've been looking into OPENXML and the XQUERY support of SQL 2005 in the hope that I may be able to use FLOWR to loop through each Transaction adding the header into the Transactions table and using the returned inserted identity (TransactionID) then loop each TransactionDetail and Payment of the XML data adding to these tables as required.

    All sounds good in theory however I'm not sure whether or not this is possible using t-sql directly or if I would have to create a W32 program to parse the XML and insert the data into SQL? hopefully you guys can point me in the right direction to start as falling back to using a seperate w32 program would no doubt slow the whole process down.

    Any help greatly appreciated.

    Thanks in advance

    Ian Ashworth

  • Hello,

    It sounds like a Data Warehouse application that you are working on? If so, then wouldn’t standard ETL methods work e.g. load the Transaction File in first in to a “Cleansing” table with an Identity column, then load the other two files in to their cleansing tables and use Joins to pick up the corresponding Transaction’s Identity value.

    Or may be there is something that prevents this?

    I assume you don’t have reliable, permanent communications links to the Branch Sites?

    Regards,

    John Marsh

    www.sql.lu
    SQL Server Luxembourg User Group

  • I guess in a way it is akin to a data warehouse however there is no CUBE/Analysis data being generated for reporting only purposes. This database is the same design at the branch as at head office except for the difference in database engines (MSDE2K vs. SQL2K5) but not ALL data is to be synchronised.

    Not sure about the holding/cleansing tables - maybe you could provide an example?

    The way I need it to work is that I somehow have a For...Next loop for each Transaction in the XML data which would allow me to add the Transaction header record and return the IDENTITY for use when then looping the child items (Details/Payments) to insert these rows.

    I know that this is likely possible using XQuery and some combination of COUNT() and a loop selecting only the relevant Transaction based on the sequence within the file but was under the impression this would be tediuously slow.

    I looked at the "for $t in /Transactions/Transaction return $t" but that doesn't allow me to then add an individual header for each and traverse the child element to maintain the relationship.

  • Hello again,

    Sorry, I can’t help you with your XML questions.

    I was just wondering why not to use normal set based SQL methods, or may be I am missing something? If you have good connectivity then you can use Linked Servers and query directly from the HO Server. If connectivity is not good then import flat files into a Cleansing Table from an SSIS package. As mentioned, if you import the Transaction records first, then afterwards you can use Joins to match the Transaction ID into the child records.

    A “Cleansing Table” is just an intermediately table where you can perform validation and transformation operations before the incoming data goes to its final resting place.

    Regards,

    John Marsh

    www.sql.lu
    SQL Server Luxembourg User Group

  • John, thanks for the prompt replies and assistance so far.

    Just thinking a little more about this "cleansing" table.

    As the TransactionID as provided by the Branch would otherwise be lost when importing into the Transactions table at Head Office and a new IDENTITY is produced how would you suggest I setup these staging tables?

    I was thinking a new "Imported_Transactions" table which would hold all the fields within the main Transactions table AND a new field for "NewTransactionID". As the new TransactionID is based on the IDENTITY as inserted into the main Transactions table I was thinking of inserting into this holding table and including a trigger on this table which would insert the new records into the main Transactions table and store the current branch TransactionID aswell as the newly returned H/O generated ID.

    I could then use that table to create TransactionDetails and Payments based on the NewTransactionID but joining the details on the TransactionID as generated at the branch.

    Again... if anybody else knows a better way to do this using xml without a temporary holding/cleansing table any information is greatly appreciated.

  • Hello Ian,

    That’s the kind of thing that I was thinking about.

    Personally, rather than a Trigger, I would use a separate step in an SSIS package to move the records to the final table, but that is probably only because it is what I am used to.

    Regards,

    John

    www.sql.lu
    SQL Server Luxembourg User Group

  • If you just want to copy the data (or selected data) and keep the identity values as is, there is a flag you can set so that you can enter a specific value into the identity column:

    SET IDENTITY_INSERT [ database_name . [ schema_name ] . ] table { ON | OFF }

  • Hi there,

    maybe this can get you started:

    -- Testdata based on your posted example did not work on the forum, it needs inserting here

    declare @xml xml

    set @xml='your xml data here'

    ;

    -- Shred XML data to create Transactions

    SELECTColumn1.value('@AccountID','int') AccountId

    FROM @xml.nodes('/Transactions/Transaction') as Table1(Column1)

    ;

    --Shred XML data to create TransactionDetails

    SELECTColumn1.value('(../@AccountID)','int') AccountId,

    Column1.value('(@Description)','varchar (255)')DetailDescription,

    Column1.value('@Quantity','varchar (255)')Quantity,

    Column1.value('@Price','varchar (255)')Price

    FROM @xml.nodes('/Transactions/Transaction/TransactionDetail') as Table1(Column1)

    ;

    --Shred XML data to create TransactionPayments

    SELECTColumn1.value('(../@AccountID)','int')AccountId,

    Column1.value('(@PaymentValue)','varchar (255)')PaymentValue,

    Column1.value('(@ChangeValue)','varchar (255)')ChangeValue

    FROM @xml.nodes('/Transactions/Transaction/Payment') as Table1(Column1)

    !!!!! WARNING !!!!!

    The forum software removed the XML stuff from my post. Please set @xml using the xml data that you supplied in the original post.

    !!!! WARNING !!!!!

    This would provide you with three separate collections of information from your XML file. I would follow the suggestions of creating staging tables to load the information and then sort out the relationships from there.

    Please be aware that if you are throwing all your transactions data into one big XML file it can be quite slow to process the data.

    Also, when working with XML data, the XQuery functions are case-sensitive. If you have a node with the name "Transactions", you will get an error when trying to query "transactions".

    HTH

    Best regards

    GermanDBA

    Regards,

    WilliamD

  • Ian Ashworth (2/16/2009)


    At first thoughts for data transfer I immediately looked into BCP (as I'm from a SQL 2000 background) however I soon came to realise that due to the relationship of my tables this "flat file" approach would be difficult if not impossible.

    Basically we have 3 tables: Transactions, TransactionDetails and TransactionPayments. TransactionDetails and TransactionPayments are n-to-one dependant on Tranactions based on a TransactionID in each table.

    Hi Ian,

    I just posted a possible solution using XML, and then realised that you had mentioned and ruled out flat files.

    Can I ask why?

    You would basically create three files, one for the transactions, one for the details and one for the payments. You could bcp these out of your branch server and then into the central server. You can then work with the bcp target tables and transform/join etc. until you get the desired result. It should be much quicker than using the XML method.

    Just my opinion.

    GermanDBA

    Regards,

    WilliamD

  • The "flat file" approach wasn't ruled out as such I was just under the impression it might be better to export the data as XML to maintain the parent/child hierarchy between the Transaction (header) and each detail/payment record.

    I'm going to experiment a little with both xml and flat file to determine how best to proceed.

  • Hi Ian,

    no problem.

    Do let us know how you get on.

    GermanDBA

    Regards,

    WilliamD

  • Hi Ian,

    I have to agree with GermanDBA regarding the speed of the import.

    Some time back I had to import large XML files into a 2005 DB with T-SQL, and although it can do it the speed is not impressive.

    Someone out there may be able to confirm this, but I have the impression that once you started using 'CROSS APPLY' to reference various nodes SQL appeared to just parse the XML as if it were a gigantic string; shuffling back and forth for the data it required. The processing speed was terrible.

    I found better results by extracting one or two nodes into temporary tables and using those to extract further - replacing the 'CROSS APPLY' with the temporary tables created in the previous SELECT.

    Given the option I would always use flat files and bcp; importing the data as quickly as possible to manipulate as I wish.

  • Hi Steve,

    this is something that I have seen at my new workplace.

    It is not just cross apply that can cause the problems, joining directly to XQuery or OPEN XML queries seems to do it too.

    I'm not too sure why, but guess it has to do with the query optimizer not having a clue as what is inside the XML variable until parsing time and can therefore not evaluate possible optimization techniques.

    From my experience I have seen that materialized data generally outperforms work with variable data like XML. That is after all, what SQL Server is designed to work with.

    Regards

    GermanDBA

    Regards,

    WilliamD

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply