XML Workshop XII - Parsing a delimited string

  • jacob sebastian

    SSChampion

    Points: 11812

    Comments posted to this topic are about the item XML Workshop XII - Parsing a delimited string

  • Phil Factor

    SSC-Insane

    Points: 20115

    I really enjoy these XML workshops.

    Of course, the parsing of delimited data is an important issue for data feeds. If data can be converted into XML first, using AWK or GREP or whatever, it then becomes much easier to gulp it into SQL Server. The biggest questions I've had are:

    Performance. Whenever I've used this XML technology for data feeds, or for passing data between routines, it has been very fast, but others using a very similar system have reported it as being very slow. I'm not at all sure why the difference arises.

    Resilience. Maybe I've been unlucky, but I've had many feeds to deal with that occasionally spit out data that crashes the simplest, and most obvious, data import systems. I reckon that the best systems can isolate corrupt or incorrect data before the bulk insert and allow the DBA to inspect it.

    Any thoughts?

    Best wishes,
    Phil Factor
    Simple Talk

  • jacob sebastian

    SSChampion

    Points: 11812

    I agree with you on the first point. Wherever I have used the XML approach in our applications, it has worked well. However, I have read some posts where people complained about performance problems. I am not sure whether that is because they are using it in an incorrect manner or something else. Or it could be the volume of data... I am not sure.

    On the second point, do you think a schema could help?

    jacob

  • Phil Factor

    SSC-Insane

    Points: 20115

    A typical problem I've had to face in the past might be that I've got a million or so rows of data from a switch that have to be imported. If they are not imported, then the business runs the risk of leaving a fraud or intrusion undetected. Right in the middle of the million rows is a record or two that is mangled. Rejecting the file isn't an option. The import routine needs to be able to import all the good stuff and leave the bad stuff in a 'limbo' file for manual intervention. Cleaning the data manually before import isn't a good idea either as such things usually are scheduled for the early hours of the morning when the server isn't so busy. Could a schema solve this sort of problem by filtering 'sheep-from-goats' on a record-by-record basis, rather than a document basis?

    I'd love to know what causes slow XML processing but, like you, I'll have to wait until it happens to me! I ran some timings a while back with the various parameter-passing techniques and found XML to be as fast as the 'helper-table'/'number table' technique, which is far faster than any iterative technique.
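
    For comparison, a minimal sketch of the number-table split (names are assumptions: dbo.Numbers is a helper table holding sequential integers from 1 to at least 8000):

    DECLARE @list varchar(8000);
    SET @list = 'red,green,blue';

    -- Pad the list with delimiters so every value is bounded by commas,
    -- then let each Number that lands on a comma mark the start of one value.
    SELECT SUBSTRING(',' + @list + ',', n.Number + 1,
           CHARINDEX(',', ',' + @list + ',', n.Number + 1) - n.Number - 1) AS Item
    FROM dbo.Numbers AS n
    WHERE n.Number < LEN(',' + @list + ',')
      AND SUBSTRING(',' + @list + ',', n.Number, 1) = ',';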

    Best wishes,
    Phil Factor
    Simple Talk

  • jacob sebastian

    SSChampion

    Points: 11812

    I understand the problem now. A schema validates an entire document; I do not think a schema can be used to filter out the "bad" records and process the "good" ones. I guess the only option available is to query the XML data and retrieve the "good" records (and dump the "bad" ones to a table or XML file for manual review).
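
    A rough sketch of that record-by-record sifting (all names here are hypothetical; the rule is simply that @amount must be numeric):

    DECLARE @feed xml;
    SET @feed = '<rows>
                   <row id="1" amount="10.50"/>
                   <row id="2" amount="oops"/>
                   <row id="3" amount="7.25"/>
                 </rows>';

    -- Route rows that fail the check into a limbo table for manual review...
    SELECT r.x.value('@id', 'int')             AS id,
           r.x.value('@amount', 'varchar(20)') AS amount
    INTO #limbo
    FROM @feed.nodes('/rows/row') AS r(x)
    WHERE ISNUMERIC(r.x.value('@amount', 'varchar(20)')) = 0;

    -- ...and import only the rows that pass.
    SELECT r.x.value('@id', 'int')                            AS id,
           CAST(r.x.value('@amount', 'varchar(20)') AS money) AS amount
    FROM @feed.nodes('/rows/row') AS r(x)
    WHERE ISNUMERIC(r.x.value('@amount', 'varchar(20)')) = 1;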

    regards

    Jacob

  • Jeff Moden

    SSC Guru

    Points: 997150

    Phil Factor wrote:

    Right in the middle of the million rows is a record or two that is mangled. Rejecting the file isn't an option. The import routine needs to be able to import all the good stuff and leave the bad stuff in a 'limbo' file for manual intervention.

    BCP will do just that very nicely... second in speed only to Bulk Insert which does not have such a capability.
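
    Something along these lines (server, database, table, and file names are all hypothetical; -e writes rejected rows to a separate file for later inspection and -m sets how many errors to tolerate before the run aborts):

    bcp MyDb.dbo.SwitchData in feed.csv -S MyServer -T -c -t, -m 100 -e rejects.txt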

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • LP-181697

    SSC Eights!

    Points: 966

    This method is good, but it adds 6 additional characters to each real data value. That means your string may suddenly not fit into the allocated number of characters, so you have to be careful.
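
    For context, a minimal sketch of the REPLACE-based conversion being discussed (the <a> tag name is an assumption): each comma becomes '</a><a>', seven characters in place of one, which is where the extra characters per value come from.

    DECLARE @list varchar(8000), @x xml;
    SET @list = 'red,green,blue';

    -- Wrap the list and turn every delimiter into a close/open tag pair.
    SET @x = CAST('<a>' + REPLACE(@list, ',', '</a><a>') + '</a>' AS xml);

    -- Shred the XML into one row per value.
    SELECT t.a.value('.', 'varchar(50)') AS Item
    FROM @x.nodes('/a') AS t(a);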

  • jacob sebastian

    SSChampion

    Points: 11812

    I don't think this approach is good for large chunks of data. It is handy when you have a small delimited string and you want to break it into a relational table quickly.

  • Jack Corbett

    SSC Guru

    Points: 184381

    I thought this was a very interesting article and certainly presented a new way to handle delimited strings without looping. For new applications I would just have the application pass XML as the parameter, but this is certainly a good way to handle existing applications and SSRS 2005 multi-select parameters.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • M Gnat

    Newbie

    Points: 5

    I agree it's an interesting "concept", but in the real world, where I have to process MILLIONS of records in a routine, I don't see it working.

  • Jack Corbett

    SSC Guru

    Points: 184381

    M Gnat

    I agree it's an interesting "concept", but in the real world, where I have to process MILLIONS of records in a routine, I don't see it working.

    jacob sebastian (12/5/2007)

    I don't think this approach is good for large chunks of data. It is handy when you have a small delimited string and you want to break it into a relational table quickly.

    As Jacob says, this may not be the best way to handle large sets of data, particularly for importing; BCP is designed for that. But for a list being passed as a parameter to a stored procedure, this is an excellent idea.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • mojo-168709

    SSCommitted

    Points: 1825

    To add to the list of recommended reading on parsing delimited strings, I find Erland Sommarskog's articles very helpful:

    http://www.sommarskog.se/arrays-in-sql-2005.html

    http://www.sommarskog.se/arrays-in-sql-2000.html

  • Argneka

    SSC Enthusiast

    Points: 104

    Hi,

    How would I iterate through an XML object to obtain each specific row one at a time? I was trying it this way, but value() requires a literal for both parameters. Not sure what to do?

    ALTER PROCEDURE [dbo].[XMLUpdateTest]
        -- Add the parameters for the stored procedure here
        @XmlList XML,
        @RowCount int
    AS
    BEGIN
        DECLARE @Count int
        DECLARE @ObjectName varchar(50)
        DECLARE @ColumnName varchar(50)
        DECLARE @Property varchar(50)
        DECLARE @Value varchar(50)
        DECLARE @ParObjectName varchar(50)
        DECLARE @ParColumnName varchar(50)
        DECLARE @ParProperty varchar(50)
        DECLARE @ParValue varchar(50)

        SET @Count = 0

        WHILE (@Count < @RowCount)
        BEGIN
            -- Build paths such as @ObjectName[3]
            SET @ParObjectName = @ObjectName + '[' + CONVERT(varchar(2), @Count) + ']'
            SET @ParColumnName = @ColumnName + '[' + CONVERT(varchar(2), @Count) + ']'
            SET @ParProperty = @Property + '[' + CONVERT(varchar(2), @Count) + ']'
            SET @ParValue = @Value + '[' + CONVERT(varchar(2), @Count) + ']'

            SELECT XmlList.Row.value(@ParObjectName, 'varchar(50)'),
                   XmlList.Row.value(@ParColumnName, 'varchar(50)'),
                   XmlList.Row.value(@ParProperty, 'varchar(50)'),
                   XmlList.Row.value(@ParValue, 'varchar(50)')
            FROM @XmlList.nodes('//Rows/Row') AS XmlList(Row)

            SET @Count = @Count + 1
        END
    END

    Thanks for the help

  • jacob sebastian

    SSChampion

    Points: 11812

    You could do this by using a variable. You can refer to an element/attribute position with 'sql:variable("@varname")'. Look up sql:variable in Books Online. I have covered this in "XML Workshop XVII - Writing a LOOP to process XML elements in TSQL"; I see it in the pending publication list, so I hope it will be out in a week or two.
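
    A minimal sketch of the technique (element names are assumptions; the point is that the XQuery path stays a string literal while the row position is injected with sql:variable):

    DECLARE @XmlList xml;
    SET @XmlList = '<Rows>
                      <Row><ObjectName>Obj1</ObjectName></Row>
                      <Row><ObjectName>Obj2</ObjectName></Row>
                    </Rows>';

    DECLARE @Count int, @RowCount int, @ObjectName varchar(50);
    SET @Count = 1;
    SET @RowCount = @XmlList.value('count(/Rows/Row)', 'int');

    WHILE (@Count <= @RowCount)
    BEGIN
        -- The path is a literal; sql:variable supplies the current position.
        SET @ObjectName = @XmlList.value(
            '(/Rows/Row[position() = sql:variable("@Count")]/ObjectName)[1]',
            'varchar(50)');
        PRINT @ObjectName;
        SET @Count = @Count + 1;
    END;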

  • Argneka

    SSC Enthusiast

    Points: 104

    Hi,

    Yes, I agree about parsing large amounts of data. In this case I have a DataTable in my C# app that I want to pass as an XML parameter, and that all works fine. The table may have 15-20 rows. The part I am having difficulty with is how to iterate through the XML table that is passed to the stored procedure. I guess I need to use the aliased table that I build from the XML object. Is there a way to iterate through the XML object to obtain a specific element instead of aliasing another table?

    Thanks,

    Alan
