ETL for Free-Form Data

  • Comments posted to this topic are about the item ETL for Free-Form Data

  • I am not impressed with the article. I am not clear what the author wants to say by these lines in step 1:

    ' The only thing that should change is the data field value itself. For example, I created this procedure to get GPS data from web-based truck fleet reports into a Data Warehouse staging table. '

    Perhaps I am missing something, but could someone explain this to me?

    Regards

    Anirban

  • I don’t like to use third party components if I don’t have to. I think a simpler solution would be to do a page scrape and then parse it with regular expressions. No need for XML or components.
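
    A minimal sketch of the scrape-and-parse idea, assuming the page has already been downloaded into a string. The report markup, the truck/latitude/longitude fields, and the names PAGE, ROW_PATTERN and parse_rows are all hypothetical, and Python stands in here for whatever scripting language is at hand:

```python
import re

# Hypothetical fragment of a scraped fleet report page (assumed markup).
PAGE = """
<tr><td>Truck 17</td><td>Lat: 41.8781</td><td>Lon: -87.6298</td></tr>
<tr><td>Truck 22</td><td>Lat: 39.7392</td><td>Lon: -104.9903</td></tr>
"""

# One named group per field; only the field values are expected to vary
# between rows, the surrounding markup is assumed to stay fixed.
ROW_PATTERN = re.compile(
    r"Truck\s+(?P<truck>\d+)</td>\s*"
    r"<td>Lat:\s*(?P<lat>-?\d+(?:\.\d+)?)</td>\s*"
    r"<td>Lon:\s*(?P<lon>-?\d+(?:\.\d+)?)</td>"
)


def parse_rows(page):
    """Return (truck_id, latitude, longitude) tuples found in the page text."""
    return [
        (int(m["truck"]), float(m["lat"]), float(m["lon"]))
        for m in ROW_PATTERN.finditer(page)
    ]


rows = parse_rows(PAGE)
for row in rows:
    print(row)
```

    The resulting tuples could then be bulk-inserted into a staging table. A pattern like this only holds up while the page layout stays the same, so it probably needs revisiting whenever the report format changes.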

  • When I started reading this, even the introduction sounded like a sales pitch. This isn't an article, it's an advertisement. I suppose it will be useful to some who can't figure out how to do this with the native components of SQL Server.

    I also agree about the 3rd party component thing that was previously mentioned... I avoid them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This could just as easily (perhaps more easily) be done by pumping the raw XML into a staging table column and using the native SQL Server XML data type's .nodes() and .value() methods.

  • See? That's what I mean... don't need 3rd party products for this type of stuff.

    Thanks for the info, Mike...

    --Jeff Moden



  • Parsing is, in my opinion, a client-side responsibility. Doing this kind of thing in SQL Server is neither impossible nor difficult, but I tend to pass it to specialized languages that are better equipped for the job.

    Just my $0.02


    * Noel

  • noeld (3/20/2008)


    Parsing is, in my opinion, a client-side responsibility. Doing this kind of thing in SQL Server is neither impossible nor difficult, but I tend to pass it to specialized languages that are better equipped for the job.

    Just my $0.02

    The System.Xml namespace also includes several classes with methods for this type of XML parsing and manipulation, if you want to do it on the client side.
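
    For readers outside .NET, the same client-side approach can be sketched with Python's standard xml.etree.ElementTree module (used here in place of System.Xml). The report layout, element names, and the shred function are hypothetical and shown only for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical fleet report; the element and attribute names are assumptions.
REPORT_XML = """
<report>
  <truck id="17"><lat>41.8781</lat><lon>-87.6298</lon></truck>
  <truck id="22"><lat>39.7392</lat><lon>-104.9903</lon></truck>
</report>
"""


def shred(xml_text):
    """Turn each <truck> element into a (truck_id, latitude, longitude) row."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for truck in root.findall("truck"):
        rows.append(
            (
                int(truck.get("id")),
                float(truck.findtext("lat")),
                float(truck.findtext("lon")),
            )
        )
    return rows


for row in shred(REPORT_XML):
    print(row)
```

    Each tuple maps to one staging-table row, so the database only receives data that is already parsed.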

  • When I suggested a site in a post on SSC, I got an email saying:

    "We don't link/promote any sites on SSC." I felt sorry that I had done something wrong. But now SSC is forcing us to read a PURE advertisement for a third-party tool. I don't understand the purpose of this article.

  • I'm very new to SQL. I am trying to get a better understanding of data mining in free-form text, which, as I understand it, uses regular expressions. Could you point me in the direction of some information on searching through free-form text (articles, conversations, etc.) to pull out information such as the tone of the article (negative, positive, angry, sad, etc.) and its basic subject matter?
