parsing a line post import

  • I have a table which has been import from a TXT file. Most lines can be ignored, but when I get a line like this

    ' qwwwwwww 399 0 75124M - 2015-02-06 13:07:29'

    I need to insert into a table as 6 columns ('-' is a value)

    I am not sure how to do this any ideas?

    Many thanks

  • Edward-445599 (8/4/2015)


    I have a table which has been import from a TXT file. Most lines can be ignored, but when I get a line like this

    ' qwwwwwww 399 0 75124M - 2015-02-06 13:07:29'

    I need to insert into a table as 6 columns ('-' is a value)

    I am not sure how to do this any ideas?

    Many thanks

    What makes that row different of the others?

    What are the definitions of both tables?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • One idea could be to use a function to parse your values. To get you started, check out this solution and see if it offers any inspiration: http://social.technet.microsoft.com/wiki/contents/articles/26937.t-sql-splitting-a-string-into-multiple-columns.aspx

    What logic defines the rows to be ignored? You may want to use a case statement, union or similar logic to ensure the function isn't performed on the wrong data.

    As a very brief and rough example, if you would only want to split strings containing more than 10 characters, using a function as mentioned you can do:

    DECLARE @X VARCHAR(200) = 'qwwwwwww 399 0 75124M - 2015-02-06 13:07:29'

    SELECT CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,1,' ') ELSE field END AS C1,

    CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,2,' ') ELSE '' END AS C2,

    CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,3,' ') ELSE '' END AS C3,

    CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,4,' ') ELSE '' END AS C4,

    CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,5,' ') ELSE '' END AS C5,

    CASE WHEN LEN(field) > 10 THEN CAST(dbo.UFN_SEPARATES_COLUMNS(@X,6,' ') + ' ' + dbo.UFN_SEPARATES_COLUMNS(@X,7,' ') AS DATETIME) ELSE '' END AS C6

    Have a play around and see if this would suit your requirements.

  • Edward-445599 (8/4/2015)


    I have a table which has been import from a TXT file. Most lines can be ignored, but when I get a line like this

    ' qwwwwwww 399 0 75124M - 2015-02-06 13:07:29'

    I need to insert into a table as 6 columns ('-' is a value)

    I am not sure how to do this any ideas?

    Many thanks

    What do the other rows look like? In other words, what makes these type of rows look different from any of the other rows?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Use the DelimitedSplit8K function (found on this site, at the "Splitting Delimited Strings" link in my signature) to split this string on the spaces.

    Then use this query to get the various columns:

    DECLARE @X VARCHAR(200) = 'qwwwwwww 399 0 75124M - 2015-02-06 13:07:29'

    SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS Col1,

    MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS Col2,

    MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS Col3,

    MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS Col4,

    MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS Col5,

    MAX(CASE WHEN ItemNumber = 6 THEN RIGHT(@X, 19) ELSE NULL END) AS Col6

    FROM dbo.DelimitedSplit8K(@X, ' ')

    Edit: I'm assuming that you already know what makes this row different from the others, so this just splits the string into the columns.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • XYoung (8/5/2015)


    One idea could be to use a function to parse your values. To get you started, check out this solution and see if it offers any inspiration: http://social.technet.microsoft.com/wiki/contents/articles/26937.t-sql-splitting-a-string-into-multiple-columns.aspx

    What logic defines the rows to be ignored? You may want to use a case statement, union or similar logic to ensure the function isn't performed on the wrong data.

    As a very brief and rough example, if you would only want to split strings containing more than 10 characters, using a function as mentioned you can do:

    DECLARE @X VARCHAR(200) = 'qwwwwwww 399 0 75124M - 2015-02-06 13:07:29'

    SELECT CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,1,' ') ELSE field END AS C1,

    CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,2,' ') ELSE '' END AS C2,

    CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,3,' ') ELSE '' END AS C3,

    CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,4,' ') ELSE '' END AS C4,

    CASE WHEN LEN(field) > 10 THEN dbo.UFN_SEPARATES_COLUMNS(@X,5,' ') ELSE '' END AS C5,

    CASE WHEN LEN(field) > 10 THEN CAST(dbo.UFN_SEPARATES_COLUMNS(@X,6,' ') + ' ' + dbo.UFN_SEPARATES_COLUMNS(@X,7,' ') AS DATETIME) ELSE '' END AS C6

    Have a play around and see if this would suit your requirements.

    Just a performance advisory on that. You're calling a Scalar function with a WHILE loop in it 7 times per row and that's really going to slow things down. If the max character length of a VARCHAR row is 8000 or less, consider using the iTVF in the following article, which also returns an integer for the position of each element that has been split out.

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    If the max character length of each row usually exceeds VARCHAR(8000), don't change the function in that article to try to accommodate because it will instantly make the function run at least twice as slow because that's the nature of trying to do string manipulations on MAX datatypes not to mention that they also don't like to be joined to.

    To wet your appetite for the article, here's a graph with one set of the test results. The black line at the bottom of the chart is the performance curve for the function I'm talking about.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thankyou Jeff 🙂

  • Thank you all. (the other rows are random text prefixed by # or data description prefixed by @ so I can sort them easy)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply