Parse txt file

  • Bruin wrote:

    Is there some magic that pulls in the file?

    Thx.

    Not really.... just do a BULK INSERT from the file.  I haven't had the time to post it but you should have no issue.  Make a temporary target table should have a RowNum columns as and identity and a 500 character "DataFromFile" column.  Then just change the code to read from that Temp table instead of the VALUES() clause.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I put the data in a Table now via bulk Insert, but the query now is failing...

    Msg 537, Level 16, State 3, Line 5

    Invalid length parameter passed to the LEFT or SUBSTRING function.

    It processes about 3 records then errors.

      WITH 
    cteGroup AS
    (--==== First, we have to group the lines from the file. (THIS IS THE KEY TO EVERYTHING!)
    SELECT df.PSFOrder_Id
    ,DataGroup = COUNT(CASE
    WHEN substring(df.PSFile,1,2) = '/*'
    THEN 1
    END) OVER (ORDER BY df.PSFOrder_Id)
    ,DataFromFile = df.PSFile
    FROM dbastuff.dbo.PSFileOrder df
    )
    ,cteLeftTag AS
    (
    SELECT DataGroup
    ,Tag1 = SUBSTRING(DataFromFile,1,CHARINDEX(':',DataFromFile)-1)
    ,Data1 = SUBSTRING(DataFromFile,CHARINDEX(':',DataFromFile)+2,500)
    FROM cteGroup
    WHERE DataFromFile > ''
    AND LEFT(DataFromFile,2)<>'/*'
    )
    SELECT DataGroup
    ,insert_job = RTRIM(MAX(CASE WHEN Tag1 = 'insert_job'
    THEN SUBSTRING(Data1,1,ISNULL(ca.JtPos-1,500))
    ELSE '' END))
    ,job_type = RTRIM(MAX(CASE WHEN Tag1 = 'insert_job' AND ca.JtPos > 0
    THEN SUBSTRING(Data1,ca.JtPos+10,500)
    ELSE '' END))
    ,job_type = RTRIM(MAX(CASE WHEN Tag1 = 'job_type' THEN DATA1 ELSE '' END))
    ,days_of_week = RTRIM(MAX(CASE WHEN Tag1 = 'days_of_week' THEN DATA1 ELSE '' END))
    ,start_times = RTRIM(MAX(CASE WHEN Tag1 = 'start_times' THEN DATA1 ELSE '' END))
    ,run_calendar = RTRIM(MAX(CASE WHEN Tag1 = 'run_calendar' THEN DATA1 ELSE '' END))
    ,start_mins = RTRIM(MAX(CASE WHEN Tag1 = 'start_mins' THEN DATA1 ELSE '' END))
    ,machine = RTRIM(MAX(CASE WHEN Tag1 = 'machine' THEN DATA1 ELSE '' END))
    ,watch_file = RTRIM(MAX(CASE WHEN Tag1 = 'watch_file' THEN DATA1 ELSE '' END))
    ,box_name = RTRIM(MAX(CASE WHEN Tag1 = 'box_name' THEN DATA1 ELSE '' END))
    FROM cteLeftTag
    CROSS APPLY (VALUES(NULLIF(CHARINDEX('job_type',Data1),0)))ca(JtPos)
    GROUP BY DataGroup
    ;
  • Bruin wrote:

    I put the data in a Table now via bulk Insert, but the query now is failing...

    Msg 537, Level 16, State 3, Line 5 Invalid length parameter passed to the LEFT or SUBSTRING function.

    It processes about 3 records then errors.

    There's pretty much no way that I can troubleshoot that from here except to tell you that there's something in the file that you loaded that doesn't look like what you said it would or you've made an error in how you loaded the data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I just did a Bulk Insert, any way to try and debug?

     

    Thx.

  • If you were paying any of us $250 an hour, would you waste our time in such a fashion?  😉

    We need to see the stuff that supports what you say just failed.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • When I run this section it groups the data as expected...

    SELECT df.PSFOrder_Id

    ,DataGroup = COUNT(CASE

    WHEN substring(df.PSFile,1,2) = '/*'

    THEN 1

    END) OVER (ORDER BY df.PSFOrder_Id)

    ,DataFromFile = df.PSFile

    FROM dbastuff.dbo.PSFileOrder df

    <><><>  It seems to be failing <><><>

    ,cteLeftTag AS

    (

    SELECT DataGroup

    ,Tag1 = SUBSTRING(DataFromFile,1,CHARINDEX(':',DataFromFile)-1)

    ,Data1 = SUBSTRING(DataFromFile,CHARINDEX(':',DataFromFile)+2,500)

    FROM cteGroup

    WHERE DataFromFile > ''

    AND LEFT(DataFromFile,2)<>'/*'

    )

    SELECT DataGroup

    ,insert_job = RTRIM(MAX(CASE WHEN Tag1 = 'insert_job'

    THEN SUBSTRING(Data1,1,ISNULL(ca.JtPos-1,500))

    ELSE '' END))

    ,job_type = RTRIM(MAX(CASE WHEN Tag1 = 'insert_job' AND ca.JtPos > 0

    THEN SUBSTRING(Data1,ca.JtPos+10,500)

    ELSE '' END))

    ,job_type = RTRIM(MAX(CASE WHEN Tag1 = 'job_type' THEN DATA1 ELSE '' END))

    ,days_of_week = RTRIM(MAX(CASE WHEN Tag1 = 'days_of_week' THEN DATA1 ELSE '' END))

    ,start_times = RTRIM(MAX(CASE WHEN Tag1 = 'start_times' THEN DATA1 ELSE '' END))

    ,run_calendar = RTRIM(MAX(CASE WHEN Tag1 = 'run_calendar' THEN DATA1 ELSE '' END))

    ,start_mins = RTRIM(MAX(CASE WHEN Tag1 = 'start_mins' THEN DATA1 ELSE '' END))

    ,machine = RTRIM(MAX(CASE WHEN Tag1 = 'machine' THEN DATA1 ELSE '' END))

    ,watch_file = RTRIM(MAX(CASE WHEN Tag1 = 'watch_file' THEN DATA1 ELSE '' END))

    ,box_name = RTRIM(MAX(CASE WHEN Tag1 = 'box_name' THEN DATA1 ELSE '' END))

    FROM cteLeftTag

    CROSS APPLY (VALUES(NULLIF(CHARINDEX('job_type',Data1),0)))ca(JtPos)

    GROUP BY DataGroup

    ;

     

  • I don't know what the data is that you're playing that code against.  Is strongly suspect that you may have imported the data incorrectly OR there's an anomaly in the data that you didn't expect.

    Since you said previously that you can't show actual data, I suggested that you post the code you used to import the data with and you've resisted that.  It would also be good to see the CREATE Table statement because there's a key element to what I did in my demo... the rows are numbered to preserve the order.  That's why I was talking about all this needing to be single threaded... it MUST be done in a serial fashion.

    Without the data that's failing and without the BULK INSERT code and without the DDL for the target of the BULK INSERT, all I can tell you is to go look at the actual data and find the anomaly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CREATE TABLE [dbo].[PSFileOrder](
    [PSFOrder_Id] [int] IDENTITY(0,1) NOT NULL,
    [PSFile] [nvarchar](500) NOT NULL,
    CONSTRAINT [PK_PSFileOrder] PRIMARY KEY CLUSTERED
    (
    [PSFOrder_Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
  • Table looks right but I don't know why you resist providing the information I asked for.  You didn't include the BULK INSERT I asked for. 🙁

    No matter though.  I made the mistake of thinking that you actually copied my code and used it.  You have not.  If we look at the first section of you code posted above, we see the following...

    SELECT df.PSFOrder_Id
    ,DataGroup = COUNT(CASE
    WHEN substring(df.PSFile,1,2) = '/*'
    THEN 1
    END) OVER (ORDER BY df.PSFOrder_Id)
    ,DataFromFile = df.PSFile

    Here's my original working code....

    SELECT  df.RowNum                                                                    
    ,DataGroup = COUNT(CASE
    WHEN LEFT(df.DataFromFile,2) = '/*'
    THEN 1
    END) OVER (ORDER BY df.RowNum ROWS UNBOUNDED PRECEDING)
    ,DataFromFile = CONVERT(VARCHAR(500),df.DataFromFile)

    With the obvious required change of column names being an understood exception, do you see any differences that just might make a wee bit of a difference?  And, no... I'm not going to spend any more time desk-checking the rest of your code.

    Sorry, Bruin, but you managed to piss off one the most patient people on the planet.. and he's all done trying to help here.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I apologize Jeff, I really appreciate the time you take to help solve issues.

    This is the code to load the file...

    Many thanks, and hope you don't go dark on this...

     

    Again thanks..

     

    #LoadPSFile
    cls
    Get-content "C:\FileLoading\jobs.txt" |
    Foreach-Object {$_ -replace "'", "''"} |
    ForEach-Object{invoke-sqlcmd -serverInstance localhost\sqlexpress -query "Insert dbastuff.dbo.PSFileOrder(PSFile) Values ('$_')"}
  • you really have no idea of what you are doing neither do you follow the advise of this forum experts. on this case you decided to change the supplied code by yourself and then complain it doesn't work.

    it really is a waste of our time to try and help you.

     

    first thing you need to do is to use BULK INSERT to load the file - not that horrible snippet you have above (DO NOT EVER DO IT THAT WAY!!!)

    first read the documentation of BCP here https://docs.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-ver16

    you have one example you can use here https://stackoverflow.com/questions/6033275/bcp-to-bulk-insert-into-a-single-row-field

    I also advise you strongly to read the documentation of the sqlclient bulkcopy - (google for powershell sql bulkcopy) as this will be another option you can use with more options

    once you have the data loaded using BCP and after making sure that the file got loaded CORRECTLY and has each input line on a single row on the destination table and that the rows are on the SAME order as they are on the input file go and use the EXACT code that Jeff gave you and see if you still have issues.

  • Okay I have the file loaded using BCP. I created a format file and used BCP provided... all rows loaded

    All data is now loaded to the table I used above.  I used Jeff's code as he previously provided and just substituted the new table and field names.. to the top part of the SP.

    This is the error:

    Msg 537, Level 16, State 2, Line 1

    Invalid length parameter passed to the LEFT or SUBSTRING function.

    I did notice in example provided that job_type is there twice... I know that insert_job and job_type are contained on the same line so I can see the 2 selections\lookups, but was unsure of the extra job_type in the example.

    THanks again and I hope I have provided everything needed...

     

    SELECT  DataGroup
    ,insert_job = RTRIM(MAX(CASE WHEN Tag1 = 'insert_job'
    THEN SUBSTRING(Data1,1,ISNULL(ca.JtPos-1,500))
    ELSE '' END))
    ,job_type = RTRIM(MAX(CASE WHEN Tag1 = 'insert_job' AND ca.JtPos > 0
    THEN SUBSTRING(Data1,ca.JtPos+10,500)
    ELSE '' END))
    ,job_type = RTRIM(MAX(CASE WHEN Tag1 = 'job_type' THEN DATA1 ELSE '' END))

     

     

      WITH 
    cteGroup AS
    (--==== First, we have to group the lines from the file. (THIS IS THE KEY TO EVERYTHING!)
    SELECT df.PSFOrder_Id
    ,DataGroup = COUNT(CASE
    WHEN LEFT(df.PSFile,2) = '/*'
    THEN 1
    END) OVER (ORDER BY df.PSFOrder_Id ROWS UNBOUNDED PRECEDING)
    ,DataFromFile = CONVERT(VARCHAR(500),df.PSFile)
    FROM PSFileOrder df
    bcp copy2.dbo.psfileorder in c:\ftp_in\jobs.txt -f c:\ftp_in\fileorder.fmt -S local\SQLEXPRESS -E -T
  • based on the code by Jeff the only way you get that error is if your file contains a line that has data and does not have a ":" or if the lines with "/*" do not have this on the very first position.

     

    As you refuse to supply the real file there is nothing we can do - plenty you can and should do - including starting processing just part of the records until you identify the line that is causing the issue.

  • I stated in early post I couldn't supply file but fill check table as you gave a good starting place... What about job_type in there twice?

     

    Thx.

  • Bruin wrote:

    I stated in early post I couldn't supply file but fill check table as you gave a good starting place... What about job_type in there twice?

    Thx.

    Check the code.  Do you think having such duplication will work?

    Also, you may not be able to post the exact data but nothing is preventing you from knocking out some sanitized test data.  If you want help, you have to be willing to be helped.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 55 total)

You must be logged in to reply to this topic. Login to reply