Read Length of first Row and Header Row Delimiter through SSIS Script Task

  • Hi Guys,

    I have two questions. I am automating the process through SSIS to load the .txt file into SQL Table.

    The file is in Same format .txt and in "Fixed Width" with no header.

    File 1 =
        a) Header Row Delimiter = LF
        b) Length of the data is 278 Characters

    File 2 =
        a) Header Row Delimiter = {CR}{LF}
        b) Length of the data is 295 Characters

    Questions:-
        1) How can I read the Header Row Delimiter through C# in "Script Task" and pass through expression?
        2) How can I get the length of the data row through c# in "Script Task"? i.e 278 or 295 through

    Any advice would be greatly appreciated.

    Thank You.

  • rocky_498 - Monday, October 15, 2018 6:32 PM

    Hi Guys,

    I have two questions. I am automating the process through SSIS to load the .txt file into SQL Table.

    The file is in Same format .txt and in "Fixed Width" with no header.

    File 1 =
        a) Header Row Delimiter = LF
        b) Length of the data is 278 Characters

    File 2 =
        a) Header Row Delimiter = {CR}{LF}
        b) Length of the data is 295 Characters

    Questions:-
        1) How can I read the Header Row Delimiter through C# in "Script Task" and pass through expression?
        2) How can I get the length of the data row through c# in "Script Task"? i.e 278 or 295 through

    Any advice would be greatly appreciated.

    Thank You.

    I guess a better question would be... why do you think you need C# to do anything with this task?

    If you could post the record layout for each file (absolutely need since the rows are fixed field), we can probably help quite a bit and without a diversion into C#.

    Also, in one breath you say the files have no headers but then in the individual descriptions of the files, you say there's a header.  Which is it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff, Thank You for your time to reply. 
    I was thinking, If I could create two Data Flow
    1) DF with Row Delimiter "LF" with 275 character
    2) DF with Row Delimiter "CRLF" with 290 character

    Anyway, below Is my sample data from both files and sample SQL table. Just an FYI I created a sample data. 
    Please let me know if you have any questions. 

    Sample File 1

    000001California912347601234565
    000002California912357601234565
    000003California912367601234565

    Row Delimiter = LF

    Sample File 2

    000001California912347601234565XYZ
    000002California912357601234565ABC
    000003California912367601234565HYW

    Row Delimiter = {CR}{LF}

    Test SQL Table

    CREATE TABLE Test1
    (
        ID VARCHAR(6)
        ,State    VARCHAR(10)
        ,ZIP    VARCHAR(5)
        ,Phone    VARCHAR(10)
        ,Extra    VARCHAR(10)
    )

  • Yes, there are actually a couple of different ways to do this using a script task (assuming you mean T-SQL scripts).  My biggest question, though, is there nothing in the file names of the files that would give you a hint as to which type of file it is?  If not, I'll give it a whirl with code.

    Also, just to help me pick the best method for you, are you allowed to store BCP format files either in the same place as the data or on your server somewhere or anywhere that the server can read?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To access BCP file will be a little bit challenge for me to convience IT.
    After I come up with the solution I have to focus on performance because from this process it will almost more than 300 files will process through daily.

    File Name is not consistent and can't rely on to use this logic.

  • rocky_498 - Monday, October 15, 2018 8:09 PM

    To access BCP file will be a little bit challenge for me to convience IT.
    After I come up with the solution I have to focus on performance because from this process it will almost more than 300 files will process through daily.

    File Name is not consistent and can't rely on to use this logic.

    Ah.  Ok.  Understood.  For performance, BCP format files and the use of BULK INSERT would be absolutely the fastest but we can do well enough.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One more piece of info, please.... do this files have the exact same fields in exactly the same order but are simply different in field width?  And both are supposed to end up in the same table?

    Also, have you already got these things mapped out in SSIS?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If the answer to the last question above is "Yes... I already have these all mapped out in SSIS", there no need to throw all that work away.  Just make a temporary table with a single wide column of, say, VARCHAR(300) and use the LAST ROW parameter in BULK INSERT to load only one row (the forst row) from the file.  The delimiter for the bulk insert should be "0x0A", which is the hexadecimal for the Lf character (\ n was supposed to be but they screwed that up a bit).  Then just read the length of that one row with the understanding that the row from the file with CrLfs in the rows will probably have a CHAR(13) for the last character in the row.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, Exactly same fields and both files are going into the same table except one file has one extra and others don't. 
    Can you share your test .dtsx package?

  • Sorry, I miss understood your last question. Yes, I already have mapped those fields in SSIS.

  • What Expression value should I give for "LastRow" parameter?

  • Sorry... I don't have a "test dtsx package", Rocky.  I would have done all of this using BULK INSERT all from a stored procedure including reading the file names.  I wouldn't have gone anywhere near SSIS for this simple type of thing.

    You said that you could use a script task and I was hoping that was an SQL Script task.  If so, the BULK INSERT of just one row using the "0x0A" delimiter and the LAST_ROW option would have read the first row for you to measure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ohh K. Can you share your stored proc if you already have written? 
    Maybe I can use the SQL Script task to move the data from .txt to staging SQL table and then go from there.

    Thank You for your help.

  • Nah.... do what I do... don't even bother SSIS for this.

    I created files for your two sample tables being sure that the first one's lines was terminated with LF only and the second with CrLF.  Then I ran this against the C:\Temp1 directory I had put them in.  Of course, the details of what was done are in the comments in the code.


    --=====================================================================================================================
    --      Create the necessary objects for this code.
    --      This only needs to be done once.
    --=====================================================================================================================
    --===== Careful!  We're dropping objects for this demo!!!
         -- That's why I commented it out before I posted the code.
         -- I needed these conditional drops to make my testing easier.
         -- IF OBJECT_ID('dbo.Staging'   ,'U') IS NOT NULL DROP TABLE dbo.Staging;
         -- IF OBJECT_ID('dbo.vStaging'  ,'V') IS NOT NULL DROP VIEW  dbo.vStaging;
         -- IF OBJECT_ID('dbo.DataParser','V') IS NOT NULL DROP VIEW  dbo.DataParser;
         -- IF OBJECT_ID('dbo.Final'     ,'U') IS NOT NULL DROP TABLE dbo.Final;
    GO
    --===== This is the target for the BULK INSERT.
         -- It'll be a permanent object that will be TRUNCATEd before each run.
     CREATE TABLE dbo.Staging
            (
             RawData  VARCHAR(500)
            ,FilePath VARcHAR(1000) -- Added this
            )
    ;
    GO
    --===== We'll need this named constraint later to save the file path for each file we loop through.
      ALTER TABLE dbo.Staging
        ADD CONSTRAINT DF_Final_FilePath DEFAULT ('') FOR FilePath
    ;
    GO
    --===== Create the insertable view that will do the clean BULK INSERT into the staging table
     CREATE VIEW dbo.vStaging AS
     SELECT RawData
       FROM dbo.Staging
    ;
    GO
    --===== Create the view that will do the parsing work.
         -- We're using a view to do the parsing because you said that you probably couldn't use BCP Format files.
         -- There's always more than one way to skin a cat. ;-)
         -- This is also a permanent object.
     CREATE VIEW dbo.DataParser AS
     SELECT  ID    = SUBSTRING(RawData, 1, 6)
            ,State = SUBSTRING(RawData, 7,10)
            ,ZIP   = SUBSTRING(RawData,17, 5)
            ,Phone = SUBSTRING(RawData,22,10)
            ,Extra = NULLIF(SUBSTRING(RawData,32,10),CHAR(13))
            ,FilePath
       FROM dbo.Staging
    ;
    GO
    --===== This is whatever your final table is.
         -- Up to you if it's supposed to be TRUNCATEd before each run.
     CREATE TABLE dbo.Final
            (
             ID       CHAR(6)
            ,State    CHAR(10)
            ,ZIP      CHAR(5)
            ,Phone    CHAR(10)
            ,Extra    VARCHAR(10)
            ,FilePath VARcHAR(1000) -- Added this
            )
    ;
    /*@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      Here's the code that does the imports.
      You could turn this into a stored procedure.
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@*/
    --=====================================================================================================================
    --      PRESETS
    --=====================================================================================================================
    --===== Environmental presets
        SET NOCOUNT OFF; -- We want to see the line counts here.

    --===== Drop any temporary tables. 
         -- This may be commented out if you turn this into a stored procedure.
         -- Again, I just needed this to make testing easier.
         IF OBJECT_ID('tempdb..#DirInfo'    ,'U') IS NOT NULL DROP TABLE #DirInfo;
         IF OBJECT_ID('tempdb..#FileControl','U') IS NOT NULL DROP TABLE #FileControl;

    --====== Truncate the final target table.  Comment this out if you don't want to Truncate at the beginning of each run.
    TRUNCATE TABLE dbo.Final
    ;
    --===== Create the table that will capture the output of xp_DirTree.
     CREATE TABLE #DirInfo
            (
             ObjectName VARCHAR(500) PRIMARY KEY CLUSTERED
            ,Depth      INT
            ,IsFile     TINYINT
            )
    ;
    --===== Create the table that will hold all the file names that match the pattern.
     CREATE TABLE #FileControl
            (
             FileNum    INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
            ,[FileName] VARCHAR(500)
            )
    ;
    --===== Create some obviously named variables and presets
    DECLARE @Dummy BIT
            ,@Counter       INT          = 1
            ,@FileCount     INT
            ,@FilePath      VARCHAR(500) = 'C:\Temp1\'
            ,@FileName      VARCHAR(500)
            ,@FilePattern   VARCHAR(50)  = '%.txt'
            ,@SQL           VARCHAR(600)
    ;
    --=====================================================================================================================
    --      Get the directory information, which may contain subdirectories and non .txt files
    --=====================================================================================================================
    --======= Let the operator know what we're doing.
        PRINT REPLICATE('-',120);
    RAISERROR ('Gathering directory information from "%s"...',0,0,@FilePath) WITH NOWAIT
    ;
    --===== Populate the directory information table with file names and other directory info.
     SELECT @SQL = 'xp_Dirtree ''' + @FilePath + ''',1,1' -- 1 Level deep and capture file names too!
    ;
     INSERT INTO #DirInfo
       EXEC (@SQL)
    ;
    -- SELECT * FROM #DirInfo; --Just for troubleshooting

    --=====================================================================================================================
    --      Extract just the files we want to work with and number them so we can "loop" through the names to import them.
    --      We'll also remember the number of files we loaded as an end point for the loop.
    --=====================================================================================================================
    --===== Let the operator know what we're doing.
        PRINT REPLICATE('-',120);
    RAISERROR ('Extracting files with a pattern of "%s"',0,0,@FilePattern) WITH NOWAIT
    ;
     INSERT INTO #FileControl
            ([FileName])
     SELECT ObjectName
       FROM #DirInfo
      WHERE ObjectName LIKE @FilePattern
      ORDER BY ObjectName
    ;
     SELECT @FileCount = @@ROWCOUNT
    ;
    --=====================================================================================================================
    --      Now we'll load each file into the "RawData" column of the staging table.
    --=====================================================================================================================
    --======= Let the operator know what we're doing.
        PRINT REPLICATE('=',120);
    RAISERROR ('Importing the files...',0,0,@FilePattern) WITH NOWAIT
    ;
    --====== Make sure the staging table is clear
    TRUNCATE TABLE dbo.Staging
    ;
    --===== Now, load each file that's in the File Control table.
      WHILE @Counter <= @FileCount
      BEGIN
            --===== Get the file name.
             SELECT @FileName = [FileName]
               FROM #FileControl
              WHERE FileNum = @Counter
            ;
            --===== Tell the operator what we're doing so they don't get bored.
              PRINT REPLICATE('-',120);
          RAISERROR ('Working on File # %u of %u : %s%s',0,0,@Counter,@FileCount,@FilePath,@FileName) WITH NOWAIT
            ;
            --===== Drop the default file path constraint from the table so we can recreate it with the current file path.
              ALTER TABLE dbo.Staging
               DROP CONSTRAINT DF_Final_FilePath;
            ;
            --===== Create the new default file path constraint so we know while file each row came from.
             SELECT @SQL = '
              ALTER TABLE dbo.Staging
                ADD CONSTRAINT DF_Final_FilePath DEFAULT (''' + @FilePath+@FileName + ''') FOR FilePath;'
            ;
               EXEC (@SQL)
            ;
            --===== Now, load the file contents.
                 -- Note that the rows in both file types end with an LF or "Newline" character and because we're
                 -- not doing anydata parsing yet, this works for both kinds of files without any other consideration.
             SELECT @SQL = '
               BULK INSERT dbo.vStaging
               FROM ''' + @FilePath+@FileName + '''
               WITH (
                     BATCHSIZE       = 3000000
                    ,CODEPAGE        = ''RAW''
                    ,DATAFILETYPE    = ''char''
                    ,FIELDTERMINATOR = ''''
                    ,ROWTERMINATOR   = ''0x0A''
                    ,TABLOCK
                    )
            ;';
               EXEC (@SQL)
            ;
            --===== Bump the control counter
             SELECT @Counter += 1
            ;
        END
    ;
    --======= Let the operator know what we're doing.
    RAISERROR ('File loads complete.',0,0,@FilePattern) WITH NOWAIT;
        PRINT REPLICATE('=',120);

    --=====================================================================================================================
    --      Use the view to parse the raw data into the final table.
    --=====================================================================================================================
    --======= Let the operator know what we're doing.
    RAISERROR ('Parsing the data.',0,0,@FilePattern) WITH NOWAIT
    ;
     INSERT INTO dbo.Final
            (ID,State,ZIP,Phone,Extra,FilePath)
     SELECT ID,State,ZIP,Phone,Extra,FilePath
       FROM dbo.DataParser
    ;
    --======= Let the operator know what we're doing.
    RAISERROR ('RUN COMPLETE. %u files loaded into the dbo.Final table.',0,0,@FileCount) WITH NOWAIT;
        PRINT REPLICATE('=',120)
    ;
    --===== Let's see what we did...
     SELECT * FROM dbo.Final
    ;

    The output from that looks like this and the script could be turned into a proc and called from SSIS.

    The output in the messages tab will keep someone from being bored.  Here's what that looks like.


    ------------------------------------------------------------------------------------------------------------------------
    Gathering directory information from "C:\Temp1\"...

    (2 row(s) affected)
    ------------------------------------------------------------------------------------------------------------------------
    Extracting files with a pattern of "%.txt"

    (2 row(s) affected)
    ========================================================================================================================
    Importing the files...
    ------------------------------------------------------------------------------------------------------------------------
    Working on File # 1 of 2 : C:\Temp1\Sample1.txt

    (3 row(s) affected)
    ------------------------------------------------------------------------------------------------------------------------
    Working on File # 2 of 2 : C:\Temp1\Sample2.txt

    (3 row(s) affected)
    File loads complete.
    ========================================================================================================================
    Parsing the data.

    (6 row(s) affected)
    RUN COMPLETE. 2 files loaded into the dbo.Final table.
    ========================================================================================================================

    (6 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, 

    I don't have words to say THANK YOU for your help. Very professional way to help. 

    I appreciate it. I will apply this logic and run through and let you know if I have any questions.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply