Flat File Source (CSV) rows discrepancy

  • Normally, googling and messing around, solves 99.999% of my IT (or relationship) problems. This one however, is giving me massive headaches.

    Situation:

    I have a package which runs over a selection of files and checks which files to be processed by the dataflow task. The dataflow task is in a foreach loop, looping through all selected files. In order to load the file, the flat file source makes use of a variable connectionstring.

    Issue:

    I am not receiving all rows as logged in the CSV file.

    When using the flat file source, and the expression for setting the connectionstring, I get 28 rows of the in total 1.110.

    When using the flat file source, and setting a static connectionstring (same file as above), I get 1.110 rows of the in total 1.110.

    Basically, using the expression based connectionstring, I only get 2% of the content of this CSV (last 28 rows of the file). All else being the same in the package.

    Done so far

    Here is a summary what I have done so far:

    - Set DelayValidation on the flat file connection to TRUE and also tried FALSE (no change in behavior)

    - Checked the file in Notepad++ for any line ending errors, these are all the same and as expected (otherwise the static version would not work)

    - I set the outputcolumnwidth to 5000 so that all rows should be able to process

    - I set the Error Output rows to capture any rows being dropped, no rows are dropped

    - I checked for any errors in the execution results, it simply says "The total number of data rows process for file x is 29)

    - Googled, worldwide

    - Bing(ed), worldwide

    Am I overlooking something?

  • berryber (11/5/2016)


    Normally, googling and messing around, solves 99.999% of my IT (or relationship) problems. This one however, is giving me massive headaches.

    Situation:

    I have a package which runs over a selection of files and checks which files to be processed by the dataflow task. The dataflow task is in a foreach loop, looping through all selected files. In order to load the file, the flat file source makes use of a variable connectionstring.

    Issue:

    I am not receiving all rows as logged in the CSV file.

    When using the flat file source, and the expression for setting the connectionstring, I get 28 rows of the in total 1.110.

    When using the flat file source, and setting a static connectionstring (same file as above), I get 1.110 rows of the in total 1.110.

    Basically, using the expression based connectionstring, I only get 2% of the content of this CSV (last 28 rows of the file). All else being the same in the package.

    Done so far

    Here is a summary what I have done so far:

    - Set DelayValidation on the flat file connection to TRUE and also tried FALSE (no change in behavior)

    - Checked the file in Notepad++ for any line ending errors, these are all the same and as expected (otherwise the static version would not work)

    - I set the outputcolumnwidth to 5000 so that all rows should be able to process

    - I set the Error Output rows to capture any rows being dropped, no rows are dropped

    - I checked for any errors in the execution results, it simply says "The total number of data rows process for file x is 29)

    - Googled, worldwide

    - Bing(ed), worldwide

    Am I overlooking something?

    This does sound rather unlikely.

    Have you considered trying this?

    - Put only the problematic file in the 'to be processed' folder.

    - Add a data viewer in your data flow task (start near the 'source' and then work down, until you find where the rows disappear)

    and then see whether that offers any insights.


  • berryber (11/5/2016)


    Normally, googling and messing around, solves 99.999% of my IT (or relationship) problems. This one however, is giving me massive headaches.

    Situation:

    I have a package which runs over a selection of files and checks which files to be processed by the dataflow task. The dataflow task is in a foreach loop, looping through all selected files. In order to load the file, the flat file source makes use of a variable connectionstring.

    Issue:

    I am not receiving all rows as logged in the CSV file.

    When using the flat file source, and the expression for setting the connectionstring, I get 28 rows of the in total 1.110.

    When using the flat file source, and setting a static connectionstring (same file as above), I get 1.110 rows of the in total 1.110.

    Basically, using the expression based connectionstring, I only get 2% of the content of this CSV (last 28 rows of the file). All else being the same in the package.

    Done so far

    Here is a summary what I have done so far:

    - Set DelayValidation on the flat file connection to TRUE and also tried FALSE (no change in behavior)

    - Checked the file in Notepad++ for any line ending errors, these are all the same and as expected (otherwise the static version would not work)

    - I set the outputcolumnwidth to 5000 so that all rows should be able to process

    - I set the Error Output rows to capture any rows being dropped, no rows are dropped

    - I checked for any errors in the execution results, it simply says "The total number of data rows process for file x is 29)

    - Googled, worldwide

    - Bing(ed), worldwide

    Am I overlooking something?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • berryber (11/5/2016)


    Normally, googling and messing around, solves 99.999% of my IT (or relationship) problems. This one however, is giving me massive headaches.

    Situation:

    I have a package which runs over a selection of files and checks which files to be processed by the dataflow task. The dataflow task is in a foreach loop, looping through all selected files. In order to load the file, the flat file source makes use of a variable connectionstring.

    Issue:

    I am not receiving all rows as logged in the CSV file.

    When using the flat file source, and the expression for setting the connectionstring, I get 28 rows of the in total 1.110.

    When using the flat file source, and setting a static connectionstring (same file as above), I get 1.110 rows of the in total 1.110.

    Basically, using the expression based connectionstring, I only get 2% of the content of this CSV (last 28 rows of the file). All else being the same in the package.

    Done so far

    Here is a summary what I have done so far:

    - Set DelayValidation on the flat file connection to TRUE and also tried FALSE (no change in behavior)

    - Checked the file in Notepad++ for any line ending errors, these are all the same and as expected (otherwise the static version would not work)

    - I set the outputcolumnwidth to 5000 so that all rows should be able to process

    - I set the Error Output rows to capture any rows being dropped, no rows are dropped

    - I checked for any errors in the execution results, it simply says "The total number of data rows process for file x is 29)

    - Googled, worldwide

    - Bing(ed), worldwide

    Am I overlooking something?

    Have you run the package in debug mode and checked the actual connection string value?

    😎

  • Found it,

    10 points to Eirikur. (indeed input was messed up, filenames were the same, path was slightly different).

    @Phil Parkin, thx!

    Sorry for stealing about 12 minutes of your time!

  • You are very welcome.

    😎

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply