Read each line of file and do something

  • Looking for general advice. I've done some digging, but all I found was a lot of script tasks on posts from 2009 and before.

    I have a file I need to parse through and test certain values. If they're okay, go onto next test, else, drop out and fail.

    I know I want to use a ForEach loop to iterate through, but how do I hook up the file as the source of the ForEach loop? I'm assuming once I hook that up, I just set each value to a variable and use those variables in my loop to test.

    It's just hooking up the file to the foreach loop that is stumping me and I have a bad feeling it's something obvious.

    Thanks!

  • I think I might be able to answer my own question.

    1) I have a data flow task that takes all the file data and puts it into a recordset dest. I assign that a variable and use a Foreach ADO enumerator with that variable as the ADO object source variable and map each field in the file to it's own variable.

    2) I check specific variables in the foreach loop against tables in my db to see if they're viable. If not, I jump to the next row.

    3) Need to figure out how to collect the failures and report them.

  • The For Each loop is used for things like looping through a set of files in a folder, or a list of records from a table usually for some sort of dynamic loading or exporting. You process a data file via the Data Flow Task.

    Set up a Connection Manager to the data file, then add a Data Flow task to the Control Flow designer. Double click on the data flow task and then add a Data source pointing to your connection manager. That will read in the file, then you add tasks within the Data Flow task to process the records in the file. You can use a Look Up task if you need to take a value(s) from the source file and compare it to a value in a table to bring in another value from the table and you can direct matching and non match output to different destinations (like an error table). You can also add a Script Component and do some logic in VB .NET/C# as each record is read in. Or you can do more simple Derived Column transformations using SSIS expressions, like doing Null handling or data cleansing. You can use an OLE DB Command to send column values in each row to a stored procedure. Or you can just do a basic load and map the incoming columns to a data flow destination (which will also require a connection manager). There are many other data flow tasks, I'd recommend reading up on the SSIS stairway series here or poking around BOL for descriptions of how you might use those tasks.

    MWise

  • So I don't need a foreach loop to read all the rows in the file? It'll do that inside the data flow once I connect to the specific file?

  • Instead of a loop, I'd say it's more like a pipeline with the records flowing through each component in the data flow task. Check out the 2nd article in the SSIS stairways series [/url].

    MWise

  • Matthew Cushing (11/1/2012)


    Looking for general advice. I've done some digging, but all I found was a lot of script tasks on posts from 2009 and before.

    I have a file I need to parse through and test certain values. If they're okay, go onto next test, else, drop out and fail.

    I know I want to use a ForEach loop to iterate through, but how do I hook up the file as the source of the ForEach loop? I'm assuming once I hook that up, I just set each value to a variable and use those variables in my loop to test.

    It's just hooking up the file to the foreach loop that is stumping me and I have a bad feeling it's something obvious.

    Thanks!

    Gosh... why not just load the whole file up and test the value using setbased techniques and mark which rows are good or bad according to your tests?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Gosh? Just looking for some general advice on how to get back into ssis after being away for a while. Thanks for the support.

  • Matthew Cushing (11/2/2012)


    Gosh? Just looking for some general advice on how to get back into ssis after being away for a while. Thanks for the support.

    Not sure what was wrong with the word "gosh" here but it really was meant as a helpful suggestion. Instead of thinking about what you want to do with a row, think about what you want to do with a column (or columns). I'm not sure what kind of checks you intend to make but usually such checks can be done en masse by updating a "status" or "keeper" column with an UPDATE (or two) with a couple of CASE statements based on criteria that does the checks you're looking for.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Matt,

    What kind of value tests are you looking to do against the rows? As mentioned above, you can do things with lookup tests and the like to test for foreign key violations and the like during the datastream, also you can use derived columns and conditional splits (sometimes in tandem, if necessary) to usually do most of your tests to split the stream into success/review/fail streams as needed.

    Additionally, there's always the transformation script component for more complex checks during the datastream, allowing you to directly choose which output stream to deliver an inbound stream to. You'll need asynchronous for that one, but it's not much harder than a synchronous transformation.

    The things you'll want to avoid during this process however is anything that requires the stream to 'pause', as that'll kill any optimization you might get out of using the datastream instead of dumping to staging table and running bulk T-SQL against it. Those include sorts and aggregates, as those are the usual culprits for something like this. Fuzzy Grouping is another one. Basically anything that needs to grab a group of rows before it can do something. Avoid those like the plague, they're usually best off performed in T-SQL unless you're multicasting for final counts or similar.

    On a side note, Jeff wasn't being sarcastic or trying to imply anything other than what he directly said. He's incredibly efficient in T-SQL and try as I might I can't convince him of the value of SSIS in datastream optimizations... mostly because he can usually counter with an equivalent speed method in T-SQL. :w00t: I do occassionally get him though, but it comes down to 'It Depends' scenarios.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Sorry, my bad. In a bad place, two friends lost everything to Sandy and I took it the wrong way. My apologies.

  • Matthew Cushing (11/2/2012)


    Sorry, my bad. In a bad place, two friends lost everything to Sandy and I took it the wrong way. My apologies.

    Thanks for that Matthew. I definitely understand that. My heart goes out to all those folks and to folks like you that have friends and relatives in that area.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply