Which would be faster, an SSIS Foreach or SQL Cursor for 6300 rows?

  • I'm currently re-writing an inherited C#/SQL CSV import process due to speed issues with the previous process (reading each line with a C# StreamReader).

    The CSV has 6300 rows, and 21 columns (mostly strings, a couple of dates), which are used to update figures in our internal database.

    I'm currently unsure of which of the below ideas would be more efficient:

    - A Foreach loop over the rows, passing the 21 column values as parameters to an Execute SQL Task that calls a Stored Procedure to update each row

    - Modifying my Data Flow Task to dump the imported result set to a SQL table, and using an SQL Cursor to iterate through each row

    I expect a cursor would be much more efficient than starting 6300 separate SSIS Execute SQL Task executions and passing all 21 parameters for each iteration of the foreach loop, but am unsure of this. Does anyone have any insight or guidance over which approach is more efficient? Any help would be greatly appreciated.

  • It's hard to tell at this point. Is there any way you can provide a bit more detail around the nature of the "update" you're doing?

    If you're looking to update a single table with the values you're getting in, neither version of looping will be particularly efficient. You'd be better off creating something set-based that performs the update. Assuming that's in the right neighborhood, look at bulk-loading the data into a working table, then running a SQL update. Unless you're doing something very strenuous, I'd expect an update affecting 6300 records to be *very* fast indeed.

    That said - I'm guessing, because I don't know what is entailed in the update.
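
    Something like this, as a rough sketch of what I mean (the staging and target table names and columns are invented for illustration, since I don't know your schema):

    -- One set-based UPDATE joins the bulk-loaded rows to the target
    -- and updates all 6300 rows in a single statement.
    UPDATE tgt
    SET    tgt.FigureValue = stg.FigureValue,
           tgt.FigureDate  = stg.FigureDate
    FROM   dbo.Figures AS tgt
    INNER JOIN dbo.Staging_CsvImport AS stg
            ON stg.FigureKey = tgt.FigureKey;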

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?

  • sqlservercentral.com 32358 (5/4/2014)

    I expect a cursor would be much more efficient than starting 6300 separate SSIS Execute SQL Task executions... Does anyone have any insight or guidance over which approach is more efficient?

    I'd have to say that either method is going to be relatively slow. Without knowing exactly how the data you import from the CSV will be used, I'll have to generalize a bit.

    First, and possibly not related to anything except my general dislike (ok, it's a hatred :-)) for DTS, SSIS, Business Objects, Web Methods, and a shedload of other similar applications, I probably wouldn't go near SSIS with this task. There... like the kid in The Sixth Sense said, "I feel much better now". 😛

    If it were me, I'd Bulk Load (using either Bulk Insert or BCP depending on where the file was stored) the data into a staging table and evaluate/validate the data in each COLUMN (possibly, all columns at once). I wouldn't loop through each row.
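
    For example, the bulk-load step might look like this (the file path and table name are just placeholders; a CSV with quoted or embedded commas would need a BCP format file instead of a plain field terminator):

    -- Assumes dbo.Staging_CsvImport already matches the file's 21 columns.
    BULK INSERT dbo.Staging_CsvImport
    FROM 'C:\Import\figures.csv'
    WITH (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR   = '\n',
            FIRSTROW        = 2,  -- skip the header row
            TABLOCK               -- allows minimal logging on the load
         );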

    After that, I'd write either a good ol' fashioned "upsert" (2005 and below) or use MERGE (2008 and above) to update existing rows with the new data and to insert rows that don't already exist. At worst, it would be an Update followed by an Insert... not a process that did one row at a time... even if I were forced to do it in SSIS :-D.
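
    On 2008 and above, the MERGE version would look something like this (again, all object and column names are made up for the example):

    MERGE dbo.Figures AS tgt
    USING dbo.Staging_CsvImport AS src
       ON src.FigureKey = tgt.FigureKey
    WHEN MATCHED THEN                      -- row exists: update it
        UPDATE SET tgt.FigureValue = src.FigureValue,
                   tgt.FigureDate  = src.FigureDate
    WHEN NOT MATCHED BY TARGET THEN        -- new row: insert it
        INSERT (FigureKey, FigureValue, FigureDate)
        VALUES (src.FigureKey, src.FigureValue, src.FigureDate);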

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Quick question, why do you think you need a Cursor?

    😎

  • sqlservercentral.com 32358 (5/4/2014)

    I'm currently unsure of which of the below ideas would be more efficient: a Foreach loop passing the 21 parameters to an Execute SQL Task, or dumping the imported result set to a SQL table and using an SQL Cursor to iterate through each row.

    If you do stay with SSIS, the Foreach loop is the wrong approach. Instead of a Foreach loop and an Execute SQL Task, read the flat file with a Data Flow and use an OLE DB Command to do the update. However, that still processes the data row by row and fires off 6300 separate update statements.

    I would just dump the data into a SQL Server table and write a decent query on top of it.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Apologies for not responding to the answers here faster. You've all convinced me that the row-based approach I was going for isn't best practice. I had chosen it over a set-based approach because there's a set of processes that need to be followed depending on the values in each row. I've begun re-writing this C#/SQL mix as a flow chart, and will then try to re-implement it using set-based logic and BULK INSERT for the insert process.

    Because this is so far from SSIS and the original post's topic, I'll start a new question in the correct forum if I can't work out how to complete the reimplementation in SQL. Thanks heaps for the concern about my overall approach; I'm trying to move from simply getting things done to using best practices, and this challenge should help with that.

  • sqlservercentral.com 32358 (5/11/2014)


    I had chosen this over set based because there's a set of processes that need to be followed depending on the values in each row.

    Stop thinking like that. Start thinking about the set of processes that each column needs, instead. Like the tagline in my signature line below says...

    [font="Arial Black"]"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." [/font]

    --Jeff Moden


