SSIS - Data Flow Task With Delete Very Slow

  • I have a Data Flow Task that reads from a Flat File Source (orders.csv) and then does an OLE DB Command to do a delete from a SQL database table if a match is found. This is the SQL command in the OLE DB Command Task:

    Delete orders from orders with(index(IX_orderno))

    WHERE orderno = ? and ordersuf = ? and oetype in ('r','t','d') and cono = ?

    (The index IX_orderno is defined as non-clustered and is orderno + ordersuf + oetype + cono. )

    There are 400,000 records in the orders.csv and the delete is extremely slow (over 1 hour.)

    We do have 8 indexes on this table, all non-clustered. Would this have any ramifications on the deletion speed?

    Help would greatly appreciated.

  • I tried with removing all the indexes from the table, except for the one being used, and then reran the task. The results were the same - no speed improvement.

  • mberman (3/15/2013)


    I have a Data Flow Task that reads from a Flat File Source (orders.csv) and then does an OLE DB Command to do a delete from a SQL database table if a match is found. This is the SQL command in the OLE DB Command Task:

    Delete orders from orders with(index(IX_orderno))

    WHERE orderno = ? and ordersuf = ? and oetype in ('r','t','d') and cono = ?

    (The index IX_orderno is defined as non-clustered and is orderno + ordersuf + oetype + cono. )

    There are 400,000 records in the orders.csv and the delete is extremely slow (over 1 hour.)

    We do have 8 indexes on this table, all non-clustered. Would this have any ramifications on the deletion speed?

    Help would greatly appreciated.

    The OLE DB Command Task will execute a Sql Command for every row in an ssis dataset.

    This is typically not what you would want to do for wholesale deletes as what this means is that your Sql Command is running, possibly thousands of times deleting one row each time.

    You can check this by spinning up Profiler and you will most likely see thousands of Delete actions.

    This is probably why the Delete is slow.

    You could try importing the data from the flat file into a scratch/temp. table in the database and then perform a batch delete via JOINS/MERGE or some such.

  • Just to clarify I meant the OLE DB Sql Command is run for every qualifying row not every row

    I would add that the times I have seen the OLE DB Command used is normally to simulate some kind of MERGE. My experience has been it is faster to do that kind of thing in Pure Sql, if you are able to do so.

  • Have you tried removing the index hint, just to guage any difference in performance?

    What I would do, as suggested, is move all csv rows into a staging table, perform DELETE FROM ... JOIN function to remove matching rows. SQL is much better working on sets of data.

    'Only he who wanders finds new paths'

  • OTF (3/15/2013)


    The OLE DB Command Task will execute a Sql Command for every row in an ssis dataset.

    ...

    This is probably why the Delete is slow.

    Not probably. It's with great certainty you can say the OLE DB command is the reason for the slow performance.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thank you for your replies. The information was right on.

    I created a scratch table in the database (orders_del) that is just being used to hold the order#'s to be deleted. (I do a truncate on the table first thing.)

    Then the Execute SQL Command Taks does the delete:

    delete FROM dbo.orders WHERE exists

    (SELECT * FROM dbo.orders_del AS tmp

    WHERE tmp.cono = dbo.orders.cono

    AND tmp.orderno = dbo.orders.orderno

    AND tmp.ordersuf = dbo.orders.ordersuf)

    This reduced the time from 3 hours to 11 minutes!

  • mberman (3/19/2013)


    Thank you for your replies. The information was right on.

    I created a scratch table in the database (orders_del) that is just being used to hold the order#'s to be deleted. (I do a truncate on the table first thing.)

    Then the Execute SQL Command Taks does the delete:

    delete FROM dbo.orders WHERE exists

    (SELECT * FROM dbo.orders_del AS tmp

    WHERE tmp.cono = dbo.orders.cono

    AND tmp.orderno = dbo.orders.orderno

    AND tmp.ordersuf = dbo.orders.ordersuf)

    This reduced the time from 3 hours to 11 minutes!

    I guess Koen was right... a dead certainty that it was the issue 🙂

    Glad you are getting better performance out of it now.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply