SSIS performance issue

  • Hi

I have changed an ETL process that used to be run manually in SSMS so that it now runs through SSIS directly from the file system: an Execute SQL Task reads each script file and runs it. The same query takes 4+ times longer to run through SSIS than directly in SSMS. There are a vast number of scripts, so it is difficult to narrow down the cause. Any hints on what I can do and what to look for to improve performance? We use quite a lot of cursors in the queries to do updates on tables, some of which have more than a billion records.

    Andre

  • Andre

    First, do you really need those cursors? If you can find a way to do your update with a set-based query, it's likely to run much faster. Second, are the settings the same for you when you run it from SSMS as they are for whatever account runs it through SSIS? Options such as ANSI NULLS and regional settings can make a difference. Finally, is the network latency the same between the source server and where SSMS runs as between the source server and whatever server consumes the output?
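
For instance, something like this is usually far faster than fetching and updating row by row (table and column names are invented purely for illustration):

-- Hypothetical sketch: one set-based statement replacing a row-by-row cursor
UPDATE o
SET    o.Status = 'Archived'
FROM   dbo.Orders o               -- placeholder table
WHERE  o.OrderDate < '20130101';  -- placeholder filter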

    John

  • John, thanks for the reply

First of all, the cursor seems to be the only way. We are updating/deleting records where row counts are in the hundreds of millions or low billions; we had issues when running straight updates, so we put them into a cursor to update a million rows at a time.

To your second question, about whether the settings are the same when running from SSMS as for whatever account runs it through SSIS (options such as ANSI NULLS and regional settings): how do I check this for SSIS and SSMS?

To the third question: it's on the same server, and we have set max memory on SQL Server so that SSIS has about 50 GB of memory to use.

    Anything else?

  • For SSMS, go to Tools -> Options -> Query Execution -> SQL Server -> Advanced. For SSIS, it depends on the properties of the network library you're using to connect to SQL Server. I think you can also capture the options in Profiler when the connection is made. Make sure you don't leave the trace running if it's a live server.
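
Alternatively, if you can catch the SSIS connection while the package is running, you can compare the two sessions' options from another query window (the program_name filters below are guesses; adjust them to whatever actually shows up on your server):

-- Compare the SET options of the SSMS session and the SSIS connection
SELECT session_id, program_name, ansi_nulls, ansi_warnings,
       arithabort, concat_null_yields_null, quoted_identifier,
       date_format, [language]
FROM sys.dm_exec_sessions
WHERE program_name LIKE 'Microsoft SQL Server Management Studio%'
   OR program_name LIKE '%SSIS%';  -- adjust to match your package's connection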

    You might also want to query your plan cache to see whether you have two different execution plans.
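
Something along these lines will show you what's cached (the LIKE filter is just a placeholder; narrow it to a distinctive fragment of your script):

-- Look for multiple cached plans and timings for the same statement text
SELECT qs.plan_handle, qs.creation_time, qs.execution_count,
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
       st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%DELETE FROM%IS NULL%'   -- placeholder filter
ORDER BY qs.creation_time DESC;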

    John

Not seeing it. I am using Visual Studio 201 BI development; I am getting to Options but not seeing the rest there.

  • You said you were using SSMS.

    John

It runs faster in SSMS than in SSIS. I need to get the performance of SSIS up, and I am not sure why the scripts run about four times faster through SSMS than through SSIS. Sorry if I was not clear on this.

  • That's what I thought you meant, and it's why I suggested you compare your options on SSMS to those it uses when run through SSIS.

    John

  • Maybe I am being a bit slow, but explain to me exactly how to compare the two.

I would also like to find out why everyone says not to use cursors in SSIS; I have not really found a reason as such to avoid them.

  • For SSMS, go to Tools -> Options -> Query Execution -> SQL Server -> Advanced. For SSIS, it depends on the properties of the network library you're using to connect to SQL Server. I think you can also capture the options in Profiler when the connection is made. Make sure you don't leave the trace running if it's a live server.

    Cursors aren't bad in SSIS in particular; they're just bad for performance generally (in most cases). There are cases where there's no alternative but to use them, and even a few situations in which they'll outperform set-based code. Your desire to avoid locking a whole chunk of data and filling up the transaction log is a good reason to use some sort of loop. SSIS can automatically process data in batches, though, so you may wish to take advantage of that instead of explicitly declaring cursors in your code. I don't know enough about what you're trying to do to recommend one way or the other.

    John

Thanks, I will have a look at what you said and check the settings etc.

The cursors we use look up a table name and field name; for example, if a certain table has two data fields that are both NULL, we delete the entry (row). We do it in batches, checking through a million rows at a time. The code works perfectly in SQL Management Studio; when we use the same code, pull it in from a file location and run it as a SQL query, it takes four times longer to run. That is our main concern. There are probably better ways to do it, but as we are working with about 500 tables in total, we found it better to keep the code as generic as possible rather than writing separate code for each table. I will see what the settings are in SSMS and SSIS, but I do not think that is the issue: the normal queries, which also run from a file location (for example those that set up primary and foreign keys, create indexes and so on), are all running within what we are expecting.

At this stage it looks like getting away from the cursors is the way to go, but that is a lot of extra work to put in, seeing as I am also trying to make the ETL run more acceptably through SSIS than running code manually as has been done before. We would ideally also like to keep the code the same, as we might have clients where the ETLs rather need to be done through SSMS because of server capacity.

  • Andre 425568 (8/12/2014)


I will see what the settings are in SSMS and SSIS, but I do not think that is the issue: the normal queries, which also run from a file location (for example those that set up primary and foreign keys, create indexes and so on), are all running within what we are expecting.

    Yes, but DDL operations don't (usually) return result sets and so network latency is much less likely to be a factor. Also, DML is more likely to be sensitive to options and regional settings. Would you be able to post a snippet of your code (or all of it), please?

    John

This is the specific cursor where I picked up the problem:

DECLARE @object_id INT
DECLARE @table_name VARCHAR(128)
DECLARE @col_name VARCHAR(128)
DECLARE @exec_sql_string VARCHAR(500)
DECLARE @exec_sql NVARCHAR(500)
DECLARE @max_id NVARCHAR(4000)
DECLARE @OutputParameter NVARCHAR(4000)
DECLARE @id_count NUMERIC(38, 0)
DECLARE @id_count_to NUMERIC(38, 0)

SET NOCOUNT ON

-- Outer cursor: every table that has an INDEX0 column
DECLARE recs CURSOR DYNAMIC FOR
SELECT object_id, name
FROM sys.tables ts
WHERE object_id IN (
    SELECT object_id
    FROM sys.all_columns
    WHERE name = 'INDEX0'
)
ORDER BY name

OPEN recs
FETCH FROM recs INTO @object_id, @table_name

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @exec_sql_string = ''

    -- Inner cursor: the data columns of the current table
    DECLARE recs1 CURSOR DYNAMIC FOR
    SELECT name
    FROM sys.all_columns
    WHERE object_id = @object_id
        AND name != 'ID'
        AND name != 'INDEX0'
        AND (name NOT LIKE 'MAS%' AND name NOT LIKE 'Z%')

    OPEN recs1
    FETCH FROM recs1 INTO @col_name

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Build: DELETE FROM <table> WHERE <col1> IS NULL AND <col2> IS NULL ...
        IF @exec_sql_string = ''
            SET @exec_sql_string = 'DELETE FROM ' + @table_name + ' WHERE ' + @col_name + ' IS NULL '
        ELSE
            SET @exec_sql_string = @exec_sql_string + 'AND ' + @col_name + ' IS NULL '

        FETCH FROM recs1 INTO @col_name
    END

    -- Find the highest ID so the delete can run in batched ID ranges
    SET @exec_sql = 'SELECT @OutputParameter = MAX(ID) FROM ' + @table_name

    EXEC @max_id = sp_executesql @exec_sql
        ,N'@OutputParameter nvarchar(4000) output'
        ,@OutputParameter OUTPUT

    IF LTRIM(RTRIM(@OutputParameter)) = ''
        OR @OutputParameter IS NULL
        SET @OutputParameter = '0'

    SET @id_count = 1

    IF @OutputParameter <> '0'
    BEGIN
        -- Delete one million-row ID range per transaction
        WHILE @id_count < @OutputParameter
        BEGIN
            SET @id_count_to = @id_count + 1000000
            -- Explicit CASTs: concatenating a NUMERIC variable into a string fails otherwise
            EXEC ('BEGIN TRANSACTION ' + @exec_sql_string + ' AND ID >= ' + CAST(@id_count AS VARCHAR(40)) + ' AND ID <= ' + CAST(@id_count_to AS VARCHAR(40)) + ' COMMIT TRANSACTION')
            PRINT ('FROM: ' + CAST(@id_count AS VARCHAR(150)) + ' TO: ' + CAST(@id_count_to AS VARCHAR(150)))  -- was printing @id_count twice
            SET @id_count = @id_count + 1000000
        END

        -- One final range just past MAX(ID); redundant on most runs, since the
        -- loop above already covers every ID up to the maximum
        SET @id_count_to = @id_count + 1000000
        EXEC ('BEGIN TRANSACTION ' + @exec_sql_string + ' AND ID >= ' + CAST(@id_count AS VARCHAR(40)) + ' AND ID <= ' + CAST(@id_count_to AS VARCHAR(40)) + ' COMMIT TRANSACTION')
        PRINT ('FROM: ' + CAST(@id_count AS VARCHAR(150)) + ' TO: ' + CAST(@id_count_to AS VARCHAR(150)))
    END

    CLOSE recs1
    DEALLOCATE recs1

    FETCH FROM recs INTO @object_id, @table_name
END

CLOSE recs
DEALLOCATE recs
GO

  • I can't see anything there that would obviously run differently under different settings. What I advise you to do is to capture each event with a trace, and capture the execution plans as well. You can then compare the slow one to the faster one. Are you confident that they are doing exactly the same thing and it's just the speed that's different?

    One thing I did notice - I think your DELETEs would run quicker if you use the TOP clause instead of searching for ranges in your data.
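
Roughly like this, as a sketch (table and column names are placeholders for whatever your dynamic SQL builds):

-- Batched delete using TOP instead of ID ranges
WHILE 1 = 1
BEGIN
    DELETE TOP (1000000) FROM dbo.SomeTable   -- placeholder table
    WHERE Col1 IS NULL AND Col2 IS NULL;      -- placeholder NULL checks
    IF @@ROWCOUNT = 0 BREAK;                  -- stop when nothing is left to delete
END

That way you don't need the MAX(ID) lookup or the range arithmetic at all.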

    John

I will try that with a trace. We have looked at using TOP, but it has not been very effective. The only difference between the two is that one gets copied and run in a query window in SSMS, while the one in SSIS uses the file with the code from an Execute SQL Task: under General > SQL Statement we use a file connection, and in the option just above that we link to the file containing the exact same code that we run through the SSMS query window.

I will only test with Profiler in a few days, as we are in the middle of a TEST ETL for one client, and just after that another client is going live. For now I will run the scripts directly where we have cursors, and for the rest of the code I will use SSIS.
