Performance issue using UDF in bulk load jobs (SSIS)

  • Hi All,

    We have bulk load jobs configured using SSIS packages. All the load jobs were working perfectly fine until 2 days back. Suddenly, our load jobs are taking way longer than the normal time. The issue is identified to be the use of UDF as shown below,

    Note: These jobs were working fine since 3 years without any issues. We load around 16 GB of data each day.

    No changes to the server and\or SQL Server itself.

    Any help would be greatly appreciated.

    Thanks,

    -Mohsin

    ALTER FUNCTION [dbo].[convertdate_null] (@input varchar(50))

    RETURNS VARCHAR(12)

    AS

    BEGIN

    DECLARE @output DATETIME;

    SELECT @output = CASE WHEN (ISDATE ( ( CASE WHEN Left(replace(ltrim(rtrim(@input)),'+',''), 1) = 1

    THEN '20'

    ELSE '19'

    END + Right(ltrim(rtrim(@input)), 6))

    ) = 1)

    THEN CONVERT(VARCHAR(12),CONVERT (DATETIME, CASE WHEN Left(replace(ltrim(rtrim(@input)),'+',''), 1) = 1

    THEN '20'

    ELSE '19'

    END + Right(ltrim(rtrim(@input)), 6) ) )

    ELSE

    null

    END

    RETURN @output

    END

  • Mohammed Mohsin-392707 (11/11/2012)


    Hi All,

    We have bulk load jobs configured using SSIS packages. All the load jobs were working perfectly fine until 2 days back. Suddenly, our load jobs are taking way longer than the normal time. The issue is identified to be the use of UDF as shown below,

    Note: These jobs were working fine since 3 years without any issues. We load around 16 GB of data each day.

    No changes to the server and\or SQL Server itself.

    The function is a scalar UDF which is a performance problem in and of itself. That, notwithstanding, things don't just suddenly start running slower unless something actually did change. Perhaps the target table of your load finally reached a tipping point where the existing statistics are no longer valid causing a change for the worse in the execution plan. Perhaps the server is using more memory for other things than it used to.

    I recommend that you first rework the scalar UDF and the code or process that uses it to be an Inline Table Valued function that returns a scalar value. That will take care of some of the performance problem. Once that is done, I'd re-evaluate any slownness that remains.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How is the UDF used? In a join or where clause predicate? Select list?

  • In a select list.

    Thanks,

    -Mohsin

  • Since SQL Server is generally quite poor a string manipulation, the obvious thing to do would be to move this logic into the layer that is most approriate ie the SSIS task.

    Will be a relatively trivial thing to do.



    Clear Sky SQL
    My Blog[/url]

  • I am using this SQL within the data flow task of SSIS package. I am doing simple select from a stage table and inserting into the base table. Wherever date coumns are there in stage, I am using UDF to validate and return.

    Any quick things I can take a look? Changing any thing related to UDF is a huge task as it has been used in more than 500 locations.

    Thanks,

    -Mohsin

  • Mohammed Mohsin-392707 (11/12/2012)


    I am using this SQL within the data flow task of SSIS package. I am doing simple select from a stage table and inserting into the base table. Wherever date coumns are there in stage, I am using UDF to validate and return.

    Any quick things I can take a look? Changing any thing related to UDF is a huge task as it has been used in more than 500 locations.

    Thanks,

    -Mohsin

    Then create a new iTVF UDF just for this task to start with. The current UDF is a scalar UDF and will always be a performance problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree with Jeff. Or for that matter since this is just in the select list, that scalar function can be turned into a case statement very easily.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply