Long execution times when including simple SQL functions into a join

  • Hi Everyone,

    We are running SQL2005 with the latest updates etc. We do have a separate drive RAID 0 from the TEMP DB. We routinely create joins linking 2-5 tables with 50-250mln records and apply all the proper indexing, make tall skinny tables to improve performance.

    We also have a couple of relatively simple functions that either verify parts of the data of create keys out of multiple columns. When we employ the functions on the same query to assemble results tables queries that normally took ~20hrs suddenly takes 2-3 days or never finish. We tried both SELECT INTO as well as INSERT INTO SELECT type queries and there is no difference.

    Why could that be? I can run a top 100 on the queries with or without the SQL-functions.

    Thanks!

  • Once I had a a very similar issue after applying a service pack. After rebuilding indexes and updating statistics all returned to the habitual rates...

  • By including functions in a query add overhead.

    Picture the scenario below for a 10M rows myTable table...

    select column-a

    from MyTable

    Query does a full table scan on 10M rows MyTable, isn't it?

    Now lets add a custom function to the very same query...

    select do_something(column-a)

    from MyTable

    What happens now?

    On top of the full table scan on 10M rows MyTable SQL Server has to execute do_something() function 10M times -that's overhead.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • I will not dispute the overhead, but is there a way to quantify the overhead?

  • Can you post the function, maybe making it Inline might help performance, if it's not already an Inline function. At least, it will get inserted in the execution plan, and the optimiser will be able to calculate statistics on it.

    Post the code for the said function, and I'll try to help.

    Cheers,

    J-F

  • Here you go:

    CREATE FUNCTION [dbo].[Valid_Phone]

    (@inputstring varchar(8000) )

    RETURNS varchar(8000)

    AS

    BEGIN

    /*return valid phone or no space blank*/

    declare

    @outputstring as varchar(8000)

    set @outputstring = ''

    -- Take out punctuations ()-.

    set @inputstring = replace(@inputstring,'(','')

    set @inputstring = replace(@inputstring,')','')

    set @inputstring = replace(@inputstring,'-','')

    set @inputstring = replace(@inputstring,'.','')

    set @inputstring = replace(@inputstring,' ','')

    --if length = 11 and start with 1 then trim the 1 at the 1st byte

    if len(@inputstring) = 11 and left(@inputstring,1) = '1'

    begin

    set @inputstring = substring(@inputstring,2,10)

    end

    --if length = 10, is numeric and valid area code then output phone number

    if len(@inputstring) = 10

    begin

    if isnumeric(@inputstring)=1

    and left(@inputstring,1) not in ('0','1')

    and substring(@inputstring,2,1) != '9'

    and substring(@inputstring,2,2) != '11'

    and substring(@inputstring,4,1) not in ('0','1')

    and substring(@inputstring,5,2) != '11'

    and left(@inputstring,3) not in

    --(select AreaCode from dbo.FPDW_AreaCode where Filter = 'Y')

    ('456', '500', '555', '600', '700', '710', '800', '822', '833', '844', '855', '866', '877', '888', '898', '900', '976', '999')

    -- check last 7 bytes

    and right(@inputstring,7) not in ('2222222','3333333','4444444','5555555','6666666','7777777','8888888','9999999')

    begin

    set @outputstring = @inputstring

    end

    end

    return @outputstring

    END

  • Came across this article that wants me to believe Scalar functions are not a great idea to begin with, not sure I am buying into it completely.

    http://sqlblogcasts.com/blogs/simons/archive/2008/11/03/TSQL-Scalar-functions-are-evil-.aspx

  • Hi Pieter

    Some questions:

    1.) Do you call these functions in your WHERE clause?

    2.) If yes, are the initial replacements "()-. " needed? They may be a very huge problem for another solution I'm currently thinking about.

    3.) Could CLR be an option?

    4.) Can you provide some pseudo sample data which should work and which should not?

    Greets

    Flo

    Edited: Typo

  • If you use functions in the where clause, whether user defined or system functions, then the optimizer is unaware of the value and has to determine it on each row, if you need to use functions to massage the data used in the where clause, could you not create computed columns using the functions and then join on these?

    Good article explaining what I beleieve you are describing:

    http://www.mssqltips.com/tip.asp?tip=1236

  • Please post your query that uses this function.

  • You can quantify the differences. Get the execution plans first. That will show you where the changes are occurring between the execution without the function and the one with it. You can also collect STATISTICS IO to see the scans & reads change between the two executions. That should be enough to quantify exactly what's going on. And yes, scalar functions can (not necessarily will) cause your queries to perform badly, especially if those scalar functions also have data access within them (which your example did not). It's my understanding, I haven't tested this, so I'm just repeating stuff I've read, that string manipulation like this is better done using a CLR function.

    If the functions are being used in the WHERE clause of the query, the function will prevent index use.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The TSQL functions are in the select portion not the Where clause or Join conditions.

    CLR is an option and I tried to convert that phone function. However, as unexpected as it was, there was no performance improvement.

  • Would you please post your query.

  • Pieter (5/20/2009)


    The TSQL functions are in the select portion not the Where clause or Join conditions.

    CLR is an option and I tried to convert that phone function. However, as unexpected as it was, there was no performance improvement.

    Hi Pieter,

    I've built this query to replace your scalar function, would you mind testing it on your data, and post the results/Query plan?

    Hope it will help,

    CREATE FUNCTION [dbo].[Valid_Phone_Inline]

    (@inputstring VARCHAR(8000))

    RETURNS Table

    AS

    RETURN(

    with PhoneToValidate(TrimmedPhone)

    as (SELECT CASE

    WHEN Len(TrimmedPhone) = 11

    AND left(TrimmedPhone,1) = '1'

    THEN Substring(TrimmedPhone,2,10)

    ELSE TrimmedPhone

    END AS TrimmedPhone

    FROM (SELECT replace(replace(replace(replace(replace(@inputstring,'(',''),')',''),

    '-',''),'.',''),' ','') AS TrimmedPhone) AS a)

    Select case when IsNumeric(TrimmedPhone) = 1

    AND left(TrimmedPhone,1) not IN ('0','1')

    AND substring(TrimmedPhone,2,1) != '9'

    AND substring(TrimmedPhone,2,2) != '11'

    AND substring(TrimmedPhone,4,1) NOT IN ('0','1')

    AND substring(TrimmedPhone,5,2) != '11'

    AND left(TrimmedPhone,3) NOT IN -- (select AreaCode from dbo.FPDW_AreaCode where Filter = 'Y')

    ('456','500','555','600',

    '700','710','800','822',

    '833','844','855','866',

    '877','888','898','900',

    '976','999')

    AND right(TrimmedPhone,7) NOT IN ('2222222','3333333','4444444','5555555',

    '6666666','7777777','8888888','9999999')

    THEN TrimmedPhone

    ELSE '' END as OutputPhone

    FROM PhoneToValidate);

    GO

    SELECT dbo.Valid_Phone('1(700) 226-2354')

    SELECT OutputPhone

    FROM dbo.Valid_Phone_Inline('1(700) 226-2354')

    SELECT OutputPhone

    FROM dbo.Valid_Phone_Inline('1(226) 226-2354')

    SELECT OutputPhone

    FROM dbo.Valid_Phone_Inline('0(534) 226-2354')

    SELECT OutputPhone

    FROM dbo.Valid_Phone_Inline('1(534) 226-2354')

    Cheers,

    J-F

  • Hi Pieter

    I just tried some CLR stuff with 1 million rows.

    Currently your function takes 17s

    CLR takes 7s

    I think this can be faster. Give me a little bit more time.

    Questions:

    1.)

    Are you using C# or VB.Net?

    2.)

    Why do you use this as on-demand function? I don't think you get 250mio new phone numbers every day. What about a new column in your table which contains the cleaned phone numbers?

    @J-F:

    I stopped your function at two minutes.

    Greets

    Flo

Viewing 15 posts - 1 through 15 (of 41 total)

You must be logged in to reply to this topic. Login to reply