udf with tally table runs slower than while loop

  • Hi, I'm migrating electronic records from a legacy system and the new system has strict requirements for ASCII characters in certain metadata fields. I wrote a UDF to display illegal characters, so I can work out how to map them.

    The UDF used a while loop and to improve performance, I wrote the equivalent UDF using a tally table. The tally table version actually ran significantly slower. Query calling UDF using cross apply took 26 secsfor the while loop versus 119 secs for Tally table, for test data of 97000 rows

    I would like to work out why, as I will use similar code to replace the illegal characters

    -- while loop version of UDF

    CREATE FUNCTION [dbo].[DisplayIllegalChars](@strText VARCHAR(4000))

    RETURNS @TableVariable TABLE (

    Chr CHAR(1)

    ,AsciiValue INT)

    AS

    BEGIN

    DECLARE @intCount INT

    DECLARE @chrCheck CHAR

    SET @intCount = 1

    WHILE @intCount <= LEN(@strText)

    BEGIN

    SET @chrCheck = SUBSTRING(@strText, @intCount, 1)

    IF (ASCII(@chrCheck) > 126)

    OR ((ASCII(@chrCheck) < 31) AND NOT ASCII(@chrCheck) IN (9, 10, 13))

    INSERT

    INTO @TableVariable (Chr, AsciiValue)

    SELECT @chrCheck, ASCII(@chrCheck)

    SET @intCount = @intCount + 1

    END

    RETURN

    END

    -- tally table, per Jeff Moden article

    SELECT TOP 4000

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    -- tally table version of UDF (could be inline function but want to compare like for like)

    CREATE FUNCTION [dbo].[DisplayIllegalChars2](@strText VARCHAR(4000))

    RETURNS @TableVariable TABLE (

    Chr CHAR(1)

    ,AsciiValue INT)

    AS

    BEGIN

    ;with CTE AS (

    SELECT SUBSTRING(@strText, N, 1) AS Chr

    FROM dbo.Tally)

    INSERT INTO @TableVariable (Chr, AsciiValue)

    SELECT Chr, ASCII(Chr)

    FROM CTE

    WHERE (ASCII(Chr) > 126)

    OR ((ASCII(Chr) < 31) AND NOT ASCII(Chr) IN (9, 10, 13))

    RETURN

    END

    -- ColumnText contains characters to be checked

    CREATE TABLE [dbo].[TestData](

    [ID Key] [varchar](255) NULL,

    [RecordType] [varchar](256) NOT NULL,

    [ColumnType] [varchar](256) NOT NULL,

    [ColumnText] [varchar](4000) NULL

    ) ON [PRIMARY]

    -- for each illegal character, display original text and the illegal characters

    -- whle loop UDF took 26 secs

    select [id key]

    ,RecordType

    ,ColumnType

    ,ColumnText

    from TestData t

    cross apply [dbo].[DisplayIllegalChars](t.ColumnText) i

    -- tally table USD took 119 secs

    select [id key]

    ,RecordType

    ,ColumnType

    ,ColumnText

    from TestData t

    cross apply [dbo].[DisplayIllegalChars2](t.ColumnText) i

  • There are three key problems with the way you have structured the tally table version.

    Firstly, your tally table should have a unique clustered index on n, or define it as a primary key in the ddl.

    Secondly, you aren't restricting the number of rows pulled from the tally table. You're using the full 4000 with every call.

    Thirdly, your function, a multi-statement table-valued function, will be much faster if it's written as an inline table-valued function, like this:

    CREATE FUNCTION [dbo].[iTVF_DisplayIllegalChars2]

    (@strText VARCHAR(4000))

    RETURNS TABLE AS

    RETURN (

    SELECT

    Chr = SUBSTRING(@strText, N, 1),

    AsciiValue = ASCII(SUBSTRING(@strText, N, 1))

    FROM dbo.Tally

    WHERE n <= LEN(@strText)

    AND (

    (ASCII(Chr) > 126)

    OR ((ASCII(Chr) < 31) AND NOT ASCII(Chr) IN (9, 10, 13))

    )

    )

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris, I tried your suggestions,

    1) clustered index on tally table didn't make any difference, still 120 secs on test data

    2) using LEN(@vstrText) < N made huge difference, down to 15 secs, thanks! Makes sense that this is so signficant

    3) using a multi-statement table-valued function, this one I was aware of and was keeping as multi-statement to compare like for like, as the while loop version has to be multi-statement. For the record, using inline function brought it down to 5 secs.

    thanks for your help πŸ™‚

    Josh

  • Josh Leane-155117 (8/24/2015)


    Hi Chris, I tried your suggestions,

    1) clustered index on tally table didn't make any difference, still 120 secs on test data

    2) using LEN(@vstrText) < N made huge difference, down to 15 secs, thanks! Makes sense that this is so signficant

    3) using a multi-statement table-valued function, this one I was aware of and was keeping as multi-statement to compare like for like, as the while loop version has to be multi-statement. For the record, using inline function brought it down to 5 secs.

    thanks for your help πŸ™‚

    Josh

    Seems a little slow. Here's a tested version this time, with two changes which may be significant.

    Firstly, the tally table is inline, secondly, I've put in a couple of LIKE comparisons.

    From your point of view, the inline tally table may be worth a shot vs the hard table.

    ALTER FUNCTION [dbo].[iTVF_DisplayIllegalChars2]

    (@strText VARCHAR(4000))

    RETURNS TABLE AS

    RETURN (

    WITH

    E1(n) AS (SELECT 1 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)), --10E+1 or 10 rows

    E2(n) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    Tally(n) AS (SELECT TOP (LEN(@strText)) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E2 a, E2 b)

    SELECT

    x.Chr,

    AsciiValue = ASCII(x.Chr)

    FROM Tally

    CROSS APPLY (SELECT Chr = SUBSTRING(@strText, CAST(N AS INT), 1)) x

    WHERE 1 = 1

    AND x.Chr NOT LIKE '[A-Z]'

    AND x.Chr NOT LIKE '[0-9]'

    AND (

    (ASCII(x.Chr) > 126)

    OR

    ((ASCII(x.Chr) < 31) AND NOT ASCII(x.Chr) IN (9, 10, 13))

    )

    )

    GO

    SELECT * FROM dbo.iTVF_DisplayIllegalChars2 ('JND\VC879Y2`' + CHAR(8) + '34HMBzsmfcB`2' + CHAR(127) + '13')

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Josh Leane-155117 (8/24/2015)


    Hi Chris, I tried your suggestions,

    1) clustered index on tally table didn't make any difference, still 120 secs on test data

    2) using LEN(@vstrText) < N made huge difference, down to 15 secs, thanks! Makes sense that this is so signficant

    3) using a multi-statement table-valued function, this one I was aware of and was keeping as multi-statement to compare like for like, as the while loop version has to be multi-statement. For the record, using inline function brought it down to 5 secs.

    thanks for your help πŸ™‚

    Josh

    Just to add to the mix, did you just create a clustered index or did you create the required UNIQUE clustered index?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the additional suggestions Chris. I tried with the Tally table in the function, that slowed it slightly from 5 secs to 7 secs. Presumably due to the tally table being created each time?

    With the additional LIKE comparisons, it was slower again, 19 secs

    ChrisM@Work (8/24/2015)

    WHERE 1 = 1

    AND x.Chr NOT LIKE '[A-Z]'

    AND x.Chr NOT LIKE '[0-9]'

    Also, whats the reason behind the "1 = 1" comparision? Does that force a certain evaluation order for the other comparisons in the where clause?

    Using the cross apply to alias the "SELECT Chr = SUBSTRING(@strText, CAST(N AS INT), 1)" is a useful trick, thanks

    Josh

  • Josh Leane-155117 (8/25/2015)


    Thanks for the additional suggestions Chris. I tried with the Tally table in the function, that slowed it slightly from 5 secs to 7 secs. Presumably due to the tally table being created each time?

    With the additional LIKE comparisons, it was slower again, 19 secs

    ChrisM@Work (8/24/2015)

    WHERE 1 = 1

    AND x.Chr NOT LIKE '[A-Z]'

    AND x.Chr NOT LIKE '[0-9]'

    Also, whats the reason behind the "1 = 1" comparision? Does that force a certain evaluation order for the other comparisons in the where clause?

    Using the cross apply to alias the "SELECT Chr = SUBSTRING(@strText, CAST(N AS INT), 1)" is a useful trick, thanks

    Josh

    Hi Josh

    IIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.

    WHERE 1 = 1 is only a placeholder, no tricky stuff, the optimiser ignores it and so can you. SQL Server evaluates the filters in whatever order results in the lowest calculated cost in the chosen plan. Have you ever tried to cast a varchar to a date, using ISDATE() to check if the value can be cast, only to find that the cast is performed before the check?

    Did you create the clustered index as unique as Jeff mentions above?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (8/24/2015)


    Josh Leane-155117 (8/24/2015)


    Hi Chris, I tried your suggestions,

    1) clustered index on tally table didn't make any difference, still 120 secs on test data

    2) using LEN(@vstrText) < N made huge difference, down to 15 secs, thanks! Makes sense that this is so signficant

    3) using a multi-statement table-valued function, this one I was aware of and was keeping as multi-statement to compare like for like, as the while loop version has to be multi-statement. For the record, using inline function brought it down to 5 secs.

    thanks for your help πŸ™‚

    Josh

    Just to add to the mix, did you just create a clustered index or did you create the required UNIQUE clustered index?

    Hi Jeff, I did only have a clustered index. When I originally said the clustered index made no difference, that was against my original tally table, which was using the 4000 rows each time.

    I retested with Chris' solution, using LEN(@strText) rows from the tally table, and having the clustered index did make as difference of about 40% improvement. But it didn't matter if it was unique or not, though there might be a difference with a larger dataset.

    thanks for your help

  • Hi Josh

    IIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.

    WHERE 1 = 1 is only a placeholder, no tricky stuff, the optimiser ignores it and so can you. SQL Server evaluates the filters in whatever order results in the lowest calculated cost in the chosen plan. Have you ever tried to cast a varchar to a date, using ISDATE() to check if the value can be cast, only to find that the cast is performed before the check?

    Did you create the clustered index as unique as Jeff mentions above?

    Hey Chris, did try with the unique clustered index and posted a comment in reply to Jeff. I wasn't aware that the optimiser could change the order of the cast and isdate() check, good to know πŸ™‚

    thanks for all your help, appreciate it

    Josh

  • ChrisM@Work (8/25/2015)


    Hi Josh

    IIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.

    No need to Google. Here's a chart from the "Hidden RBAR" article at http://www.sqlservercentral.com/articles/T-SQL/74118/ .

    Note that use of a hardcoded Tally Table can cause a huge number of logical reads from memory once the relatively small physical Tally Table has been cached. The advantage of Itzik's method is that it generates absolutely no reads by itself.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/25/2015)


    ChrisM@Work (8/25/2015)


    Hi Josh

    IIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.

    No need to Google. Here's a chart from the "Hidden RBAR" article at http://www.sqlservercentral.com/articles/T-SQL/74118/ .

    Note that use of a hardcoded Tally Table can cause a huge number of logical reads from memory once the relatively small physical Tally Table has been cached. The advantage of Itzik's method is that it generates absolutely no reads by itself.

    thanks Jeff πŸ™‚ I'll definitely read that article

    Josh

  • Jeff Moden (8/25/2015)


    ChrisM@Work (8/25/2015)


    Hi Josh

    IIRC a hard tally table is usually slightly faster than an inline one. Google will tell you for sure.

    No need to Google. Here's a chart from the "Hidden RBAR" article at http://www.sqlservercentral.com/articles/T-SQL/74118/ .

    Note that use of a hardcoded Tally Table can cause a huge number of logical reads from memory once the relatively small physical Tally Table has been cached. The advantage of Itzik's method is that it generates absolutely no reads by itself.

    Thanks Jeff, I knew it was out there somewhere.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You bet. It's one of my favorite charts because of the line for the rCTE. Absolutely amazing how comparatively slow those bad boys can be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply