iTVF: Better way to code this?

  • Scenario:

    Given value1, return value2 which can have three possible values.

    Doing this as a scalar function, it takes roughly 17 minites for approx 20,000 rows

    Doing this as an iTVF I wound up cancelling out at 1 hour 30 minutes.

    The problem may very well by my code. Here's a slightly bastardized version of it:

    Thought about maybe a outer join but I don't want to rely on NULLs for the first two case scenarios.

    CREATE FUNCTION [dbo].[fn_Get_NewValuetbl]

    (

    @OrigValue AS VARCHAR(20)

    )

    RETURNS TABLE

    -- WITH SCHEMABINDING

    AS

    RETURN

    ( SELECT CASE WHEN LEFT(RTRIM(LTRIM(@OrigValue)), 3) IN ( 'ABC', 'CDE',

    'GHI', 'JKL' )

    THEN RTRIM(LTRIM(@OrigValue))

    WHEN LEFT(RTRIM(LTRIM(@OrigValue)), 1) = 'R'

    THEN RTRIM(LTRIM(@OrigValue)) + 'xky'

    --Company 21 and 22

    ELSE LTRIM(RTRIM(u.NewValue))

    + SUBSTRING(RTRIM(LTRIM(@OrigValue)), 10,

    LEN(RTRIM(LTRIM(@OrigValue))))

    END AS NewValue

    FROM dbo.NewValue_Load AS u

    WHERE ( LEFT(@OrigValue, 9) = RTRIM(u.SourceValue)

    OR LEFT(RTRIM(LTRIM(@OrigValue)), 1) = 'R'

    OR LEFT(RTRIM(LTRIM(@OrigValue)), 3) IN ( 'ABC', 'CDE',

    'GHI', 'JKL' )

    )

    )

    Table is a simple lookup table:

    CREATE TABLE [dbo].[NewValue_Load](

    [SourceValue] [varchar](12) NULL,

    [NewValue] [nchar](11) NOT NULL

    ) ON [PRIMARY]

    GO

    CREATE CLUSTERED INDEX [IX_NewValue] ON [dbo].[NewValue_Load]

    (

    [NewValue] ASC

    )ON [PRIMARY]

  • You've been around long enough to know that you should provide sample data and expected results.

    It would also help if you provided the query where you are calling the function. The bottleneck may be there instead of the function itself.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • 17 minutes or 1.5 hours sounds a lot more like a performance issue rather than than an ITVF issue;

    consider updating statistics (UPDATE STATISTICS ON TBLNAME WITH FULLSCAN) for each of the tables featured in the query, look for inadvertant cross joins you might have overlooked, then post the actual execution plan and lets see if there's something that can be improved from that info.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • It's a simple string to string translation so I don't think sample data is relevant (especially since in this case it would have to be entirely faked)

    The surrounding query is also very much not the bottleneck. The function is the only new element to a query within a stored procedure that's been running just fine for a while now. Besides that, the outside code is beyond the ability to obfuscate since it contains much proprietary information and cannot change.

    My rather extensive testing has involved many combinations of queries and functions so I know this is the bottleneck.

    Not only that, this function is being used in multiple SP's (hence the whole need to make it a function) and the same behavior shows up in all of them.

  • Lowell:

    NewValue_Load is the only actual table in the query. That table is truncated and populated anew every day. I have several different flavors of queries that use the function in question. Each one displays the same behavior. One uses a CTE off a temp table to insert values into a persistent table. One flavor does an update on an existing persistent table, another flavor uses the function as part of a select from an openquery (pseudocode: insert into LoadTable select values, fn(SourceValue)) from openquery() )

    So I'm all over the place with the function and see the same behavior in all.

    Query olan for one of the SP's is totally uninformative.

  • Your function is doing a partial cross join. If you pass in the parameter "ABC" it will return "ABC" for every single value that you have in your new values table. The same is true if you pass in anything beginning with an "R". I highly doubt that is what you want.

    Also, it won't save you much time, but either do the RTRIM(LTRIM()) before passing the parameter into the function or set your parameter to the trimmed value as the first thing in the function so that you don't have to repeat those calls.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • You need to find a way to whack the rtrim here if you want any shot at an index seek.

    RTRIM(u.SourceValue)

    I also don't see how you can possibly need it since the column is already varchar.

    Make sure your where does a seek and you should be fine.

    Also make sure you don't return all rows from the new table (you ors can make that happen)

    OR LEFT(RTRIM(LTRIM(@OrigValue)), 1) = 'R'

    Also I'm not sure about this, but it might be simpler to just trim the value before sending it has parameter... or only once in a local variable...

  • Drew:

    I wondered about that. Was the only way I could get that TVF to work, though.

    doing the trims outside the function isn't really a possibility. So how do I do it inside a TVF?

  • Never tried this (working nor perf).

    SET @Param = RTRIM(LTRIM(@Param))

    If you can't avoid the scan on the CI, you're dead at 90 minutes for each run.

  • Ninja's_RGR'us (9/20/2011)


    You need to find a way to whack the rtrim here if you want any shot at an index seek.

    RTRIM(u.SourceValue)

    I also don't see how you can possibly need it since the column is already varchar.

    Guh. I know this, really I do (forests and trees...)

    Leftover from when I was dealing with the source data directly (DB2 source) where all the values are padded with trailing spaces and nothing matches without trimming. I know just how to fix this one.

  • Try the following code based on your original. All I did was move the reference to the new value table into a correlated subquery.

    CREATE FUNCTION [dbo].[fn_Get_NewValuetbl]

    (

    @OrigValue AS VARCHAR(20)

    )

    RETURNS TABLE

    -- WITH SCHEMABINDING

    AS

    BEGIN

    SET @OrigValue = RTRIM(LTRIM(@OrigValue))

    RETURN

    ( SELECT CASE WHEN LEFT(@OrigValue, 3) IN ( 'ABC', 'CDE',

    'GHI', 'JKL' )

    THEN @OrigValue

    WHEN LEFT(@OrigValue, 1) = 'R'

    THEN @OrigValue + 'xky'

    --Company 21 and 22

    ELSE (SELECT LTRIM(RTRIM(u.NewValue))

    + SUBSTRING(RTRIM(LTRIM(@OrigValue)), 10,

    LEN(RTRIM(LTRIM(@OrigValue))))

    FROM dbo.NewValue_Load AS u

    WHERE ( LEFT(@OrigValue, 9) = RTRIM(u.SourceValue) )

    END AS NewValue

    )

    )

    This is completely untested.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Nice use of a short-circuit!

    Then you just have to find a way to remove the rtrim in the where clause and this then has a chance to fly!

  • Ninja's_RGR'us (9/20/2011)


    Nice use of a short-circuit!

    Then you just have to find a way to remove the rtrim in the where clause and this then has a chance to fly!

    Isn't the RTRIM superfluous in this case, anyhow?

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (9/20/2011)


    Ninja's_RGR'us (9/20/2011)


    Nice use of a short-circuit!

    Then you just have to find a way to remove the rtrim in the where clause and this then has a chance to fly!

    Isn't the RTRIM superfluous in this case, anyhow?

    Drew

    Should be, but he said he couldn't make the code work without it so I'm drawing blanks at this point.

    I'd sure give this another go without it.

  • (umm, that would be "she")

    The original code wouldn't do it without a trim as I was dealing with DB2. I *do* have a workaround for the RTRIM now that I'm using a SQL Server lookup table. (I'm doing the trim before this code even thinks of looking at the data.)

    Drew: Your solution, unfortunately, didn't work. Seems like a select statement in a CASE isn't allowed.

    HOWEVER, I *did* find a very workable solution through you folks stirring up the brain cells a bit more.

    I created a multi-statement Table-Valued function and it works wonderfully. 19000 rows fully processed in 14 seconds on the first pass. This is much more better. 😀

    (and yes, all the TRIMs are overkill I'm sure but we've had major issues with this data...)

    CREATE FUNCTION [dbo].[fn_Get_NewValuetbl_msf]

    (

    @OrigValue AS VARCHAR(20)

    )

    RETURNS @The_NewValue TABLE ( NewValue VARCHAR(20) )

    AS

    BEGIN

    IF LEFT(RTRIM(LTRIM(@OrigValue)), 3) IN ( 'ABC', 'CDE', 'GHI', 'JKL' )

    INSERT INTO @The_NewValue

    ( NewValue )

    VALUES ( RTRIM(LTRIM(@OrigValue)) )

    ELSE

    IF LEFT(RTRIM(LTRIM(@OrigValue)), 1) = 'R'

    INSERT INTO @The_NewValue

    ( NewValue )

    VALUES ( RTRIM(LTRIM(@OrigValue)) + 'xky' )

    --Company 21 and 22

    ELSE

    INSERT INTO @The_NewValue

    ( NewValue

    )

    SELECT LTRIM(RTRIM(u.NewValue))

    + SUBSTRING(RTRIM(LTRIM(@OrigValue)), 10,

    LEN(RTRIM(LTRIM(@OrigValue)))) AS NewValue

    FROM dbo.NewValue_Load AS u

    WHERE LEFT(@OrigValue, 9) = u.SourceValue

    RETURN

    END

    So thanks for the brain refresh and the reminders.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply