Looking for some guidance on avoiding Scalar UDFs for DB lookups

  • Hello.

    I am not a SQL Developer - just an IT generalist (if that) who dips in and out of SQL Server

    stuff and even writing this down helps me to better understand the 'problem'.

    From reading, the perceived wisdom is that you should not do DB lookups in Scalar functions for performance reasons.

    I am looking at some code for a job that used to run for 13 hours until it got canned, due to the fact that the results were incorrect as much as the runtime (it was pointless).

    The job used a Scalar function to perform multiple DB lookups (10's of millions per hour).

    I now want to do something similar to what this job did but avoid the pitfalls of doing the lookups in a Scalar function.

    I see that 2019 implicitly converts to inline TVFs - and in fact I tried converting the code myself  but think it will face the same issues with being non-performant, so did not pursue this (as I never actually got it working).

    What the Scalar function is doing is looking up a price for an item based on a hierarchy, and if the result is NULL then and only then performing the next lookup in the hierarchy for the item.

    A scaled down version of the  correct hierarchical lookup looks like this :

    A) Specific Item Price for a Customer

    B)  Specific Item Parent Item Price for a Customer

    C) Generic Item Price for a Customer

    D) Specific Item Price for all Customers

    E) Specific Item Parent Price for all Customers

    F) Generic Item Price for all Customers

    The exiting code is similar, potentially performing 6 lookups - but using a different (incorrect) hierarchy.

    Pricing should always exist at D, E and F - this is Base pricing - primarily to trap the non-existence of A, B,C

    which is an error.

    In most circumstances the pricing exists at A, but some prices exist at B and C, which is valid and needs to

    be catered for.

    I can see why a Scalar function might have looked like the answer from a coding perspective as it simplifies the SQL query - at the expense of performance.

    The DB contains 135 million items - there are some predicates that reduce the input to the function perhaps by half - but that still means somewhere from 75 - 450 million lookups required to me.

    All I have been able to come up with to date (conceptually) is to pull all itemids with customer ids and prices into a Temp table and add a derived from column to say where in the hierarchy the price came from - applying a value or weight. Then sort them by customerid/hierarchy and somehow filter by the row that applies based on the hierarchy - maybe a 2nd Temp table (or some kind of nested Selects to do both).

    Not looking for somebody to solve this for me, but rather looking for some guidance to develop my SQL coding.

    Based on  what I am trying to achieve and knowing (?) that  Scalar function with multiple DB lookups are the wrong answer - what pointers can anybody offer as to what the right answer to how to do something like this is?

    I think my proposed method will reduce the runtime from 13 hours to n minutes but still may not be the best way to do this even if it works and this requirement for an iterative lookup must be common.

    Thanks in advance for any input.

  • I think that knowing why and how that function is called so many times may help pointing you to a better solution.

    how all those hierarchical rules are applied may also change how this can be implemented

    below is a possible approach style - not necessarily the best or fastest

    select ...
    , coalesce(cust_prices.price, all_prices.price) as price

    from master_table mt
    outer apply (select top 1 case
    when p.pricetype = 'specific'
    then p.price
    when parent.pricetype is not null
    and parent.pricetype = 'specific'
    then parent.price
    else p.price
    end
    from prices p
    left outer join prices parent
    on parent.priceid = p.parentid
    and parent.customerid = mt.customerid
    where p.customerid = mt.customerid
    and p.itemid = mt.itemid
    order by case
    when p.pricetype = 'specific'
    then 1
    when parent.pricetype is not null
    and parent.pricetype = 'specific'
    then 2
    else 3
    end
    ) cust_prices
    outer apply (select top 1 case
    when p.pricetype = 'specific'
    then p.price
    when parent.pricetype is not null
    and parent.pricetype = 'specific'
    then parent.price
    else p.price
    end
    from prices p
    left outer join prices parent
    on parent.priceid = p.parentid
    and parent.customerid is null -- how do all customers get flagged
    where p.customerid is null -- how do all customers get flagged?
    and p.itemid = mt.itemid
    order by case
    when p.pricetype = 'specific'
    then 1
    when parent.pricetype is not null
    and parent.pricetype = 'specific'
    then 2
    else 3
    end
    ) all_prices

     

  • Thanks Frederico for the quick reply. I will work though that example to understand what it is doing (Not used Outer Apply to date). I think that the Top 1 may replace the idea of sorting /  assigning a weight but just a guess. I will attempt to construct the DDL and show the data when I can figure that out.

    At this point I am running some queries to identify the scale of the issue where an exact match on object/customer/price does not exist and try to break it down to verify results as I go. My first query is still running after 43  minutes - the other 2 methods return the same results after 10 seconds each. I am going to let the first query finish to see if it returns the same results.

  • Are Items A thru F all in a single parent-child table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We need to see the actual lookup code to offer more specific advice here.

    Proper indexing could also be critical to getting best performance for this task.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hello Jeff.

    Thanks for the reply and aplogogies for not replying sooner - my account is not linked to my work email and not checked my personal emails for a while - just came back here in downtime to review Fredico's code.

    The 'Price' table has links to an Object Table and the actual Monetary Value is stored in another PriceDetail Table so currently not in the same table as such but I had complemplated pulling the data into 1 Temp table and  passing that to a function.

    If I can get my head around this myself and what I am actually trying to do, and work out how,  I will post som DDL.

    Thanks

    Steve O.

  • Hello Scott.

    Thanks for the reply.

    I have already determined that iterative DB lookups in a Scalar Function is not the way to do this and, so am looking at starting from scratch with the requirements so I did not see much point posting the current function and the DDL used.

    Once I have something that works I can look at indexing and execution plans but at this stage I am looking to understand the proper way to do this if not using a scalar function.

    If you think that there is still value in seeing the scalar function I can try to have a look for it and post it?

    Steve O.

  • Hello again Jeff.

    Re-reading your reply and perhaps I misunderstood the question.

    There is 1 Price Table containing all Price records and there is 1 Price Detail Table containing all monetary values. I guess they are separate for a reason (probably because there can be a one to many realtionship between a price record and monetary value with concept of current price and dates from which the price applies).

    For any customer, any of A-C may or may not exist - but D-F should exist (there are other checks for that).

    What this code is trying to do is identify instances where all of A-C do not exist as in most instances (but not all) this is an error state and will be highlighted by the monetary values returned for D-F being abnormally high.

    Thanks

    Steve O.

  • I'm thinking that I'm not understanding much either. 😀

    You probably don't actually mean "Hierarchical"... you just mean you have some relational tables, correct?  If that's the case, and ERD might help a bit here.

    Have you tried working with Frederico's code at all?  How'd that work out?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a general rule of thumb, you should not write any scalar valued functions. At all.

    Even tough SQL2019 and up are evolving in this troublesome area, just don't use SVFs.

    Even returning a single value as a SET, will enhance performance in most of the cases. (cross apply / outer apply are your friends)

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hello Jeff.

    Thanks for hanging in there. I have a meeting tomorrow to discuss my thoughts on this with peers.

    However, your sig has always been something that was at the back of my mind (I was not aware it was yours) and what I determined was that I don't actually need to check 135 million entries. I ran a quick query to group by accountid and objectid and all I need to do is use the output from that to check for pricing - amounts to 65K rows before I add predicates.

    At this stage I don't even need the monetary value - all I need to to know there is a price - even if set to £0

    It doesn't matter if there are 1 or a million hits - the same rule will apply to all and the check is only requried once per account in the hierarchy.

    Not sure if that is what your sig means, but it prompted me to think of the above just the same so works for me.

    I have looked at whether i can use Frederico's code and still unclear - what I was doing last night was creating tmp tables with the fields I wanted in separate queries and trying to document the fields so I could generate the DLL statements as I was running test queries.

    I will update with progress.

    Regards

    Steve O.

  • Hello Johan.

    I have a couple of Scalar Functions that are sub-second.

    1 that I use every day passes an IP Address/Subnet in and returns a human friendly site name by scanning though the list of subnet lookups - does no DB lookups, it is all in the function (coded as and when I find New entries and nothing on TV).

    I probably have a couple of hundred subnets (we have VLANs for Data, Voice, Guest etc at each of dozens of sites and are in the process of migrating networks so lots of entries during transition) and pass a few thousand IP addresses to it when refreshing the query - and doing a bunch of joins to other data tables and as I said it runs in under 1 second in SSMS.

    Appreciate what you are saying, but works for me as is and may perform even better in 2019 (?).

    Regards

    Steve O.

     

     

  • SteveOC wrote:

    Hello Johan.

    I have a couple of Scalar Functions that are sub-second.

    1 that I use every day passes an IP Address/Subnet in and returns a human friendly site name by scanning though the list of subnet lookups - does no DB lookups, it is all in the function (coded as and when I find New entries and nothing on TV).

    I probably have a couple of hundred subnets (we have VLANs for Data, Voice, Guest etc at each of dozens of sites and are in the process of migrating networks so lots of entries during transition) and pass a few thousand IP addresses to it when refreshing the query - and doing a bunch of joins to other data tables and as I said it runs in under 1 second in SSMS.

    Appreciate what you are saying, but works for me as is and may perform even better in 2019 (?).

    Regards

    Steve O.

    Heh... all I can say there is "famous last words" and I'll leave it at that. 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you have 200 entries, at least use a form of binary search rather than searching sequentially thru the list.  Even better would be just looking it up in a table clustered on the lookup value.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • SteveOC wrote:

    Hello Johan.

    I have a couple of Scalar Functions that are sub-second.

    1 that I use every day passes an IP Address/Subnet in and returns a human friendly site name by scanning though the list of subnet lookups - does no DB lookups, it is all in the function (coded as and when I find New entries and nothing on TV).

    I probably have a couple of hundred subnets (we have VLANs for Data, Voice, Guest etc at each of dozens of sites and are in the process of migrating networks so lots of entries during transition) and pass a few thousand IP addresses to it when refreshing the query - and doing a bunch of joins to other data tables and as I said it runs in under 1 second in SSMS.

    Appreciate what you are saying, but works for me as is and may perform even better in 2019 (?).

    Regards

    Steve O.

    We are currently experiencing HUGE cpu consumption due to SVF usage on a bit larger sets with SQLServer 2019 CU14/15/16.

    Here's a sample SVF ( noting to it, right ? )

    ( just don't ask "Why"!)

    CREATE FUNCTION [dbo].[F_ISWOSTATUSWAITPOSSIBLE] (
    @WO_StatusVARCHAR(2)
    )
    RETURNS BIT
    AS
    BEGIN
    DECLARE @ISWOSTATUSWAITPOSSIBLEBIT;
    SELECT @ISWOSTATUSWAITPOSSIBLE = (CASE WHEN @WO_Status in ('30', '40', '50', '60', '65', '70') THEN 1 ELSE 0 END);

    RETURN @ISWOSTATUSWAITPOSSIBLE
    END

    This function has these settings in sys.modules : is_inlineable = 1  / inline_type = 1

    Queries now hitting timeouts ! ( +30sec )

    Here's the inline table value function  that replaces it

    CREATE              FUNCTION [dbo].[tvF_ISWOSTATUSWAITPOSSIBLE_JoBi] (
    @WO_StatusVARCHAR(2)
    )
    RETURNS Table
    AS
    return (
    Select convert(bit, CASE WHEN @WO_Status in ('30', '40', '50', '60', '65', '70') THEN 1
    ELSE 0
    END) as ISWOSTATUSWAITPOSSIBLE
    )
    GO

    Same queries (only modified to use cross apply with the itvf ) now run very stable consuming < 100ms

    We have been warned ever since SQL2008 by MS to replace SVFs to (i)TVFs for performance reasons.

    So, for 14 years these functions have been consuming below any alert level, right?

    Now it is total panic to get this fixed by yesterday.

    Edited:

    We managed to turn things back to "normal", performing

    ALTER DATABASE SCOPED CONFIGURATION set LEGACY_CARDINALITY_ESTIMATION = ON  -- Default = OFF

    This should be avoided as much as possible !

    ( it's like driving a Ferrari but shutting off most of its marvels )

     

    • This reply was modified 1 year, 10 months ago by  Johan Bijnens.
    • This reply was modified 1 year, 10 months ago by  Johan Bijnens.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply