I am not a SQL Developer - just an IT generalist (if that) who dips in and out of SQL Server
stuff and even writing this down helps me to better understand the 'problem'.
From reading, the perceived wisdom is that you should not do DB lookups in Scalar functions for performance reasons.
I am looking at some code for a job that used to run for 13 hours until it got canned, due to the fact that the results were incorrect as much as the runtime (it was pointless).
The job used a Scalar function to perform multiple DB lookups (10's of millions per hour).
I now want to do something similar to what this job did but avoid the pitfalls of doing the lookups in a Scalar function.
I see that 2019 implicitly converts to inline TVFs - and in fact I tried converting the code myself but think it will face the same issues with being non-performant, so did not pursue this (as I never actually got it working).
What the Scalar function is doing is looking up a price for an item based on a hierarchy, and if the result is NULL then and only then performing the next lookup in the hierarchy for the item.
A scaled down version of the correct hierarchical lookup looks like this :
A) Specific Item Price for a Customer
B) Specific Item Parent Item Price for a Customer
C) Generic Item Price for a Customer
D) Specific Item Price for all Customers
E) Specific Item Parent Price for all Customers
F) Generic Item Price for all Customers
The exiting code is similar, potentially performing 6 lookups - but using a different (incorrect) hierarchy.
Pricing should always exist at D, E and F - this is Base pricing - primarily to trap the non-existence of A, B,C
which is an error.
In most circumstances the pricing exists at A, but some prices exist at B and C, which is valid and needs to
be catered for.
I can see why a Scalar function might have looked like the answer from a coding perspective as it simplifies the SQL query - at the expense of performance.
The DB contains 135 million items - there are some predicates that reduce the input to the function perhaps by half - but that still means somewhere from 75 - 450 million lookups required to me.
All I have been able to come up with to date (conceptually) is to pull all itemids with customer ids and prices into a Temp table and add a derived from column to say where in the hierarchy the price came from - applying a value or weight. Then sort them by customerid/hierarchy and somehow filter by the row that applies based on the hierarchy - maybe a 2nd Temp table (or some kind of nested Selects to do both).
Not looking for somebody to solve this for me, but rather looking for some guidance to develop my SQL coding.
Based on what I am trying to achieve and knowing (?) that Scalar function with multiple DB lookups are the wrong answer - what pointers can anybody offer as to what the right answer to how to do something like this is?
I think my proposed method will reduce the runtime from 13 hours to n minutes but still may not be the best way to do this even if it works and this requirement for an iterative lookup must be common.
Thanks in advance for any input.