Performance of table-valued functions

  • cmartel 20772

    SSC Eights!

    Points: 952

    As I am getting more experienced with CLR, I tend to experiment with more advanced concepts. My latest test involves a CLR table-valued function that I call using an OUTER APPLY. The function returns a table containing the ISO codes and names of various countries and is used like:

    SELECT
    A.Street,
    A.City,
    A.CountryCode,
    C.Name
    FROM Address A
    OUTER APPLY (SELECT Name FROM dbo.CLRGetCountries() WHERE IsoCode = A.CountryCode) C

    Since it was working nice and fast, I added state codes to it so my statement became:

    SELECT
    A.Street,
    A.City,
    A.CountryCode,
    C.Name,
    A.StateCode,
    S.Name
    FROM Address A
    OUTER APPLY (SELECT Name FROM dbo.CLRGetCountries() WHERE IsoCode = A.CountryCode) C
    OUTER APPLY (SELECT Name FROM dbo.CLRGetCountries() WHERE IsoCode = A.StateCode) S

    The statement still performed very well until I simply added "ORDER BY A.City" to it. While it was running under 3 seconds on a 200,000 records database, it took over 4 minutes to execute with the ORDER BY clause. Adding an ORDER BY clause to the first statement (the one with only one OUTER APPLY) had no impact; it remained fast.

    I know that I do not have to use a CLR function in this example. Everything works fast and well if I OUTER APPLY on a real or temporary table, for example. This is not the point. I am interested to know if someone else ever encountered such behavior and if it can be overcome.

    Regards

  • Solomon Rutzky

    SSCoach

    Points: 16135

    Are you really using the same SQLCLR TVF for countries and states, or is that just a copy/paste issue when typing up the question? SQLCLR TVFs always report to the Query Optimizer that they will be returning 1000 rows, so that sometimes messes up estimations. Not sure specifically what would be going on without seeing an execution plan, though.

     

    One thing to try, just in case it is the magic 1000 row estimated rows, is to add a TOP(x) to each of the queries within the OUTER APPLY (i.e. "SELECT TOP(1) [Name] FROM dbo.CLRGetCountries()").

    Another approach is to pass in the ISO code to get only 1 row back. That is what I did in my SQL# project for the LookUp_GetCountryInfo() and LookUp_GetStateInfo() TVFs (both of which are available in the Free version). But, you can also pass in empty string or NULL to get the full list.

    SELECT * FROM SQL#.LookUp_GetCountryInfo(N''); -- returns 244 rows

    SELECT * FROM SQL#.LookUp_GetCountryInfo(N'us'); -- returns 1 row

    Then you can just OUTER APPLY the function itself:

    OUTER APPLY SQL#.LookUp_GetCountryInfo(A.[CountryCode]) C

     

    Take care,

    Solomon...

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • cmartel 20772

    SSC Eights!

    Points: 952

    Thanks Solomon,

    I am using the exact same query. Having two distinct queries for states and countries is not an effective long-term solution since I may need to join multiple times on states or countries.

    Using TOP(1) did not help; it actually made things worse as the statement always performs slow, even without the ORDER BY clause!

    The execution plan of the slow query has a warning that says: "Type conversion ... may affect cardinality estimate in query plan choice". This is because my Address table has Ids that are CHAR() while CLR functions must return NCHAR(). I therefore changed the type of Address.CountryID and Address.StateID to NCHAR(43) and voilà!, everything is fast again.

    I also tried replacing OUTER APPLY with LEFT JOIN and forced LOOP, HASH and MERGE joins. It turned out that the plan used LOOP joins. HASH and MERGE joins are fast (but not as fast as using a NCHAR column).

    Thanks again.

  • Jeff Moden

    SSC Guru

    Points: 994858

    I don't understand why anyone would use a CLR function for such a thing to begin with.  Is the CLR getting the ISO data from some URL or what?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • cmartel 20772

    SSC Eights!

    Points: 952

    Jeff,

    You do not understand and that is OK. I clearly stated in my initial post that CLR neither optimal nor required in my example. I am simply experimenting with CLR and I feel that it helps improve my SQL skills.

    As it is often the case, I posted an over-simplified version of my statement. The real CLRGetCountries function receives a language as a parameter. Non-US developers are most often developing multi-lingual applications and this is a reality that is foreign to many people.

    My CLR assembly contains a static class that is filled once. CLRGetCountries returns the IDs and Description in the desired language. While table Country currently contains one column per supported language (with the obligation to frequently create new columns), I am testing various alternatives, and CLR is only one of them, and I will most likely not use it.

    Regards

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply