Optimiser estimating wrong number of actual rows.

  • Hello,

I have a table-valued function which works out the entities that a particular user does not have access to. It is joined to a view that produces search results for the interface.

If the search results (before filtering restricted entities) return anywhere between 7K and 20K rows, the left outer join to the TVF causes a significant performance drop. When I checked the query plans, the function's branch showed an actual row count of 1,115,167,077 when in reality the function returns 97K rows.

If I run the same query to return search results outside the 7K-20K range, the left outer join contributes the correct number of actual rows.

    This is my SQL query.

    The view returns 11K rows and then the result is filtered based on the rows in the TVF.

    (Execution time 1 min 50 sec)

select distinct
        MATTER_DIWOR       as 'Matter Diwor',
        MATTERCODE         as 'Matter Code',
        Matter_Description as 'Matter Description',
        Matter_Status      as 'Status',
        client_name        as 'Client Name',
        Other_Reference    as 'Other Ref'
    from VW_RET_MAT vm
    left outer join [dbo].lsfun_getaccessibleentities('M', 195) ls
        on vm.MATTER_DIWOR = ls.ENTITY_DIWOR
    where MATTERCODE like '%m%'
      and ls.ENTITY_DIWOR is null
    order by 2

When I change the search criteria so the result set falls outside the range above, the view returns 33K rows and the results are again filtered by the left join.

    (Execution time 8 sec)

select distinct
        MATTER_DIWOR       as 'Matter Diwor',
        MATTERCODE         as 'Matter Code',
        Matter_Description as 'Matter Description',
        Matter_Status      as 'Status',
        client_name        as 'Client Name',
        Other_Reference    as 'Other Ref'
    from VW_RET_MAT vm
    left outer join [dbo].lsfun_getaccessibleentities('M', 195) ls
        on vm.MATTER_DIWOR = ls.ENTITY_DIWOR
    where MATTERCODE like '%a%'
      and ls.ENTITY_DIWOR is null
    order by 2

Can somebody shed some light on why this is happening?

    Any help will be appreciated:-)

  • Are your statistics up to date? You can check using sys.stats and STATS_DATE:

SELECT
        T.[name] AS table_name,
        I.[object_id],
        I.[name] AS index_name,
        I.index_id,
        I.type_desc,
        I.fill_factor,
        S.[name] AS stats_name,
        S.stats_id,
        S.auto_created,
        S.user_created,
        S.no_recompute,
        STATS_DATE(S.[object_id], S.stats_id) AS stats_date
    FROM sys.stats S
    JOIN sys.tables T
        ON S.[object_id] = T.[object_id]
    LEFT JOIN sys.indexes I
        ON S.[object_id] = I.[object_id]
        AND S.stats_id = I.index_id
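
    If the dates come back stale, here is a sketch of refreshing them. The table name `dbo.MATTERS` is a placeholder for whatever base table sits under VW_RET_MAT — it is not named anywhere in the thread, so substitute your own:

    ```sql
    -- Refresh statistics on one base table with a full scan
    -- (dbo.MATTERS is a hypothetical name; use your real table).
    UPDATE STATISTICS dbo.MATTERS WITH FULLSCAN;

    -- Or refresh every table in the database; simpler, but can be
    -- heavy on large databases.
    EXEC sp_updatestats;
    ```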

Is it an inline or multi-statement UDF?

    Multi-statement UDFs are treated like table variables by the database engine, which means it estimates one row per call, regardless of how many rows the function will actually return. That sounds like what you're running into.
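
    For anyone following along, the structural difference looks roughly like this. This is a simplified, hypothetical version of lsfun_getaccessibleentities — the real function's body isn't shown in the thread, and `dbo.ENTITY_ACCESS` is an invented source table:

    ```sql
    -- Multi-statement TVF: fills and returns a table variable, so the
    -- optimiser assumes a fixed low row count (1 row) at compile time.
    CREATE FUNCTION dbo.lsfun_getaccessibleentities_ms (@type char(1), @user int)
    RETURNS @result TABLE (ENTITY_DIWOR int PRIMARY KEY)
    AS
    BEGIN
        INSERT @result (ENTITY_DIWOR)
        SELECT ENTITY_DIWOR
        FROM dbo.ENTITY_ACCESS          -- hypothetical source table
        WHERE ENTITY_TYPE = @type
          AND USER_ID = @user;
        RETURN;
    END;
    GO

    -- Inline TVF: effectively a parameterised view, so its query is
    -- expanded into the caller and the optimiser can use real column
    -- statistics from the underlying tables.
    CREATE FUNCTION dbo.lsfun_getaccessibleentities_inline (@type char(1), @user int)
    RETURNS TABLE
    AS
    RETURN
        SELECT ENTITY_DIWOR
        FROM dbo.ENTITY_ACCESS          -- hypothetical source table
        WHERE ENTITY_TYPE = @type
          AND USER_ID = @user;
    GO
    ```

    If the function's logic fits in a single SELECT, converting it to the inline form usually removes the 1-row guess entirely.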

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Estimated rows are just that... estimated. From what I've seen, they're rarely correct and there's not much you can do about it if your stats are actually up to date.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Those are not estimated rows. When you open a query plan and hover over the arrows, it shows you the Actual and Estimated row contribution of every branch.

    The estimated rows are correctly displayed as 1, since it is a multi-statement function returning a table variable, but the actual number of rows jumps to over a billion when the TVF is joined to a result set containing 7K-20K rows. Outside that range, the actual row count correctly matches the number of rows in the TVF.

    It is really baffling, as this is a search hotspot for my system. I have tested the function when it returns 120K rows: if my search result set falls outside the above range, the query executes in 16 seconds. But if the search results from the view fall inside the range, then even if the function returns only 10 rows the query takes close to a minute.

Actually, that kind of makes sense. It might be the difference in join mechanics. Hash Match can be quite fast, and might end up being used on large numbers of rows.

    Is the UDF something that can be replaced by an inline query? Perhaps with Cross Apply? Or maybe have its results dumped into a temp table or table variable at the beginning of the proc and just run it once and then join to that?
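
    A sketch of the temp-table workaround, reusing the columns from the original queries. The name #restricted is my own (the function is named "getaccessibleentities" but the first post says it returns entities the user does *not* have access to):

    ```sql
    -- Materialise the TVF output once; the optimiser then has real
    -- statistics on #restricted instead of a 1-row guess.
    SELECT ENTITY_DIWOR
    INTO #restricted
    FROM dbo.lsfun_getaccessibleentities('M', 195);

    -- An index helps the anti-join that follows.
    CREATE UNIQUE CLUSTERED INDEX IX_restricted ON #restricted (ENTITY_DIWOR);

    SELECT DISTINCT
        MATTER_DIWOR       AS 'Matter Diwor',
        MATTERCODE         AS 'Matter Code',
        Matter_Description AS 'Matter Description',
        Matter_Status      AS 'Status',
        client_name        AS 'Client Name',
        Other_Reference    AS 'Other Ref'
    FROM VW_RET_MAT vm
    LEFT OUTER JOIN #restricted r
        ON vm.MATTER_DIWOR = r.ENTITY_DIWOR
    WHERE MATTERCODE LIKE '%m%'
      AND r.ENTITY_DIWOR IS NULL    -- keep matters not in the restricted set
    ORDER BY 2;

    DROP TABLE #restricted;
    ```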

- Gus "GSquared"

  • Rahul N (8/6/2009)


Those are not estimated rows. When you open a query plan and hover over the arrows, it shows you the Actual and Estimated row contribution of every branch.

    Heh... yep, I know that. But, it's real hard to see what you've done from here and many folks confuse the two because they don't think that estimated rows will be shown on an actual execution plan for some reason. Bad assumption on my part about what you meant. Sorry.

    --Jeff Moden



  • Rahul,

    The explanation will be an easy one - we just need you to save an actual execution plan (right-click, Save Execution Plan As...) then zip it up and attach it to a post. It needs to be zipped because this site will not accept *.sqlplan attachments directly.

    (My crystal ball predicts a loop join with a table spool.)

    Paul
