Can I push an Index hint down into a base tables from a view

  • Ken Gaul

    Ten Centuries

    Points: 1362

    I have a problem where the optimiser chooses a poor plan. The query is on a view and the problem is the choice of indexes in one of the tables in the view. Is there any way I can provide a hint in the query using the view to specify the index to use in the main query rather than within the view as there may be other queries that use the view when that hint would be unhelpful.

    What I have is basically a list of cash transactions with an account_id and a ledger_date and there is an index on both of these. There is then a table of account groups which maps each group to a range of accounts. There are usually 3-10 accounts per group.

    The query is looking for transactions for a single account group over a range of dates (the range can be quite large) and sometimes for no reason that I can fathom it seems to think it is a good idea to scan the range of dates and then filter by account rather than the other way around.

    So I want to do something like this...(but the hint is ignored presumably because this index doesn't actually exist on the view)

    select sum(amount)

    from vwCashTransactions with (index (idx_account))

    where groupaccountid = @P1 and ledgerdate between @P2 and @P3

    Now I could do something hideous to get round it such as this but I though it would be clearer to other people what I was doing if I could do it with hints or something else.

    select sum(amount)

    from vwCashTransactions

    where groupaccountid = @P1 and coalesce(ledgerdate,ledgerdate) between @P2 and @P3

    Any other bright ideas?

    Thanks

    Ken

  • spaghettidba

    SSC Guru

    Points: 105661

    I would never mess with index hints, let MSSQL choose the best plan for you.

    Try updating statistics: poor query plans are often due to outdated statistics or bad indexes.

  • Ken Gaul

    Ten Centuries

    Points: 1362

    Sorry should have mentioned , been there done that. It does kind of make a difference but only for a day at most and there isn't enough activity on the table to make day old statistics invalid. I find it odd that it takes a BETWEEN range condition over an equality one. It does seem to get much worse when we have some light days and the number of rows at the top end of the query date range are low, like it guesses that all days in the range have this small number of rows. I really don't want to have to go down the road of updating statistics every day on a table where the row distribution doesn't change that much :crazy:

  • Grant Fritchey

    SSC Guru

    Points: 395394

    You might not want to go down that road, and I don't blame you, but you might have to. Are you doing full scans on the statistics or just using the default sampling? A full scan might make a difference.

    Just so you know, we have a couple of tables that we update statistics on several times a day. Partly this is because of bad design that we're not allowed to "fix" because it's a financial system and they'd would have to go through a ninety day test cycle. Partly it's because the data is just volatile enough and just odd enough, that it needs VERY good statistics. Most of the rest of our databases run just fine with statistics updated weekly.

    Also, have you checked the index fragmentation levels?

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Ken Gaul

    Ten Centuries

    Points: 1362

    OK thanks, that surely can't be right though! We have a bit of bad design here and there too but this particular section looks OK and it isn't an unreasonable query. I'm not even sure it will help in all cases either. I have seen it happen where the end date in the range is today and if we have no transactions for today it does it by date even though the start date of the range is a month or more ago. No matter how many times I recalc the stats it won't make any difference until we actually have some transactions!

    The whole thing just seems broken and I don't understand why I have to put so much work into something so simple!

    Maybe I'll go with the coalesce hack for now after all :unsure:

    Oh and I defragment every week too if it is more than 25% fragmented. Fragmentation doesn't make any difference to query plan choice though does it?

  • Grant Fritchey

    SSC Guru

    Points: 395394

    You could post the execution plan & query to see if anyone can spot something. I'd do that in a seperate post though so you get more eyeballs.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • spaghettidba

    SSC Guru

    Points: 105661

    Have you tried some kind of query refactoring to force the query optimizer to do what you expect?

    Sometimes the same query with a different syntax takes a different query plan...

    select sum(amount)

    from (

    select amount, ledgerdate

    from vwCashTransactions

    where groupaccountid = @P1

    ) as a

    where ledgerdate between @P2 and @P3

  • Ken Gaul

    Ten Centuries

    Points: 1362

    Actually no, but now that I've been forced to think about it in a logical way to explain it here, I might have been making too many assumption about what it is doing. I can rarely if ever replicate the problem and the only reason I know what it is doing is by pulling the query plan out of the DB. So actually the problem is likely that the query plan has aged out and the first person to call this statement does it with a date range in the future or just for one day with no transactions, this is then cached for the rest of the day unless I recalc the stats and a new plan is generated till tomorrow when the same thing happens!

    So the question now boils down to is there a way to invalidate the query plan for one statement? (I can't go the recompile option because it isn't supported in our data access layer, don't ask!)

    Thanks for listening, maybe I should start learning to talk to myself 😀

    Ken.

  • Grant Fritchey

    SSC Guru

    Points: 395394

    I talk to myself all the time, to the horror of all my co-workers.

    OK. You could try a query hint in this case, the OPTIMIZE FOR hint might help out. Experiment with it. In 2008 you can also OPTIMIZE FOR UNKOWN, which can work really well.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • SwePeso

    SSC-Dedicated

    Points: 39693

    It would help if you posted the view definition.

    Maybe something is seriously wrong there?

    SELECT SUM(Amount)

    FROM vwCashTransactions

    WHERE GroupAccountID = @P1

    AND LedgerDate BETWEEN @P2 AND @P3


    N 56°04'39.16"
    E 12°55'05.25"

  • Ken Gaul

    Ten Centuries

    Points: 1362

    OK here you go all the gory details, it all looks OK to me. This is view...

    ALTER VIEW [dbo].[vwConsolidatedCLT]

    AS

    SELECT CLT.TransID, CLT.LedgerDate, CON.ClAccountID, CLT.ClAccountID AS subclaccountid,

    CLT.LineNumber, CLT.MovementType, CLT.MovementSource, CLT.CCYCode, CLT.Amount,

    CLT.Balance, CLT.Reference, CLT.Narrative, CLT.DisplayToClient, CLT.Capital,

    CLT.ProductCapital, CLT.UserID, CLT.DateCreated, CLT.LedgerSource, CLT.LedgerSourceID,

    CLT.FromCCYCode, CLT.ToCCYCode, CLT.BaseCCYCode, CLT.FXRate, CLT.RestrictSweepUntil,

    CLT.AdjustmentUserID, PD.ProductType,COALESCE(PD.ProductDisplayName,PD.ProductType) as ProductDisplayName,

    CLT.ExcludeFromCalc

    FROM ClientAccount.dbo.CashLedgerTransactions as CLT

    INNER JOIN ClientAccount.dbo.Consolidate as CON

    ON CON.SubClAccountID = CLT.ClAccountID

    LEFT OUTER JOIN Discovery.dbo.ProductDetails as PD

    ON PD.ClAccountID = CON.SubClAccountID

    and this is the query...

    SELECT [TransID], [LedgerDate], [ClAccountID], [subclaccountid], [LineNumber], [MovementType], [MovementSource],

    [CCYCode], [Amount], [Balance], [Reference], [Narrative], [DisplayToClient], [Capital], [ProductCapital],

    [UserID], [DateCreated], [LedgerSource], [LedgerSourceID], [FromCCYCode], [ToCCYCode], [BaseCCYCode], [FXRate],

    [RestrictSweepUntil], [AdjustmentUserID], [ProductType], [ProductDisplayName], [ExcludeFromCalc]

    FROM ClientAccount..vwConsolidatedCLT

    WHERE [ClAccountID] = @0 AND [LedgerDate] > @1 AND [LedgerDate] <= @2 AND

    ([MovementType] Like @3 OR [MovementType] = @4) AND [ExcludeFromCalc] = @5

    I'm heading for the OPTIMIZE FOR hint for now.

    The density for each index is similar so the incorrect assumption is that the number of rows from consolidate returns more than the distinct ledgerdates in the range. From the stats on consolidate the largest number eq to any value is 96, but the majority are less than 24.

    The stats on the cashledger transaction table indicate an average of 30K-50K rows per ledgerdate and 2K-4K per account.

    So a worst case account query could get about (96*50K) 384K rows which would mean that any date range more than 12 days would be better by account, the average case would be (24*3K) 72K rows which would make a date range of more than 3 days less efficient. It is usual for the date range to be a month or two.

  • Gail Shaw

    SSC Guru

    Points: 1004424

    Ken Gaul (2/12/2009)


    OK thanks, that surely can't be right though!

    It does tend to happen with date columns, especially when the column in question is defaults to getdate and queries are looking for the current or previous day's data.

    I had one system where it too 2 weeks of inserts to trigger an auto-update of the stats, but 2 days for the stats to get stale enough to generate a really, really bad plan.

    Could you perhaps post the actual execution plan (saved as a .sqlplan file, zipped and attached)? There are some regulars who know their way around the plan and may be able to give you a better idea why the optimiser's making a mistake.

    Fragmentation can affect the plan choice. A badly fragmented index makes scans more expensive. I haven't investigated details though.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ken Gaul

    Ten Centuries

    Points: 1362

    Yup can do, I've put the good and the bad plan in there for comparison.

  • Gail Shaw

    SSC Guru

    Points: 1004424

    Those are estimated plans. They don't have the information that I need. Please can you capture the actual plan and post that.

    There's no way to identify a stats issue from the estimated plan. It does look like that may be the issue, the estimated rows of 1 on the index seeks looks suspicious, but without the actual rows property (that's only present on the actual plan) I can't say for sure.

    Run the query with the execution plan option on and save the resulting plan.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail Shaw

    SSC Guru

    Points: 1004424

    Ken Gaul (2/13/2009)


    SELECT [TransID], [LedgerDate], [ClAccountID], [subclaccountid], [LineNumber], [MovementType], [MovementSource],

    [CCYCode], [Amount], [Balance], [Reference], [Narrative], [DisplayToClient], [Capital], [ProductCapital],

    [UserID], [DateCreated], [LedgerSource], [LedgerSourceID], [FromCCYCode], [ToCCYCode], [BaseCCYCode], [FXRate],

    [RestrictSweepUntil], [AdjustmentUserID], [ProductType], [ProductDisplayName], [ExcludeFromCalc]

    FROM ClientAccount..vwConsolidatedCLT

    WHERE [ClAccountID] = @0 AND [LedgerDate] > @1 AND [LedgerDate] <= @2 AND

    ([MovementType] Like @3 OR [MovementType] = @4) AND [ExcludeFromCalc] = @5

    Where are you calling that from? A stored procedure?

    What are @1, @2, ...? Variables? Procedure parameters? Constants?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply