Help us tune a query

  • Yes, sorry, done really quickly!

    Mel Harbour, Project Manager, Red Gate Software

  • As a first try at indexes, and without seeing any of the other queries against this table...

    Add an index:

    (PointsCategory, UserID) INCLUDE (Date, PointsScored)

    I know there's a similar existing index, but I suspect this one will be more beneficial, with the equality match as the leading column. If you're confident there are no other queries against this table, then the existing index ssc_UserID can have PointsCategory removed and the Date column moved to the INCLUDE list. If there may be other queries, leave that index as is.

    Having one with the leading column as PointsCategory and a second with leading column of UserID means there's an optimal index for each of the split-up queries.
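
    Spelled out as DDL, that suggestion would look something like the following (the index name is mine, and the table name dbo.UserPoints is an assumption based on the schema posted later in this thread):

    ```sql
    -- Hypothetical index name; table name assumed from the schema later in the thread.
    -- With the equality-matched PointsCategory leading, a single seek lands on the
    -- contiguous range of rows for that category, already ordered by UserID.
    CREATE INDEX idx_UserPoints_CategoryUser
        ON dbo.UserPoints (PointsCategory, UserID)
        INCLUDE ([Date], PointsScored);
    ```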

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ok, I've run the query, simply using a temporary table rather than a table variable. I haven't put any indexes on the temporary table. The execution plans are attached.

    Mel Harbour, Project Manager, Red Gate Software

  • That looks a little better already.

    Try adding an index on the temp table. Put this on before you insert the data into the temp table.

    CREATE CLUSTERED INDEX idx_TopScores_RowNumberUserID ON #TopScores (RowNumber, UserID)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • While I can try splitting the query and look at the execution plans, making the code changes will require that part of it to pass through our test and release cycles. That doesn't take too long, but it's obviously not as quick as just being able to try some things out!

    Mel Harbour, Project Manager, Red Gate Software

  • And here are the plans for the temporary table with the index added.

    Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.

    Mel Harbour, Project Manager, Red Gate Software

  • Mel Harbour (7/13/2009)


    Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.

    Costs are based on the row-count estimate. When dealing with a table variable, the optimiser always estimates one row, hence the costs are completely untrustworthy, especially when there are actually 100,000 rows in the table variable.

    When I said 'looks better', I was basing that on what I know of query operators. When joining two resultsets that both have thousands of rows, a nested loop join is terrible, because it reads one table (in your case the Users table) once for each row of the other resultset; in this case, over 100,000 times. When the join type changed to a hash join after converting to a temp table, each table is read only once. You'll see the improvement in the query time and in the I/O statistics.
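
    As a minimal sketch of that difference, using the dbo.Users and dbo.UserPoints tables from the script posted later in this thread (the comments describe the typical plan shapes, not a guaranteed outcome):

    ```sql
    -- Same data loaded two ways: table variable vs temp table.
    DECLARE @Scores TABLE (UserID int, Points int);
    INSERT INTO @Scores SELECT UserID, PointsScored FROM dbo.UserPoints;

    SELECT u.DisplayName, s.Points
    FROM @Scores s
    JOIN dbo.Users u ON u.UserID = s.UserID;
    -- The plan estimates 1 row from @Scores, so the optimiser tends to
    -- pick a nested loop join: one seek into Users per actual row.

    SELECT UserID, PointsScored AS Points
    INTO #Scores
    FROM dbo.UserPoints;

    SELECT u.DisplayName, s.Points
    FROM #Scores s
    JOIN dbo.Users u ON u.UserID = s.UserID;
    -- #Scores has real statistics, so with thousands of rows the
    -- optimiser can switch to a hash join: each table read only once.

    DROP TABLE #Scores;
    ```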

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • As far as a revamped solution goes, here's what I've come up with so far.

    create table dbo.Users (
        UserID int primary key,
        DisplayName varchar(100));
    go

    insert into dbo.Users (UserID)
    select Number
    from dbo.Numbers
    where Number <= 1000;
    go

    CREATE TABLE [dbo].[UserPoints](
        [UserPointsRecordID] [int] IDENTITY(1,1) primary key,
        [UserID] [int] NOT NULL references dbo.Users(UserID),
        [Date] [smalldatetime] NOT NULL,
        [PointsScored] [int] NOT NULL,
        [PointsCategory] [int] NOT NULL);
    go

    ;with
    PCats (PCat) as
        (select 1
         union all select 2),
    Dates (Date) as
        (select dateadd(day, Number, '1/1/2006')
         from dbo.Numbers
         where Number <= datediff(day, '1/1/2006', getdate()))
    insert into dbo.UserPoints (UserID, Date, PointsScored, PointsCategory)
    select UserID, Date, abs(checksum(newid())) % 10, PCat
    from dbo.Users
    cross join Dates
    cross join PCats;
    go

    CREATE TABLE [dbo].[UserPointsArchiveOld](
        [UserPointsRecordID] [int] IDENTITY(1,1) primary key,
        [UserID] [int] NOT NULL references dbo.Users(UserID),
        [PointsScored] [int] NOT NULL,
        [PointsCategory] [int] NOT NULL);
    go

    CREATE TABLE [dbo].[UserPointsArchive30](
        [UserPointsRecordID] [int] IDENTITY(1,1) primary key,
        [UserID] [int] NOT NULL references dbo.Users(UserID),
        [Date] [smalldatetime] NOT NULL,
        [PointsScored] [int] NOT NULL,
        [PointsCategory] [int] NOT NULL);
    go

    CREATE TABLE [dbo].[UserPointsToday](
        [UserPointsRecordID] [int] IDENTITY(1,1) primary key,
        [UserID] [int] NOT NULL references dbo.Users(UserID),
        [Date] [smalldatetime] NOT NULL,
        [PointsScored] [int] NOT NULL,
        [PointsCategory] [int] NOT NULL);
    go

    -- Archive everything older than 30 days, pre-aggregated per user and category.
    insert into dbo.UserPointsArchiveOld (UserID, PointsScored, PointsCategory)
    select UserID, sum(PointsScored), PointsCategory
    from dbo.UserPoints
    where Date < dateadd(day, datediff(day, 0, getdate()) - 30, 0)
    group by UserID, PointsCategory;
    go

    create view dbo.UserPointsCurrent
    as
    select UserID, sum(PointsScored) as PointsScored, PointsCategory
    from dbo.UserPointsToday
    group by UserID, PointsCategory;
    go

    create view dbo.UserPointsLast30
    as
    select UserID, sum(PointsScored) as PointsScored, PointsCategory
    from
        (select UserID, PointsScored, PointsCategory
         from dbo.UserPointsCurrent
         union all
         select UserID, sum(PointsScored), PointsCategory
         from dbo.UserPointsArchive30
         group by UserID, PointsCategory) Last30
    group by UserID, PointsCategory;
    go

    create view dbo.UserPointsTotal
    as
    select UserID, sum(PointsScored) as PointsScored, PointsCategory
    from
        (select UserID, PointsScored, PointsCategory
         from dbo.UserPointsLast30
         union all
         select UserID, PointsScored, PointsCategory
         from dbo.UserPointsArchiveOld) AllPoints
    group by UserID, PointsCategory;
    go

    create index IDX_UserPointsArchive30_Main on dbo.UserPointsArchive30
        (UserID, PointsCategory, Date)
        include (PointsScored);
    go

    create index IDX_UserPointsToday_Main on dbo.UserPointsToday
        (UserID, PointsCategory, Date)
        include (PointsScored);
    go

    declare @UserID int;
    select @UserID = 1;

    ;with
    CTE1 as
        (select top 5 UserID, sum(PointsScored) as Points
         from dbo.UserPointsLast30
         group by UserID
         order by sum(PointsScored) desc),
    CTE2 as
        (select @UserID as UserID, sum(PointsScored) as Points
         from dbo.UserPointsLast30
         where UserID = @UserID)
    select *
    from CTE1
    union
    select *
    from CTE2
    order by Points desc;

    The last bit is a sample query that would get you most of what you need for the home page. The other queries would be similar.

    You'd archive the data from the Archive30 table into the ArchiveOld table with a simple query run each night. Same for archiving from the Today table into Archive30.

    I'd run the Today table as an inventory-style solution, where you insert points transactions instead of updating. I.e., if someone earns 5 points for answering a QotD entry, you'd insert a row for that, as opposed to updating their entry. That way, you don't have to run an upsert, you just run an insert, which will be faster and simpler.
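
    A sketch of that insert-only pattern, using the UserPointsToday table from the script above (the category number and user ID are purely illustrative):

    ```sql
    -- Award 5 points for a QotD answer: append a transaction row;
    -- no UPDATE or upsert against an existing running total required.
    INSERT INTO dbo.UserPointsToday (UserID, [Date], PointsScored, PointsCategory)
    VALUES (769521, GETDATE(), 5, 1);  -- category 1 = QotD here, an assumption

    -- The user's total for the day is just an aggregation over the transactions:
    SELECT SUM(PointsScored) AS PointsToday
    FROM dbo.UserPointsToday
    WHERE UserID = 769521;
    ```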

    I tested this structure with some basic queries, like the top 25 all-time scores, the top 25 all-time forum scores, and the top 5 total scores for the last 30 days. Major improvement in query time over the current solution.

    For example, getting the top 5 + current user (or just top 5 if current user is in there), using your query, took 822 milliseconds on my desktop machine, but took 28 milliseconds using the final query in my code above. All of that on the sample data I produced as per the scripts I posted.

    That might be a bit more extensive than what you're looking for right now, but do take a look at it and see if it's a valid long-term solution.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Ok, index didn't help much, and it's because of the OR.

    Can you try changing that index and adding a second one?

    CREATE CLUSTERED INDEX idx_TopScores_RowNumber ON #TopScores (RowNumber)

    CREATE INDEX idx_TopScores_UserID ON #TopScores (UserID)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Mel Harbour (7/13/2009)


    And here's the plans for the temporary table with the index added.

    Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.

    Table variables are always estimated as if they were going to have only 1 row in them. That gives you a falsely low cost, and can lead to all kinds of other problems in the execution plan.

    They're generally okay if you can be certain they will never exceed about 100 rows. That's a rule of thumb and somewhat arbitrary, but it generally works.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • On the idea of multiple queries to handle the OR problems, you don't actually have to modify the front end. Have the main proc call sub-procs based on the parameter values. You can optimize the sub-procs, and eliminate scans caused by the IsNull and OR issues.
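
    A minimal sketch of that dispatch pattern (the procedure and parameter names are invented for illustration; only the UserPoints table is taken from the script in this thread):

    ```sql
    -- Each sub-proc has a single, sargable predicate, so each gets its own
    -- well-optimised plan instead of one plan crippled by ISNULL/OR catch-alls.
    CREATE PROCEDURE dbo.GetScores_All
    AS
        SELECT UserID, SUM(PointsScored) AS Points
        FROM dbo.UserPoints
        GROUP BY UserID;
    GO

    CREATE PROCEDURE dbo.GetScores_ByCategory
        @PointsCategory int
    AS
        SELECT UserID, SUM(PointsScored) AS Points
        FROM dbo.UserPoints
        WHERE PointsCategory = @PointsCategory
        GROUP BY UserID;
    GO

    -- The front end still calls a single proc; the branching happens server-side.
    CREATE PROCEDURE dbo.GetScores
        @PointsCategory int = NULL
    AS
        IF @PointsCategory IS NULL
            EXEC dbo.GetScores_All;
        ELSE
            EXEC dbo.GetScores_ByCategory @PointsCategory;
    GO
    ```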

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • More plans, with the different indexes suggested by Gail.

    @GSquared, yes, indeed we can!

    Mel Harbour, Project Manager, Red Gate Software

  • Once we've got a version of the SP that there's general agreement for, I'm happy to put it live so that we can pick up some real data on how it performs over time.

    Mel Harbour, Project Manager, Red Gate Software

  • Ok, that OR really doesn't like me. Can you change the query as follows?

    SELECT t.RowNumber, u.UserID, u.DisplayName, t.RecentPoints, t.AllPoints

    FROM #TopScores t

    JOIN Users u ON u.[UserID] = t.UserID

    WHERE t.RowNumber <= 5 OR t.UserID = 769521

    ORDER BY t.RowNumber

    I've just changed which table the user filter is applied to. It shouldn't make a difference; however, with the OR it might.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Well, that got a reaction out of it!

    Mel Harbour, Project Manager, Red Gate Software

Viewing 15 posts - 16 through 30 (of 103 total)
