Help us tune a query

  • Hi all,

    As Steve suggested, we're going to try and involve the community in some of what we're doing to make SQLServerCentral run faster and smoother. So here's the first wave!

    The main database server currently suffers from hi CPU load. It is not generally bottlenecked on either memory, or I/O. Looking at the performance data, the two queries that cause the highest total CPU time on the server are both related to creating the displays of the number of points that have been scored (right hand column on the homepage, or on the full scores page). Originally (a couple of weeks ago) that query was as follows:

    WITH topScores AS

    (SELECT [UserPoints].UserID

    SUM(CASE WHEN [Date] >= @cutoffDate THEN [PointsScored] ELSE 0 END) AS

    SUM([PointsScored]) AS AllPoints,

    ROW_NUMBER() OVER(ORDER BY SUM(CASE WHEN [Date] >= @cutoffDate THEN [PointsScored] ELSE 0 END) DESC,

    SUM([PointsScored]) DESC) as RowNumber

    FROM [UserPoints]

    WHERE PointsCategory = ISNULL(@pointsCategory, PointsCategory)

    AND UserID <> 57832 -- Exclude 'Site Owners', but do it explicitly by UserID to optimise query

    GROUP BY [UserPoints].UserID)

    SELECT RowNumber, u.UserID, u.DisplayName, topScores.RecentPoints, topScores.AllPoints

    FROM topScores

    JOIN Users u ON u.[UserID] = topScores.UserID

    WHERE RowNumber <= @maxResults or (u.UserID = @additionalUserID)
    ORDER BY RowNumber
    [/code]

    And that version was performing staggeringly badly. Anyway, that query was optimised, and rebuilt using a table valued variable rather than the CTE, and now looks more like this:

    [code="sql"]
    DECLARE @TopScores TABLE
    (
    UserID INT,
    RecentPoints INT,
    AllPoints INT,
    RowNumber INT
    )

    INSERT INTO @TopScores (
    UserID,
    RecentPoints,
    AllPoints,
    RowNumber
    )
    SELECT [UserPoints].UserID,
    SUM(CASE WHEN [Date] >= @CutoffDate THEN [PointsScored] ELSE 0 END) AS RecentPoints,

    SUM([PointsScored]) AS AllPoints,

    ROW_NUMBER() OVER(ORDER BY SUM(CASE WHEN [Date] >= @CutoffDate THEN [PointsScored] ELSE 0 END) DESC,

    SUM([PointsScored]) DESC) as RowNumber

    FROM [UserPoints]

    WHERE PointsCategory = ISNULL(@PointsCategory, PointsCategory)

    AND UserID <> 57832 -- Exclude 'Site Owners', but do it explicitly by UserID to optimise query

    GROUP BY [UserPoints].UserID

    SELECT t.RowNumber, u.UserID, u.DisplayName, t.RecentPoints, t.AllPoints

    FROM @TopScores t

    JOIN Users u ON u.[UserID] = t.UserID

    WHERE t.RowNumber <= @MaxResults OR @AdditionalUser = u.UserID
    ORDER BY t.RowNumber
    [/code]

    The actual performance of the query is much better now, however it is still run a lot of times. Partly as a result of the fact that we include the user's score in the query, it is run almost every time a new person hits the homepage. There is some output caching of the control that creates the actual display, but the INSERT and SELECT are still the two biggest total users of CPU time.

    Our next thought is to cache the results. So we will run a SQL job every (say) 15 minutes to update a table containing the current state of the points on the site. At the very least, we will therefore have removed nearly half the load by avoiding having to do the insert as frequently. Hopefully we can also lower the load still further by good design of the table in which the cached results are kept so that it is appropriately indexed. This is the bit that I'm hoping that some more experienced DBAs will be able to help with! The starting point for the design of the table is:

    [code="sql"]
    CREATE TABLE [dbo].[TopScoresCache](
    [TopScoreCacheID] [int] NOT NULL,
    [UserID] [int] NOT NULL,
    [DisplayName] [nvarchar](100) NOT NULL,
    [PointsCategory] [int] NOT NULL,
    [RowNumber] [int] NOT NULL,
    [AllPoints] [int] NOT NULL,
    [RecentPoints] [int] NOT NULL,
    CONSTRAINT [PK_TopScoresCache] PRIMARY KEY CLUSTERED
    (
    [TopScoreCacheID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    [/code]

    The table will contain effectively three sets of top scores, split by PointsCategory (you can see this in action by going to the main points page, and seeing the 'total', 'forum' and 'QoD' points). I haven't put any indices apart from the primary key on at the moment, so anything's fair game.

    Alternatively, if you think we're barking up completely the wrong tree, shout as well. Hopefully this can spark a useful discussion of some real world tuning scenarios. In addition, if there's further information you'd need in order to help formulate your ideas, let me know and I'll see what I can do.

    Mel

    Mel HarbourProject ManagerRed Gate Software

  • It looks like the whole thing hinges on the UserPoints table.

    Can you provide the DML for creating a copy of that table, and a sample insert statement? If you provide a dozen or so rows, we can fill the table up with millions of rows of sample data based on that.

    That'll give us a much better chance of really helping.

    Edit: Would also be good to know what edition and version of SQL this is running on. Since one solution uses a CTE, it's obviously either 2005 or 2008, but I can't tell if it's Standard, Enterprise, etc., and it would be good to know specifically on the 2005/8 point. I would assume 2005 Enterprise, but I hate assuming.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Sure thing!

    CREATE TABLE [dbo].[UserPoints](

    [UserPointsRecordID] [int] IDENTITY(1,1) NOT NULL,

    [UserID] [int] NOT NULL,

    [Date] [smalldatetime] NOT NULL,

    [PointsScored] [int] NOT NULL,

    [PointsCategory] [int] NOT NULL,

    CONSTRAINT [PK_UserPoints] PRIMARY KEY CLUSTERED

    (

    [UserPointsRecordID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 99) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[UserPoints] WITH CHECK ADD CONSTRAINT [FK_UserPoints_Users] FOREIGN KEY([UserID])

    REFERENCES [dbo].[Users] ([UserID])

    ON DELETE CASCADE

    GO

    ALTER TABLE [dbo].[UserPoints] CHECK CONSTRAINT [FK_UserPoints_Users]

    As far as data goes, the date is used as a way of 'rolling up' previous points to a certain point. PointsCategory is either 1 or 2 for either QoD or Forums. So the three summed values of points are for 1 and 2 individually, and for all points. That should be enough to generate yourselves some test data.

    Mel HarbourProject ManagerRed Gate Software

  • Oh, and it's 2005 Standard.

    We've put the wheels in motion to get some new hardware, and when we do that we'll almost certainly move to 2008 at the same time.

    Mel

    Mel HarbourProject ManagerRed Gate Software

  • It's just been pointed out to me that I missed a 'wide open' opportunity to plug SQL Data Generator! 😀

    Mel HarbourProject ManagerRed Gate Software

  • Hi Mel

    Could you please post index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    😉

    What's the reasoning behind this?

    WHERE PointsCategory = ISNULL(@PointsCategory, PointsCategory)

    That kind of construct tends to play havok with indexing.

    What values of @PointsCategory would this be called with and what would a null value in that parameter imply?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Not sure I'm clear on the Date column.

    If I get 1 point today, and 1 point tomorrow, and have no previous points, would I have an entry for today that shows 1 point, and and entry tomorrow that has 2 in the PointsScored column? Or would the entry for tomorrow have 1 in that column?

    Your comment about it being used to roll up the numbers made me unsure on that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • My First thoughts are to create a nightly , 'Year to date' table of the summarised data.

    Obviously 99% of the users have no updates throughout the day so its pretty static.

    Then have a trigger for insert on the UserPoints to increment the summary table with the new value.



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (7/13/2009)


    My First thoughts are to create a nightly , 'Year to date' table of the summarised data.

    Obviously 99% of the users have no updates throughout the day so its pretty static.

    Then have a trigger for insert on the UserPoints to increment the summary table with the new value.

    I'd agree on the first, but maybe rather than a trigger a nightly job to update the summary table. Points are as-of yesterday on the main screen, but that shouldn't be an issue and it removes the potential performance impact of a trigger.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • UserPoints:

    CREATE TABLE [dbo].[UserPoints]

    (

    [UserPointsRecordID] [int] NOT NULL IDENTITY(1, 1),

    [UserID] [int] NOT NULL,

    [Date] [smalldatetime] NOT NULL,

    [PointsScored] [int] NOT NULL,

    [PointsCategory] [int] NOT NULL

    )

    GO

    ALTER TABLE [dbo].[UserPoints] ADD CONSTRAINT [PK_UserPoints] PRIMARY KEY CLUSTERED ([UserPointsRecordID])

    GO

    CREATE NONCLUSTERED INDEX [ssp_1212] ON [dbo].[UserPoints] ([Date]) INCLUDE ([UserPointsRecordID])

    GO

    CREATE NONCLUSTERED INDEX [ssc_UserID] ON [dbo].[UserPoints] ([UserID], [PointsCategory], [Date]) INCLUDE ([PointsScored])

    GO

    ALTER TABLE [dbo].[UserPoints] ADD CONSTRAINT [FK_UserPoints_Users] FOREIGN KEY ([UserID]) REFERENCES [dbo].[Users] ([UserID]) ON DELETE CASCADE

    GO

    The UserPoints table currently has about 185,000 rows. Execution plans are attached.

    Mel HarbourProject ManagerRed Gate Software

  • Ouch. 100000 rows in a table variable, a clustered index scan on the users table and a seek that's returning most of the table on the points... Not fun.

    Dunno if you missed my question from earlier.

    What's the reasoning behind this?

    WHERE PointsCategory = ISNULL(@PointsCategory, PointsCategory)

    That kind of construct tends to play havok with indexing.

    What values of @PointsCategory would this be called with and what would a null value in that parameter imply?

    Are these queries within a stored procedure? How much work would it be to change the web page to call one of two different procedurs, depending on the value of @PointsCategory, instead of calling one proc?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ok, perhaps some further explanation of how the table works!

    I've attached a sample of the data in it, namely the data for me. As you'll see, I've answered a few questions over the last couple of days (some of it while trying to get the site working properly again!). You'll also see, nestled in the middle a row that represents all the points I've scored that predate any of the other rows 'rolled up' into one record.

    The procedure that does this part is as follows. I've left in the original developer comments, which should help to explain what it's doing.

    CREATE PROCEDURE [dbo].[ArchiveOldUserPoints]

    @ArchiveThresholdDays int = 40

    AS

    BEGIN

    -- To prevent the UserPoints table getting too large, we only retain daily

    -- totals for the last 40 days. Older points are stored in a single (per-user-category-pair)

    -- archive record dated 2001-01-01. More specifically:

    --

    -- Definition: "Archivable" user points records are ones that are both

    --- older than @ArchiveThresholdDays days

    --- not dated '2001-01-01'

    --

    -- This proc takes all archivable user points records for each user,

    -- * adds their total points value to their archive record (dated 2001-01-01) for that category

    -- * then deletes those archivable records

    --

    -- What's left are the recent entries, and the one-per-user-category archive record

    SET NOCOUNT ON;

    DECLARE @ArchiveMarkerDate DATETIME

    SET @ArchiveMarkerDate = '2001-01-01'

    -- Fix the date in case we pass midnight during processing...

    DECLARE @ArchiveBefore DATETIME

    SELECT @ArchiveBefore = DATEADD(DAY, -1*@ArchiveThresholdDays, GETDATE())

    BEGIN TRAN

    -- Make sure there's at least a blank (zero) archive record for everyone who's scored points recently in each category

    INSERT INTO UserPoints (UserID, Date, PointsScored, PointsCategory)

    SELECT up.UserID, @ArchiveMarkerDate, 0, up.PointsCategory

    FROM UserPoints up

    LEFT OUTER JOIN UserPoints up_archive ON [up_archive].date = @ArchiveMarkerDate AND [up_archive].UserID=up.UserID AND [up_archive].[PointsCategory]=up.[PointsCategory]

    WHERE up.Date < @ArchiveBefore

    AND up.Date @ArchiveMarkerDate

    AND [up_archive].PointsScored IS null

    GROUP BY up.UserID, [up_archive].PointsScored, up.PointsCategory

    -- Add the archivable points to the archive records

    UPDATE up_archive

    SET up_archive.PointsScored = up_archive.PointsScored + aggregates.TotalPointsDueForArchive

    FROM UserPoints up_archive

    JOIN (SELECT up.UserID,

    up.PointsCategory,

    SUM(up.PointsScored) AS TotalPointsDueForArchive

    FROM UserPoints up

    WHERE up.Date < @ArchiveBefore

    AND up.Date @ArchiveMarkerDate

    GROUP BY up.UserID, up.PointsCategory) aggregates ON [aggregates].UserID = up_archive.UserID and [aggregates].PointsCategory = up_archive.PointsCategory

    WHERE up_archive.Date = @ArchiveMarkerDate

    -- Delete the archivable records

    DELETE FROM UserPoints

    WHERE Date < @ArchiveBefore

    AND Date @ArchiveMarkerDate

    COMMIT TRAN

    END

    Mel HarbourProject ManagerRed Gate Software

  • Gail,

    The PointsCategory null bit has three possible values that it's ever called with:

    NULL = Get me all the points (used on the homepage, and top left on the main scores page)

    1 = Forums

    2 = QoD

    It would be relatively straightforward to split it into running two completely separate SPs for the two cases.

    Mel HarbourProject ManagerRed Gate Software

  • The CSV doesn't have column names, but it looks like they should be ID, UserID, Date, Points, Category. Is that correct?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Mel Harbour (7/13/2009)


    It would be relatively straightforward to split it into running two completely separate SPs for the two cases.

    As a first step, before getting into indexing, I would suggest to do just that, and also replace that table variable with a temp table, see what, if any, difference that makes.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 103 total)

You must be logged in to reply to this topic. Login to reply