|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 1:08 PM
Points: 175,
Visits: 719
|
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Sunday, January 08, 2012 5:07 PM
Points: 4,
Visits: 25
|
|
Could you describe the overall process that invokes these queries and some approximate statistics on the volume and relative distributions of information being stored? Also, some details on the physical computing environment's capacity and performance of the storage and processing elements would be helpful. Similarly, what's the "budget" and timeframe for making improvements?
Regards, Ira
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Friday, January 04, 2013 3:40 AM
Points: 87,
Visits: 248
|
|
CONTEST ENTRY
This took a bit longer than I thought...
Queries 1 and 2:
“Query #2 runs first, looking to see if that particular article has ever been viewed before or not.” If you simply want to know whether the article has “ever” been run before, why do you need to check the [Time] column? If there is a record in the table, wouldn’t that be enough to tell you if it has ever been viewed? If this is the case, I would remove the “[Time]=@RoundedTime” criteria.
If the [Time] criteria is necessary, I would apply the first listed nonclustered index recommendation:
CREATE NONCLUSTERED INDEX [IX_AnalysisLog_TimeData] ON [dbo].[AnalysisLog] ([Time],[Data])
You don’t show the structure of the AnalysisLog table. I assume that there is no identity column. If you were to add an identity column to the AnalysisLog table, eg. called AnalysisLogID, you could do this:
DECLARE @AnalysisLogID int
SELECT @AnalysisLogID=AnalysisLogID FROM dbo.AnalysisLog WHERE [Data] = @Data --AND [Time] = @RoundedTime **still required?** IF @AnalysisLogID IS NOT NULL UPDATE dbo.AnalysisLog SET [Count] = [Count] + 1 WHERE AnalysisLogID=@AnalysisLogID ELSE --execute undocumented query to create a new record
Query 3:
With this one, it appears that you are passing @SectionID and @PostName parameters, using them to lookup @ThreadID and @PostID, but only use the @ThreadID in the WHERE clause of the INSERT query. I don’t know if you need the @PostID at all, or if @ThreadID is sufficient. This should be more efficient: INSERT INTO #PageIndex (PostID) SELECT PostID FROM Posts WHERE IsApproved=1 AND PostLevel>1 AND ApplicationPostType<>16 AND SectionID=@SectionID AND PostName=@PostName ORDER BY PostDate
You also don’t show how you use the #PageIndex temp table after you create it. If you just run a simple SELECT statement afterwards, I would change the code to not use a temp table at all, especially as about 27% (31% of 91%) of the execution time is taken up by inserting into the temp table.
If you need to use the temp table, rather than create a clustered index, why not use: INSERT INTO #PageIndex(IndexID, PostID) SELECT ROW_NUMBER() OVER (ORDER BY PostID) as IndexID, PostID FROM Posts WHERE IsApproved=1 AND PostLevel>1 AND ApplicationPostType<>16 AND SectionID=@SectionID AND PostName=@PostName ORDER BY PostDate
And if you can make do without the temp table, simply drop the INSERT line.
Query 4:
I can see that there is a clustered index on both tables, but we don’t know which columns they are on. We can also see a large number of lob logical reads, but don’t know which columns are causing this as we don’t know the table definition. I can’t see anything obvious to improve on, but I would:
Consider adding an index on the SectionID column of the cs_weblog_Weblogs table. Consider adding a covering index on cs_weblog_Weblogs
It won’t improve efficiency, but I would revise the query to replace W.* with the actual column names and for consistency with the other queries, change the way that the tables are joined: FROM cs_Sections F INNER JOIN cs_weblog_Weblogs W ON F.SectionID=W.SectionID WHERE F.ApplicationType=1 AND F.SettingsID=@SettingsID
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 1:08 PM
Points: 175,
Visits: 719
|
|
I am not able to answer all of your question, but here is some information that might be useful.
Each node of the cluster has identical hardware and software:
--Two, 2.26Hz Quad Core Intel CPUs (a total of eight cores, no hyperthreading) --24GB RAM --Two, locally-attached RAID 1 arrays (array 1 holds the system files and SQL Server binaries, and array 2 is used to make local backups, before they are moved off the server) --Two, RAID 10 SAN arrays (array 1 holds the MDF files, and array 2 holds the LDF files) --One, RAID 1 SAN array (used exclusively for tempdb) --Windows 2008 (64-bit) --SQL Server 2008 (64-bit)
In regard to the process that invokes the first two queries, essentially it is any page view on the SQLServerCentral.com website. I am not exactly sure what invokes the last two queries, but I will ask our web developer (who did not write this code) to see if he can figure it out.
In regards to server load, the webserver get about 120,000 to 150,000 page views during a typical business day, with most of them occuring during the day.
Here are some stats on the number of rows in some of the key tables:
AnalysisLog: 1,651,589 rows cs_posts: 24,671 rows cs_Sections: 97 rows cs_weblog_weblogs: 89 rows
In regard to a "budget" or timeframe for making improvements, I am not going to disclose this, other to say that we do have a full-time developer for the website (who has a lot of work already on his hands), and that adding to the existing hardware would be difficult (except RAM), as the equipment is under a lease. For the assumptions of the contest, assume the hardware will remain the same, although we might be able to get some developement time allocated.
Brad M. McGehee Microsoft SQL Server MVP Director of DBA Education, Red Gate Software www.bradmcgehee.com
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 07, 2013 4:05 AM
Points: 199,
Visits: 498
|
|
Contest Entry
/*---------------------------------------------------------------- Query #1 & #2 ----------------------------------------------------------------*/
-- Assuming you have a unique ID for each article..... CREATE UNIQUE NONCLUSTERED INDEX idx_AnalysisLog_ArticleID ON dbo.AnalysisLog (ArticleID) INCLUDE ([Count]); GO
-- Wrap in a transaction to ensure multiple inserts don't fire a the same time if an article has never been viewed before BEGIN TRANSACTION;
-- Create or update based on the ArticleID, numbers make for faster searching. -- It doesn't mention that you want to know how many clicks per day, just how many clicks, assuming that.... DECLARE @ArticleID int, @RowCount int; SET @ArticleID = 7;
UPDATE dbo.AnalysisLog SET [Count] = COALESCE([Count], 0) + 1 -- Assumes the worst and Count allows NULL, should be defaulted to 0 and the COALSECE removed WHERE ArticleID = @ArticleID;
-- This way the index/table is only searched once for an insert or update whereas before an update required 2 searches SET @RowCount = @@ROWCOUNT;
IF @RowCount = 0 BEGIN INSERT INTO dbo.AnalysisLog (ArticleID, [Count], LastVisited) VALUES (@ArticleID, 1, CURRENT_TIMESTAMP); END
COMMIT TRANSACTION;
/*---------------------------------------------------------------- Query #3 ----------------------------------------------------------------*/
CREATE TABLE #PageIndex ( IndexID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, PostID int );
-- Saves doing 2 searches, you may need to add a new index and re-order the columns in the where clause to put the most selective first -- Again, you would be better passing in a Blog ID of some sort and filter on this rather than filter on characters (PostName) -- It would be better if a temp table were never created, without seeing more information on how that table is used it's difficult to say INSERT INTO #PageIndex (PostID) SELECT PostID FROM cs_Posts WITH (NOLOCK) WHERE SectionID = @SectionID AND PostName = @PostName AND IsApproved = 1 AND PostLevel > 1 AND ApplicationPostType <> 16 ORDER BY PostDate;
/*---------------------------------------------------------------- Query #4 ----------------------------------------------------------------*/
-- Create this index to improve performance on this table, depending on the number of existing indexes and size of the table (it could be 3 rows for all I know!) CREATE NONCLUSTERED INDEX idx_cs_weblog_Weblogs_SectionID ON dbo.cs_weblog_Weblogs (SectionID); GO
DECLARE @SettingsID int; SET @SettingsID = 1000;
-- If this query is used to give information about blog authors it appears to have a vast amount of irrelevant data returned SELECT F.[SectionID], F.[SettingsID], F.[IsActive], F.[ParentID], F.[GroupID], F.[Name], F.[NewsgroupName], F.[Description], F.[DateCreated], F.[Url], F.[IsModerated], F.[DaysToView], F.[SortOrder], F.[TotalPosts], F.[TotalThreads], F.[DisplayMask], F.[EnablePostStatistics], F.[EnablePostPoints], F.[EnableAutoDelete], F.[EnableAnonymousPosting], F.[AutoDeleteThreshold], F.[MostRecentThreadID], F.[MostRecentThreadReplies], F.[PostsToModerate], F.[ForumType], F.[IsSearchable], F.[ApplicationType], F.[ApplicationKey], F.PropertyNames AS SectionPropertyNames, F.PropertyValues AS SectionPropertyValues, F.[DefaultLanguage], F.[DiskUsage], W.*, -- Should specify the exact columns required NULL AS LastUserActivity FROM cs_Sections F INNER JOIN cs_weblog_Weblogs W ON F.SectionID = W.SectionID -- Use JOINS! WHERE F.SettingsID = @SettingsID AND F.ApplicationType = 1;
My 2 cents!
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Wednesday, May 08, 2013 7:34 AM
Points: 449,
Visits: 1,770
|
|
/* Contest Entry */
You said that this was a 2008 machine, so for Query 1 and 2 (and the 3rd non top 10 query to do the insert) you would be fine with a MERGE statement (killing three birds with one stone):
/* Query 1 & 2 */ DECLARE @roundedtime datetime = '2011-07-17 00:00:00.000', @data varchar(50) = 'article_loginclick' ; BEGIN TRAN MERGE dbo.AnalysisLog trg USING (SELECT @roundedtime AS [Time], @data AS [Data]) src ON src.Data = trg.DATA AND src.[Time] = trg.[Time] WHEN NOT MATCHED THEN INSERT ([Count],[Time],Data) VALUES (1,src.[Time], src.Data) WHEN MATCHED THEN UPDATE SET trg.[Count]=trg.[Count]+1 ; COMMIT TRAN I would also add the following index to assist the statement:
CREATE INDEX idx1 ON AnalysisLog ([Time],Data) INCLUDE (Count)
On Query 3 I see a you are not getting any data returned. If this is typical activity, I would want to know if there has maybe been a logic change elsewhere in the system that is possibly rendering this query obsolete (in it's current state). If not you could replace the statement with the following:
SELECT ROW_NUMBER() OVER (ORDER BY P.PostID) AS IndexID, P.PostID FROM cs_Posts P WHERE P.IsApproved = 1 AND P.PostLevel > 1 AND P.ApplicationPostType <> 16 AND P.SectionID = @SectionID AND P.PostName = @PostName ORDER BY P.PostDate You could take into consideration if the temp table is necessary at all, removing that if possible. It may also be possible to perform some index consolidation, changing the index that provides a seek on SecntionID and PostName to become a covering index for this query. Maybe something like:
CREATE INDEX idx ON cs_Posts (PostName,SectionId,IsApproved) INCLUDE (PostLevel, ApplicationPostType, PostId) This is going on the premise that the three key columns will be highly selective (if not unique) and the rest being added as includes to cover the query. Not knowing the table structure and meaning of the columns restricts this to only a possibility though.
For Query 4, it may be an idea to reduce the columns to an absolute necessity. Then I would look into the LOB reads - what is this actually achieving (if anything). The join syntax works, but is deprecated and should be changed to an INNER JOIN.
Regards,
WilliamD
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Sunday, January 08, 2012 5:07 PM
Points: 4,
Visits: 25
|
|
Thanks for the information Brad. I usually model and evaluate full configurations before suggesting what would be optimal. Two of the other replies are on similar tracks to what I was considering, so, they are ahead in terms of submitting an answer. The budget question was just about where you wanted to spend time/money in terms of people versus hardware. A complete redesign would hold things like the lookups for existence in a SSD, then increment counts or create and increment in a copy of the data in a RAID set based filegroup and asynchronously flush daily to another filegroup on one of the RAID arrays and rebuild the SSD based copy. Given your configuration, I would monitor for hot spots and also optimize using MERGE as suggested by another respondent. There's a few other possibilities including caching the top x% of hits (newly published article links, etc.) in the web front end that should get this configuration to average about 100 transactions per second. Without a load distribution, I presumed 30% of the time, peak load is occurring and the system is running at ~ 80% capacity.
Regards, Ira
|
|
|
|
|
SSC-Insane
         
Group: General Forum Members
Last Login: Today @ 3:06 PM
Points: 21,357,
Visits: 9,532
|
|
| If we don't want to "win" SM. Can we pick another tool?
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 1:08 PM
Points: 175,
Visits: 719
|
|
Ninja's_RGR'us (8/2/2011) If we don't want to "win" SM. Can we pick another tool?
Don't know this answer, but I will ask.
Brad M. McGehee Microsoft SQL Server MVP Director of DBA Education, Red Gate Software www.bradmcgehee.com
|
|
|
|
|
SSC-Insane
         
Group: General Forum Members
Last Login: Today @ 3:06 PM
Points: 21,357,
Visits: 9,532
|
|
When do you plan on posting the DDL of the tables for the top 4 queries along with keys indexes and sample data. You already gave the rowcount, but did you do it for all tables?
The contest is pretty much useless without that. We can't post tested code bassed on your need so at this point this is just a guess!
|
|
|
|