
SQLServerCentral.com Best Practices Clinic: Part 5
Posted Monday, August 1, 2011 11:49 PM


SSC-Enthusiastic


Group: General Forum Members
Last Login: Thursday, October 16, 2014 10:06 AM
Points: 176, Visits: 728
Comments posted to this topic are about the item SQLServerCentral.com Best Practices Clinic: Part 5

Brad M. McGehee
DBA
Post #1152296
Posted Tuesday, August 2, 2011 8:04 AM


Forum Newbie


Group: General Forum Members
Last Login: Sunday, January 8, 2012 5:07 PM
Points: 4, Visits: 25
Could you describe the overall process that invokes these queries, and provide some approximate statistics on the volume and relative distribution of the information being stored? Some details on the physical computing environment would also be helpful: the capacity and performance of the storage and processing elements. Similarly, what's the "budget" and timeframe for making improvements?

Regards, Ira
Post #1152548
Posted Tuesday, August 2, 2011 8:39 AM
SSC Journeyman


Group: General Forum Members
Last Login: Thursday, October 10, 2013 8:54 AM
Points: 87, Visits: 253
CONTEST ENTRY

This took a bit longer than I thought...

Queries 1 and 2:

“Query #2 runs first, looking to see if that particular article has ever been viewed before or not.” If you simply want to know whether the article has “ever” been viewed before, why do you need to check the [Time] column? If there is a record in the table, wouldn’t that be enough to tell you it has been viewed? If so, I would remove the “[Time]=@RoundedTime” criterion.

If the [Time] criterion is necessary, I would apply the first listed nonclustered index recommendation:

CREATE NONCLUSTERED INDEX [IX_AnalysisLog_TimeData]
ON [dbo].[AnalysisLog] ([Time],[Data])

You don’t show the structure of the AnalysisLog table, so I assume there is no identity column. If you were to add one, e.g. called AnalysisLogID, you could do this:


DECLARE @AnalysisLogID int

SELECT @AnalysisLogID=AnalysisLogID
FROM dbo.AnalysisLog
WHERE [Data] = @Data
--AND [Time] = @RoundedTime **still required?**

IF @AnalysisLogID IS NOT NULL
UPDATE dbo.AnalysisLog
SET [Count] = [Count] + 1
WHERE AnalysisLogID=@AnalysisLogID
ELSE
--execute undocumented query to create a new record


Query 3:

With this one, it appears that you are passing @SectionID and @PostName parameters, using them to look up @ThreadID and @PostID, but only using @ThreadID in the WHERE clause of the INSERT query. I don’t know if you need @PostID at all, or if @ThreadID is sufficient. This should be more efficient:

INSERT INTO #PageIndex (PostID)
SELECT PostID
FROM Posts
WHERE IsApproved=1 AND PostLevel>1 AND ApplicationPostType<>16
AND SectionID=@SectionID AND PostName=@PostName
ORDER BY PostDate


You also don’t show how you use the #PageIndex temp table after you create it. If you just run a simple SELECT statement afterwards, I would change the code to not use a temp table at all, especially as about 27% (31% of 91%) of the execution time is taken up by inserting into the temp table.

If you need to use the temp table, rather than create a clustered index, why not use:

INSERT INTO #PageIndex(IndexID, PostID)
SELECT ROW_NUMBER() OVER (ORDER BY PostID) as IndexID, PostID
FROM Posts
WHERE IsApproved=1 AND PostLevel>1 AND ApplicationPostType<>16
AND SectionID=@SectionID AND PostName=@PostName
ORDER BY PostDate

And if you can make do without the temp table, simply drop the INSERT line.
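As an aside, the nice property of the ROW_NUMBER() approach is that the IndexID is generated as part of the SELECT, so no IDENTITY column or clustered index on the temp table is needed. Here is a quick illustration of that windowing behaviour using Python's bundled SQLite; the table, columns, and data are invented stand-ins for cs_Posts, not the real schema:

```python
import sqlite3

# In-memory table standing in for the Posts table (schema is assumed)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (PostID INTEGER, PostDate TEXT, IsApproved INTEGER)")
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?, ?)",
    [(10, "2011-07-01", 1), (11, "2011-07-03", 1), (12, "2011-07-02", 0)],
)

# ROW_NUMBER() numbers only the rows that survive the WHERE clause,
# so the filtered result set gets a gap-free IndexID without a temp table
rows = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY PostID) AS IndexID, PostID
    FROM Posts
    WHERE IsApproved = 1
    ORDER BY PostDate
    """
).fetchall()
print(rows)  # [(1, 10), (2, 11)]
```

The window function runs after filtering, which is why the unapproved post never gets a number.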


Query 4:

I can see that there is a clustered index on each table, but we don’t know which columns they are on. We can also see a large number of LOB logical reads, but we don’t know which columns are causing them, as we don’t know the table definitions. I can’t see anything obvious to improve on, but I would:

Consider adding an index on the SectionID column of the cs_weblog_Weblogs table.
Consider adding a covering index on cs_weblog_Weblogs

It won’t improve efficiency, but I would revise the query to replace W.* with the actual column names and, for consistency with the other queries, change the way the tables are joined:

FROM cs_Sections F INNER JOIN cs_weblog_Weblogs W ON F.SectionID=W.SectionID
WHERE F.ApplicationType=1
AND F.SettingsID=@SettingsID
Post #1152590
Posted Tuesday, August 2, 2011 9:19 AM


SSC-Enthusiastic


Group: General Forum Members
Last Login: Thursday, October 16, 2014 10:06 AM
Points: 176, Visits: 728
I am not able to answer all of your questions, but here is some information that might be useful.

Each node of the cluster has identical hardware and software:

--Two 2.26GHz quad-core Intel CPUs (eight cores total, no hyperthreading)
--24GB RAM
--Two locally-attached RAID 1 arrays (array 1 holds the system files and SQL Server binaries; array 2 is used to make local backups before they are moved off the server)
--Two RAID 10 SAN arrays (array 1 holds the MDF files; array 2 holds the LDF files)
--One RAID 1 SAN array (used exclusively for tempdb)
--Windows Server 2008 (64-bit)
--SQL Server 2008 (64-bit)

In regard to the process that invokes the first two queries, essentially it is any page view on the SQLServerCentral.com website. I am not exactly sure what invokes the last two queries, but I will ask our web developer (who did not write this code) to see if he can figure it out.

In regard to server load, the web server gets about 120,000 to 150,000 page views on a typical weekday, with most of them occurring during business hours.

Here are some stats on the number of rows in some of the key tables:

AnalysisLog: 1,651,589 rows
cs_posts: 24,671 rows
cs_Sections: 97 rows
cs_weblog_weblogs: 89 rows

In regard to a "budget" or timeframe for making improvements, I am not going to disclose this, other than to say that we do have a full-time developer for the website (who already has a lot of work on his hands), and that adding to the existing hardware would be difficult (except RAM), as the equipment is under a lease. For the purposes of the contest, assume the hardware will remain the same, although we might be able to get some development time allocated.


Brad M. McGehee
DBA
Post #1152626
Posted Tuesday, August 2, 2011 9:31 AM
SSC Veteran


Group: General Forum Members
Last Login: Yesterday @ 11:11 AM
Points: 204, Visits: 656
Contest Entry

/*----------------------------------------------------------------
Query #1 & #2
----------------------------------------------------------------*/

-- Assuming you have a unique ID for each article.....
CREATE UNIQUE NONCLUSTERED INDEX idx_AnalysisLog_ArticleID ON dbo.AnalysisLog (ArticleID) INCLUDE ([Count]);
GO

-- Wrap in a transaction to ensure multiple inserts don't fire at the same time if an article has never been viewed before
BEGIN TRANSACTION;

-- Create or update based on the ArticleID, numbers make for faster searching.
-- It doesn't mention that you want to know how many clicks per day, just how many clicks, assuming that....
DECLARE @ArticleID int,
@RowCount int;
SET @ArticleID = 7;

UPDATE dbo.AnalysisLog
SET [Count] = COALESCE([Count], 0) + 1 -- Assumes the worst and that Count allows NULL; it should default to 0 so the COALESCE can be removed
WHERE ArticleID = @ArticleID;

-- This way the index/table is only searched once for an insert or update whereas before an update required 2 searches
SET @RowCount = @@ROWCOUNT;

IF @RowCount = 0
BEGIN
INSERT INTO dbo.AnalysisLog (ArticleID, [Count], LastVisited)
VALUES (@ArticleID, 1, CURRENT_TIMESTAMP);
END

COMMIT TRANSACTION;
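For what it's worth, the update-first pattern above (one index search for the common case, instead of a SELECT followed by an UPDATE) isn't T-SQL specific. Here is a minimal sketch of the same logic against SQLite from Python, where `cursor.rowcount` plays the role of @@ROWCOUNT; the two-column table is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AnalysisLog (ArticleID INTEGER PRIMARY KEY, Count INTEGER NOT NULL)"
)

def record_view(article_id):
    # Try the UPDATE first; rowcount tells us whether a row was actually touched
    cur = conn.execute(
        "UPDATE AnalysisLog SET Count = Count + 1 WHERE ArticleID = ?", (article_id,)
    )
    if cur.rowcount == 0:
        # No existing row: this is the first view of the article
        conn.execute(
            "INSERT INTO AnalysisLog (ArticleID, Count) VALUES (?, 1)", (article_id,)
        )

record_view(7)
record_view(7)
print(conn.execute("SELECT Count FROM AnalysisLog WHERE ArticleID = 7").fetchone()[0])  # 2
```

Only the first call takes the INSERT branch; every subsequent view is a single UPDATE.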


/*----------------------------------------------------------------
Query #3
----------------------------------------------------------------*/

CREATE TABLE #PageIndex (
IndexID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
PostID int
);

-- Saves doing 2 searches, you may need to add a new index and re-order the columns in the where clause to put the most selective first
-- Again, you would be better passing in a Blog ID of some sort and filter on this rather than filter on characters (PostName)
-- It would be better if a temp table were never created, without seeing more information on how that table is used it's difficult to say
INSERT INTO #PageIndex (PostID)
SELECT PostID
FROM cs_Posts WITH (NOLOCK)
WHERE SectionID = @SectionID
AND PostName = @PostName
AND IsApproved = 1
AND PostLevel > 1
AND ApplicationPostType <> 16
ORDER BY PostDate;


/*----------------------------------------------------------------
Query #4
----------------------------------------------------------------*/

-- Create this index to improve performance on this table, depending on the number of existing indexes and size of the table (it could be 3 rows for all I know!)
CREATE NONCLUSTERED INDEX idx_cs_weblog_Weblogs_SectionID ON dbo.cs_weblog_Weblogs (SectionID);
GO

DECLARE @SettingsID int;
SET @SettingsID = 1000;

-- If this query is used to give information about blog authors it appears to have a vast amount of irrelevant data returned
SELECT F.[SectionID],
F.[SettingsID],
F.[IsActive],
F.[ParentID],
F.[GroupID],
F.[Name],
F.[NewsgroupName],
F.[Description],
F.[DateCreated],
F.[Url],
F.[IsModerated],
F.[DaysToView],
F.[SortOrder],
F.[TotalPosts],
F.[TotalThreads],
F.[DisplayMask],
F.[EnablePostStatistics],
F.[EnablePostPoints],
F.[EnableAutoDelete],
F.[EnableAnonymousPosting],
F.[AutoDeleteThreshold],
F.[MostRecentThreadID],
F.[MostRecentThreadReplies],
F.[PostsToModerate],
F.[ForumType],
F.[IsSearchable],
F.[ApplicationType],
F.[ApplicationKey],
F.PropertyNames AS SectionPropertyNames,
F.PropertyValues AS SectionPropertyValues,
F.[DefaultLanguage],
F.[DiskUsage],
W.*, -- Should specify the exact columns required
NULL AS LastUserActivity
FROM cs_Sections F
INNER JOIN cs_weblog_Weblogs W ON F.SectionID = W.SectionID -- Use JOINS!
WHERE F.SettingsID = @SettingsID
AND F.ApplicationType = 1;

My 2 cents!
Post #1152638
Posted Tuesday, August 2, 2011 10:49 AM
SSC-Addicted


Group: General Forum Members
Last Login: Friday, October 24, 2014 7:46 AM
Points: 449, Visits: 1,865
/* Contest Entry */

You said that this was a 2008 machine, so for Query 1 and 2 (and the 3rd non top 10 query to do the insert) you would be fine with a MERGE statement (killing three birds with one stone):

/* Query 1 & 2 */
DECLARE @roundedtime datetime = '2011-07-17 00:00:00.000',
@data varchar(50) = 'article_loginclick' ;

BEGIN TRAN
MERGE dbo.AnalysisLog trg
USING
(SELECT @roundedtime AS [Time],
@data AS [Data]) src
ON src.Data = trg.DATA
AND src.[Time] = trg.[Time]
WHEN NOT MATCHED
THEN INSERT ([Count],[Time],Data)
VALUES (1,src.[Time], src.Data)
WHEN MATCHED
THEN UPDATE SET trg.[Count]=trg.[Count]+1
;
COMMIT TRAN
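The same "insert a new row or increment the existing one" in a single statement can be sketched outside SQL Server, too. SQLite has no MERGE, but its INSERT ... ON CONFLICT clause expresses the identical upsert, relying on a unique key over (Time, Data); everything below is an invented stand-in for the real AnalysisLog table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The unique key on (Time, Data) is what makes the single-statement upsert possible
conn.execute(
    "CREATE TABLE AnalysisLog (Time TEXT, Data TEXT, Count INTEGER NOT NULL, "
    "UNIQUE (Time, Data))"
)

upsert = """
    INSERT INTO AnalysisLog (Time, Data, Count)
    VALUES (?, ?, 1)
    ON CONFLICT (Time, Data) DO UPDATE SET Count = Count + 1
"""
# Three page views of the same article in the same time bucket
for _ in range(3):
    conn.execute(upsert, ("2011-07-17 00:00", "article_loginclick"))

print(conn.execute("SELECT Count FROM AnalysisLog").fetchall())  # [(3,)]
```

One row exists afterwards, with its counter incremented twice, which is exactly what the MERGE above achieves.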

I would also add the following index to assist the statement:

CREATE INDEX idx1 ON AnalysisLog ([Time], Data) INCLUDE ([Count])


On Query 3, I see that you are not getting any data returned. If this is typical activity, I would want to know whether there has been a logic change elsewhere in the system that is rendering this query obsolete (in its current state). If not, you could replace the statement with the following:

SELECT  ROW_NUMBER() OVER (ORDER BY P.PostID) AS IndexID,
P.PostID
FROM cs_Posts P
WHERE P.IsApproved = 1
AND P.PostLevel > 1
AND P.ApplicationPostType <> 16
AND P.SectionID = @SectionID
AND P.PostName = @PostName
ORDER BY P.PostDate

You could also consider whether the temp table is necessary at all, removing it if possible. It may also be possible to perform some index consolidation, changing the index that provides a seek on SectionID and PostName into a covering index for this query. Maybe something like:

CREATE INDEX idx ON cs_Posts (PostName,SectionId,IsApproved) INCLUDE (PostLevel, ApplicationPostType, PostId)

This is going on the premise that the three key columns will be highly selective (if not unique), with the rest added as includes to cover the query. Not knowing the table structure and the meaning of the columns restricts this to only a possibility, though.

For Query 4, it may be an idea to reduce the columns returned to the absolute minimum. Then I would look into the LOB reads: what are they actually achieving (if anything)? The join syntax works, but it is deprecated and should be changed to an INNER JOIN.



Regards,

WilliamD
Post #1152736
Posted Tuesday, August 2, 2011 11:31 AM


Forum Newbie


Group: General Forum Members
Last Login: Sunday, January 8, 2012 5:07 PM
Points: 4, Visits: 25
Thanks for the information, Brad. I usually model and evaluate full configurations before suggesting what would be optimal. Two of the other replies are on similar tracks to what I was considering, so they are ahead in terms of submitting an answer. The budget question was just about where you wanted to spend time/money in terms of people versus hardware. A complete redesign would hold things like the existence lookups on an SSD, increment (or create and increment) the counts in a copy of the data in a RAID-set-based filegroup, asynchronously flush daily to another filegroup on one of the RAID arrays, and rebuild the SSD-based copy. Given your configuration, I would monitor for hot spots and also optimize using MERGE, as suggested by another respondent. There are a few other possibilities, including caching the top x% of hits (newly published article links, etc.) in the web front end, that should get this configuration to average about 100 transactions per second. Without a load distribution, I presumed that peak load occurs 30% of the time and that the system is then running at ~80% capacity.
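To illustrate the front-end caching idea, a hot-set cache in the web tier can be sketched with a stdlib LRU cache; the function and its return value here are purely hypothetical, standing in for whatever per-page database lookup the site performs:

```python
from functools import lru_cache

# Hypothetical expensive lookup that would otherwise hit the database on every page view
@lru_cache(maxsize=1024)  # keep roughly the hottest 1024 articles in memory
def article_link(article_id):
    # Stand-in for a database round trip
    return f"/articles/{article_id}"

# A hot article viewed 100 times: only the first call does the real work
for _ in range(100):
    article_link(7)

info = article_link.cache_info()
print(info.hits, info.misses)  # 99 1
```

A small maxsize captures the "top x% of hits" because article views are heavily skewed toward recent content; the trade-off is stale counts/links until the entry is evicted or invalidated.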

Regards, Ira
Post #1152801
Posted Tuesday, August 2, 2011 11:51 AM


SSC-Insane


Group: General Forum Members
Last Login: Wednesday, October 8, 2014 4:15 AM
Points: 20,581, Visits: 9,619
If we don't want to "win" SM, can we pick another tool?
Post #1152817
Posted Tuesday, August 2, 2011 11:58 AM


SSC-Enthusiastic


Group: General Forum Members
Last Login: Thursday, October 16, 2014 10:06 AM
Points: 176, Visits: 728
Ninja's_RGR'us (8/2/2011)
If we don't want to "win" SM. Can we pick another tool?


Don't know this answer, but I will ask.


Brad M. McGehee
DBA
Post #1152836
Posted Tuesday, August 2, 2011 12:04 PM


SSC-Insane


Group: General Forum Members
Last Login: Wednesday, October 8, 2014 4:15 AM
Points: 20,581, Visits: 9,619
When do you plan on posting the DDL of the tables for the top 4 queries, along with keys, indexes, and sample data? You already gave the row counts, but did you do it for all tables?


The contest is pretty much useless without that. We can't post tested code based on your needs, so at this point this is just a guess!
Post #1152845