December 10, 2012 at 12:04 pm
Hello,
I need advice on tuning query performance for large tables.
I have three big tables that I need to join in some queries, and the queries are getting slower every day as the row counts grow.
I believe I have already tuned the indexes.
I'd be glad for any suggestions on better performance tuning.
It is not easy to split the tables. I tried creating additional data files, but it didn't make a difference.
Each table is 5-6 GB in size, by the way.
Thanks in advance.
December 10, 2012 at 12:26 pm
Performance tuning is generally done for a specific performance goal, and without some detail on what you already have in the way of tables, indexes, row sizes, row counts, and the query (or queries) that need help, general advice might not be terribly useful. There are a number of possible techniques to help out, such as partitioning, filtered indexes (typically for fields where a large number of rows are NULL, or for situations where just one value out of all possible values for a field is of interest), index query hints, query optimization, temp tables, table variables, etc...
You haven't provided enough information to go into any detail, however... Care to elaborate?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 10, 2012 at 12:49 pm
When working with large tables:
Retrieve just the data that you absolutely need.
Understand that no one is going to read several million rows, so do not query several million rows and return them to the client. It is a waste.
Make sure your indexes support the queries.
Make sure the queries are written so that they can use the indexes.
Partitioning is not primarily a performance tuning mechanism; it's mostly about maintainability.
Avoid index hints unless you are absolutely, completely sure that you know better than the optimiser what the best plan for a query is.
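To illustrate "written so that they can use the indexes" (SARGable predicates): wrapping the indexed column in a function defeats an index seek. A minimal sketch using the Tags table from this thread; the predicates themselves are made up for illustration:

```sql
-- Non-SARGable: the function on the column forces a scan of the Tag index.
SELECT TagID FROM Tags WHERE UPPER(Tag) = N'SQL SERVER';

-- SARGable: the bare column allows an index seek
-- (case handling is left to the column's collation).
SELECT TagID FROM Tags WHERE Tag = N'SQL Server';
```

Both queries return the same rows under the common case-insensitive default collation; only the second can seek.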
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 10, 2012 at 12:58 pm
Hello Steve,
You're right. Let me give detailed information.
My website is http://www.websiteanalyzer.info
It analyzes websites and gives information to webmasters for better SEO.
Websites have keyword meta tags (a one-to-many relationship), and the same keyword can appear on other sites too. For example, sqlservercentral.com has the "SQL Server" tag, and there are lots of websites with the same tag.
So I have 3 big tables.
1 - Sites table - 20,000,000 rows
SiteID - Primary Key
SiteUrl - Nonclustered Index
2 - Tags table - 30,000,000 rows
TagID - PK
Tag - Nonclustered Index
3 - SiteTags table - 120,000,000 rows
SiteID - Nonclustered Index, FK
TagID - Nonclustered Index, FK
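For the join pattern in this thread (filter SiteTags by TagID, then join to Sites on SiteID), separate single-column indexes force the optimiser to either intersect the two indexes or do lookups. A composite index is the usual first step; a sketch only, with an illustrative index name:

```sql
-- Seek on TagID; SiteID is carried in the index key,
-- so the join to Sites needs nothing else from SiteTags.
CREATE NONCLUSTERED INDEX IX_SiteTags_TagID_SiteID
    ON dbo.SiteTags (TagID, SiteID);
```

One composite index like this serves both the WHERE TagID = @TagID filter and the join in a single seek.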
If you check the following URLs you will see the problem. On the first page there is no performance problem, but if you navigate through the pager it gets slower (you will see why if you check the stored procedure).
http://www.websiteanalyzer.info/Tag.aspx?tag=SQL%20Server&page=1
http://www.websiteanalyzer.info/Tag.aspx?tag=SQL%20Server&page=100
The following is the stored procedure running on these pages.
ALTER PROCEDURE [dbo].[spGetSiteListFromTagPager]
@Tag nvarchar(64),
@page int = 0
AS
DECLARE @TagID int
SET @TagID = (select top 1 TagID FROM Tags with(nolock) WHERE Tag=@Tag)
IF(@page = 1)
BEGIN
SELECT TOP 20 Sites.SiteUrl,Sites.Title,SUBSTRING(Sites.Description,1,500) as 'Description'
from Sites with(nolock)
INNER JOIN SiteTags with(nolock) ON Sites.SiteID=SiteTags.SiteID
WHERE SiteTags.TagID=@TagID
END
ELSE IF(@page > 1)
BEGIN
create table #t(SiteID bigint NOT NULL)
ALTER TABLE #t ADD PRIMARY KEY CLUSTERED (SiteID)
insert #t
select distinct SiteID FROM SiteTags with(nolock) WHERE TagID = @TagID
--CREATE INDEX #t_IDX_SiteID1 ON #t(SiteID)
CREATE TABLE #t2(UID int IDENTITY(1,1) NOT NULL,SiteUrl varchar(120),Title varchar(500),Description nvarchar(500))
ALTER TABLE #t2 ADD PRIMARY KEY CLUSTERED (UID)
INSERT #t2
SELECT TOP (@page*20) Sites.SiteUrl,Sites.Title,SUBSTRING(Sites.Description,1,500) from Sites with(nolock)
INNER JOIN #t with(nolock) ON Sites.SiteID = #t.SiteID
SELECT --distinct
SiteUrl,Title,Description FROM #t2 WHERE
UID>=((@page-1)*20) AND UID<(@page*20)
END
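One structural alternative to the TOP (@page*20) approach above is keyset ("seek") paging: instead of materializing every row up to the requested page, remember the last SiteID already shown and seek past it. A sketch only; @LastSiteID is a hypothetical extra parameter the pager would have to pass:

```sql
-- Work per page stays proportional to the page size,
-- not to how deep the user has paged.
SELECT TOP (20) s.SiteUrl, s.Title,
       SUBSTRING(s.Description, 1, 500) AS Description
FROM Sites AS s
INNER JOIN SiteTags AS st ON s.SiteID = st.SiteID
WHERE st.TagID = @TagID
  AND s.SiteID > @LastSiteID   -- last SiteID of the previous page
ORDER BY s.SiteID;
```

The trade-off is that the pager can only step forward or backward from a known position rather than jump straight to page 100.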
By the way, my web server config: Intel Core i5-2400 3.10 GHz 6 MB cache, 7200 RPM SATA disk (16 MB cache), 16 GB RAM, and I'm looking for a new server with an SSD.
If you need more information, please let me know.
Thanks
December 10, 2012 at 1:37 pm
I'm wondering if your SUBSTRING function is causing performance grief. How many bytes is the Description field? Is that field contained in the index? If not, you'll end up with a table scan, so you might want to do something about that. Also, why do the SUBSTRING at the same time as the TOP n? Why not join to the relevant TOP n records and pull the SUBSTRING only on the records that qualify for the TOP n?
Here's the thought:
;WITH CTE AS (
SELECT TOP (@page*20) Sites.SiteUrl,Sites.Title, Sites.SiteID
FROM Sites WITH(nolock)
INNER JOIN #t WITH(nolock) ON Sites.SiteID = #t.SiteID
)
INSERT #t2
SELECT C.SiteUrl, C.Title, SUBSTRING(S.Description,1,500)
FROM CTE AS C
INNER JOIN Sites AS S
ON C.SiteID = S.SiteID
Not sure if that will help, but it might avoid yanking that Description field for a ton of records when you only need it for the qualifying records. If the CTE doesn't make that happen, you can temp table it instead and hopefully force it that way.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 10, 2012 at 4:12 pm
Thank you for your advice. I changed it as you wrote, but it made no difference; it took 38 seconds.
The Description field type is nvarchar(2000).
There are other queries that take huge execution times as well.
I have spent a lot of time on query optimization across many stored procedures, and I now believe I should change the database design.
Splitting tables, or partitioning, or something else. I don't know.
When I run this stored procedure with "Include Actual Execution Plan", I see 100% of the cost on the following statement.
WITH CTE AS (
SELECT TOP (@page*20) Sites.SiteUrl,Sites.Title, Sites.SiteID
FROM Sites WITH(nolock)
INNER JOIN #t WITH(nolock) ON Sites.SiteID = #t.SiteID
)
INSERT #t2
SELECT C.SiteUrl, C.Title, SUBSTRING(S.Description,1,500)
FROM CTE AS C
INNER JOIN Sites AS S
ON C.SiteID = S.SiteID
When I run the old script, I see in the execution plan that 100% of the time is on the following.
Seek Keys[1]: Prefix: [OSA].[dbo].[Sites].SiteID = Scalar Operator([#t].[SiteID])
As I understand it, because the Sites table has lots of rows, the clustered index seek on the SiteID column takes time.
I haven't tested it, but if I split the Sites table into 10 parts then maybe the queries will run faster.
If I split it into Sites1, Sites2, ... Sites10 and then create a view like the following:
CREATE VIEW [dbo].[vwSites]
AS
select * from Sites1 with(nolock)
UNION ALL
select * from Sites2 with(nolock)
...
select * from Sites10 with(nolock)
and then replace all "Sites" with "vwSites" in all stored procedures and functions, do you think everything will be faster?
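For what it's worth, SQL Server only skips member tables of a view like this when it is a proper partitioned view: UNION ALL (not UNION, which adds a de-duplication step) over tables whose partitioning column carries a CHECK constraint. A minimal sketch of the pattern, with made-up ranges and only two members shown:

```sql
-- Each member table constrains the partitioning column...
ALTER TABLE dbo.Sites1 ADD CONSTRAINT CK_Sites1_SiteID
    CHECK (SiteID BETWEEN 1 AND 2000000);
ALTER TABLE dbo.Sites2 ADD CONSTRAINT CK_Sites2_SiteID
    CHECK (SiteID BETWEEN 2000001 AND 4000000);
GO
-- ...so the optimiser can eliminate members whose CHECK range
-- cannot match a predicate on SiteID.
CREATE VIEW dbo.vwSites AS
SELECT * FROM dbo.Sites1
UNION ALL
SELECT * FROM dbo.Sites2;
```

Without the CHECK constraints, every query against the view still touches all ten tables and nothing gets faster.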
December 10, 2012 at 5:49 pm
Just splitting the table doesn't necessarily make things faster. The problem might be what to use as the criteria for the split. If, after all, you just have to put them all back together again, nothing is really gained.

The question is whether you need to do the SUBSTRING (can the application handle and toss away any excess characters past 500?), or more importantly, is the Description field part of the index?

Another question related to splitting is the overall design of the webpage. Do you really need to be able to handle a request for ANY given page number? It seems to me that anything more than the top 100 websites for any given category might be serious overkill for most practical purposes, but then, I know nothing about what drives revenue for your company, so I can only ask the question as opposed to providing the answer. If it turns out that some fixed number of sites is "good enough" at any given time, then maybe you can design the insert/update process around maintaining the TOP N rows in an additional table as opposed to in a view, and then you might well have a lightning fast site. Again, just tossing around ideas (IOW, grasping at what might even be a straw here or there - LOL).
Anyway, let me know what you have in the index. There might even be a way to use a filtered index if we can flag any given record with an additional field to indicate it's in the TOP N (NULL if it's not), then create a filtered index on that field, including the other fields. You'll have to tell me...
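In SQL Server terms, the feature for this is a filtered index. A sketch of the flag idea; TopNFlag is a hypothetical new column, and the INCLUDE list is illustrative:

```sql
-- Hypothetical flag: non-NULL only for rows currently in the TOP N.
ALTER TABLE dbo.Sites ADD TopNFlag bit NULL;
GO
-- The filtered index contains only the flagged rows,
-- so it stays small no matter how large Sites grows.
CREATE NONCLUSTERED INDEX IX_Sites_TopN
    ON dbo.Sites (SiteID)
    INCLUDE (SiteUrl, Title)
    WHERE TopNFlag IS NOT NULL;
```

The insert/update process would have to maintain the flag, but queries restricted to flagged rows can then be satisfied from a tiny index.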
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 11, 2012 at 1:31 am
Looking at the way you've done the paging, you should be able to simplify the whole thing. Firstly, the SUBSTRING could be the cause; have you looked at other methods like LEFT or CONVERT to do the trimming?
I might also be tempted to do something like this:
Declare @StartRow int, @EndRow Int
Set @StartRow=((@page-1)*20)+1 -- page 1 should start at row 1
Set @EndRow=@StartRow+19
With CTE as(
SELECT
ROW_NUMBER() OVER(ORDER BY Sites.SiteUrl) Rn
,Sites.SiteUrl
,Sites.Title
,LEFT(Sites.Description,500) AS Description -- CTE columns must be named
--,CONVERT(Nvarchar(500),Sites.Description) AS Description
from Sites
INNER JOIN SiteTags t ON Sites.SiteID = t.SiteID
WHERE t.TagID = @TagID -- tag filter, as in the original procedure
)
Select * from CTE
Where Rn Between @StartRow and @EndRow
It would be interesting to see if this works and is more performant. I have doubts about the CTE and the ORDER BY; you might want to change it to SiteID rather than SiteUrl, depending on the order you want to use.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
December 11, 2012 at 8:09 am
Well, the SUBSTRING doesn't matter much, so I removed it.
I think it's better to share table DDL.
CREATE TABLE [dbo].[Sites](
[SiteID] [bigint] IDENTITY(1,1) NOT NULL,
[SiteUrl] [varchar](120) NOT NULL,
[SiteFirstCheckDate] [datetime] NOT NULL,
[SiteLastCheckDate] [datetime] NOT NULL,
[SitePoint] [int] NOT NULL,
[Title] [nvarchar](500) NOT NULL,
[TitleLength] [int] NOT NULL,
[Description] [nvarchar](2000) NOT NULL,
[SiteWorth] [int] NULL,
[Encoding] [nvarchar](200) NULL,
[Author] [nvarchar](200) NULL,
CONSTRAINT [PK_Sites] PRIMARY KEY CLUSTERED
(
[SiteID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SiteFirstCheckDate] DEFAULT (getdate()) FOR [SiteFirstCheckDate]
GO
ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SiteLastCheckDate] DEFAULT (getdate()) FOR [SiteLastCheckDate]
GO
ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SitePoint] DEFAULT ((0)) FOR [SitePoint]
GO
The Description column doesn't have any index. SiteUrl, SiteLastCheckDate and SitePoint have nonclustered indexes.
December 11, 2012 at 9:41 am
Just adding the Description field to the index on SiteID would probably help quite a bit, although at up to 2K per row, the size of that index would be potentially large, so keep that in mind for disk space usage. Also, there will be a hit on INSERT activity, as then that much more data has to hit the index for inserts and updates.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 11, 2012 at 9:46 am
sgmunson (12/11/2012)
Just adding the Description field to the index on SiteID would probably help quite a bit, although at up to 2K per row, the size of that index would be potentially large, so keep that in mind for disk space usage. Also, there will be a hit on INSERT activity, as then that much more data has to hit the index for inserts and updates.
Surely you shouldn't need to add the Description, as SiteID is the clustered index anyway, so at the leaf level it's already included; and as Description isn't part of any filter here, surely it doesn't make any difference.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
December 11, 2012 at 10:12 am
I didn't understand why you suggested adding an index for the Description field; it is not used in the WHERE clause.
December 11, 2012 at 2:20 pm
I modified it as follows and I think it's a little bit better. I removed the temp tables and used built-in paging.
ALTER PROCEDURE [dbo].[spGetSiteListFromTagPager]
@Tag nvarchar(64),
@page int = 0
AS
DECLARE @TagID int
SET @TagID = (select top 1 TagID FROM Tags with(nolock) WHERE Tag=@Tag)
IF(@page = 1)
BEGIN
SELECT TOP 20 Sites.SiteUrl,Sites.Title,Sites.Description
from Sites with(nolock)
INNER JOIN SiteTags with(nolock) ON Sites.SiteID=SiteTags.SiteID
WHERE SiteTags.TagID=@TagID
END
ELSE IF(@page > 1)
BEGIN
SELECT Sites.SiteUrl,Sites.Title,Sites.Description from Sites with(nolock)
INNER JOIN (select distinct SiteID FROM SiteTags with(nolock) WHERE TagID = @TagID) as t ON Sites.SiteID = t.SiteID
ORDER BY t.SiteID
OFFSET (@page-1)*20 ROWS
FETCH NEXT 20 ROWS ONLY
END
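Since OFFSET 0 ROWS is legal, the IF/ELSE above could collapse into a single code path; a sketch only, assuming SiteTags has no duplicate (SiteID, TagID) pairs so the DISTINCT can be dropped:

```sql
ALTER PROCEDURE [dbo].[spGetSiteListFromTagPager]
    @Tag nvarchar(64),
    @page int = 1
AS
DECLARE @TagID int = (SELECT TOP 1 TagID FROM Tags WHERE Tag = @Tag);

SELECT s.SiteUrl, s.Title, s.Description
FROM Sites AS s
INNER JOIN SiteTags AS st ON s.SiteID = st.SiteID
WHERE st.TagID = @TagID
ORDER BY s.SiteID              -- a deterministic order keeps pages stable
OFFSET (@page - 1) * 20 ROWS
FETCH NEXT 20 ROWS ONLY;
```

One code path means one cached plan, and the explicit ORDER BY also fixes the nondeterministic TOP 20 in the @page = 1 branch.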
December 11, 2012 at 2:47 pm
What's the distinct for? If you have messy data, clean it up. Distinct is not a cheap operation.
Watch the nolocks.
See - http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 12, 2012 at 8:16 am
Jason-299789 (12/11/2012)
sgmunson (12/11/2012)
Just adding the Description field to the index on SiteID would probably help quite a bit, although at up to 2K per row, the size of that index would be potentially large, so keep that in mind for disk space usage. Also, there will be a hit on INSERT activity, as then that much more data has to hit the index for inserts and updates.
Surely you shouldn't need to add the Description, as SiteID is the clustered index anyway, so at the leaf level it's already included; and as Description isn't part of any filter here, surely it doesn't make any difference.
Missed that... oops...
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)