Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Optimization advice for huge tables Expand / Collapse
Author
Message
Posted Monday, December 10, 2012 12:04 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, January 06, 2013 3:06 PM
Points: 7, Visits: 30
Hello,

I need tuning advices for large tables query performance.
I have 3 big tables and I need to join these 3 tables on some queries and it's getting slower every day because of the table rows size increasing.

I believe I made the index tuning.
I'd be glad if you suggest me for better performance tuning.

It is not easy to split tables. I tried to create another data files but it didn't make a change.
Each table 5-6 GB size by the way.

Thanks in advance.
Post #1394740
Posted Monday, December 10, 2012 12:26 PM
UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Wednesday, May 15, 2013 8:20 AM
Points: 1,446, Visits: 1,883
Performance tuning is generally done for a specific performance goal, and without some detail on what you already have in the way of tables, indexes, row sizes, row counts, and the query (ies?) that need help, general advice might not be terribly useful. There are a number of possible techniques to help out, such as Partitioning, Sparse Indexes (typically for fields with large numbers of rows with NULL values. or for situations where just 1 value ou all possible values for a field is of interest), Index query hints, query optimization, temp tables, table variables, etc...

You haven't provided enough information to go into any detail, however... Care to elaborate?



Steve
(aka sgmunson)

Weight Loss Tips
Post #1394750
Posted Monday, December 10, 2012 12:49 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:59 PM
Points: 37,640, Visits: 29,895
When working with large tables:
Retrieve just the data that you absolutely need.
Understand that no one is going to read several million rows, hence do not query several million rows and return that to the client. It is a waste
Make sure your indexes support the queries
Make sure the queries are written to be able to use indexes
Partitioning is not primarily a performance tuning mechanism, it's mostly about maintainability.
Avoid index hints unless you are absolutely, completely sure that you know better than the optimiser what the best plan for a query is



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1394758
Posted Monday, December 10, 2012 12:58 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, January 06, 2013 3:06 PM
Points: 7, Visits: 30
Hello Steve,

You're right. Let me give detailed information.
My website is http://www.websiteanalyzer.info
It analyzes websites and give information to webmasters for better SEO.
Websites have keyword metatags(one to many relationship) and for same keywords there may be other sites too. For example sqlservercentral.com has "SQL Server" tag and there are lots of websites has the same tag.

So I have 3 big tables.

1 - Sites table - 20.000.000 rows
SiteID - Primary Key
SiteUrl - Nonclustered Index

2 - Tags table - 30.000.000 rows
TagID - PK
Tag - Nonclustered Index

3 - SiteTags table - 120.000.000 rows
SiteID - Nonclustered Index, FK
TagID - Nonclustered Index, FK

If you check following urls you will see the problem. On first page there is no performance problem but if you navigate the pager, it will be slower(you will see why if you check stored procedure).

http://www.websiteanalyzer.info/Tag.aspx?tag=SQL%20Server&page=1
http://www.websiteanalyzer.info/Tag.aspx?tag=SQL%20Server&page=100

Following is the stored procedure running on these pages.

ALTER PROCEDURE [dbo].[spGetSiteListFromTagPager]
@Tag nvarchar(64),
@page int = 0
AS

DECLARE @TagID int
SET @TagID = (select top 1 TagID FROM Tags with(nolock) WHERE Tag=@Tag)

IF(@page = 1)
BEGIN

SELECT TOP 20 Sites.SiteUrl,Sites.Title,SUBSTRING(Sites.Description,1,500) as 'Description'
from Sites with(nolock)
INNER JOIN SiteTags with(nolock) ON Sites.SiteID=SiteTags.SiteID
WHERE SiteTags.TagID=@TagID

END

ELSE IF(@page > 1)
BEGIN
create table #t(SiteID bigint NOT NULL)
ALTER TABLE #t ADD PRIMARY KEY CLUSTERED (SiteID)

insert #t
select distinct SiteID FROM SiteTags with(nolock) WHERE TagID = @TagID
--CREATE INDEX #t_IDX_SiteID1 ON #t(SiteID)

CREATE TABLE #t2(UID int IDENTITY(1,1) NOT NULL,SiteUrl varchar(120),Title varchar(500),Description nvarchar(500))
ALTER TABLE #t2 ADD PRIMARY KEY CLUSTERED (UID)
INSERT #t2
SELECT TOP (@page*20) Sites.SiteUrl,Sites.Title,SUBSTRING(Sites.Description,1,500) from Sites with(nolock)
INNER JOIN #t with(nolock) ON Sites.SiteID = #t.SiteID

SELECT --distinct
SiteUrl,Title,Description FROM #t2 WHERE
UID>=((@page-1)*20) AND UID<(@page*20)
END


By the way my web server conf: INTEL CORE I5 2400 3.10GHZ 6MB, 16MB 7200RPM SATA DISK, 16 GB RAM and I'm looking for a new server with SSD.

If you need more information pls let me know.
Thanks
Post #1394763
Posted Monday, December 10, 2012 1:37 PM
UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Wednesday, May 15, 2013 8:20 AM
Points: 1,446, Visits: 1,883
I'm wondering if your SUBSTRING function is causing performance grief. How many bytes is the Description field? Is that field contained in the index? If not, you'll end up with a table scan, so you might want to do something with that. Also, why do the SUBSTRING at the same time as the TOP n ? Why not join to the relevant TOP n records and pull the SUBSTRING only on the records that qualify on the TOP n ?

Here's the thought:

;WITH CTE AS (

SELECT TOP (@page*20) Sites.SiteUrl,Sites.Title, Sites.SiteID
FROM Sites WITH(nolock)
INNER JOIN #t WITH(nolock) ON Sites.SiteID = #t.SiteID
)
INSERT #t2
SELECT C.*, SUBSTRING(S.Description,1,500)
FROM CTE AS C
INNER JOIN Sites AS S
ON C.SiteID = S.SiteID

Not sure if that will help, but it might avoid yanking that Description field for a ton of records when you only need it for the qualifying records. If the CTE doesn't make that happen, you can temp table it instead and hopefully force it that way.



Steve
(aka sgmunson)

Weight Loss Tips
Post #1394771
Posted Monday, December 10, 2012 4:12 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, January 06, 2013 3:06 PM
Points: 7, Visits: 30
Thank you for your advice, I changed as you written but it didn't change.It took 38 seconds
Description field type is nvarchar(2000)

There are other queries take huge executon times.
I believe that I paid lots of time for query optimization for lots of stored procedures and I should change database design.
Splitting tables or partitioning or something else. I don't know.

When I run this stored procedure with "Include Actual Execution Plan", I see that 100% cost on following statement.

WITH CTE AS (

SELECT TOP (@page*20) Sites.SiteUrl,Sites.Title, Sites.SiteID
FROM Sites WITH(nolock)
INNER JOIN #t WITH(nolock) ON Sites.SiteID = #t.SiteID
)
INSERT #t2
SELECT C.SiteUrl, C.Title, SUBSTRING(S.Description,1,500)
FROM CTE AS C
INNER JOIN Sites AS S
ON C.SiteID = S.SiteID

When I run with old script; I see on execution plan that 100% time on following.
Seek Keys[1]: Prefix: [OSA].[dbo].[Sites].SiteID = Scalar Operator([#t].[SiteID])

So as I understand that because of the Sites table has lots of rows, clustered index seeking on SiteID column takes time.

I didn't test it but if I split Sites table to 10 part then maybe queries will run faster.
If I split like following Sites1,Sites2,...Sites10 then If I create a view like following

CREATE VIEW [dbo].[vwSites]
AS
select * from Sites1 with(nolock)
UNION
select * from Sites2 with(nolock)
...
select * from Sites10 with(nolock)

then if I replace all "Sites" with "vwSites" on all sp and functions do you think that everything will be faster?
Post #1394808
Posted Monday, December 10, 2012 5:49 PM
UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Wednesday, May 15, 2013 8:20 AM
Points: 1,446, Visits: 1,883
Just splitting the table doesn't necessarily make things faster. The problem might be what to use as criteria to do the split. If, after all, you just have to put them all back together again, nothing is really gained. The question is whether you need to do the SUBSTRING (can the application handle and toss away any excess characters past 500?), or more importantly, is the Description field a part of the index? Another question related to splitting, is the overall design of the webpage. Do you really need to be able to handle a request for ANY given page number? Seems to me that any more than the top 100 websites for any given category might be serious overkill for most practical purposes, but then, I know nothing about what drives revenue for your company, so I can only ask the question as opposed to provide the answer. If it turns out that some fixed number of sites is "good enough" at any given time, then maybe you can design the insert / update process around maintaining the TOP N rows in an additional table as opposed to in a view, and then you might well have a lightning fast site. Again, just tossing around ideas (IOW, grasping at what might even be a straw here or there - LOL).

Anyway, let me know what you have in the Index. There might even be a way to use a Sparse Index if we can flag any given record with an additional field to indicate it's in the TOP N, with NULL value if it's not, then create a SPARSE INDEX on that field, including the other fields. You'll have to tell me...



Steve
(aka sgmunson)

Weight Loss Tips
Post #1394833
Posted Tuesday, December 11, 2012 1:31 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Friday, May 03, 2013 5:35 AM
Points: 803, Visits: 2,122
Looking at the way you've done the paging it you should be able to simplify the whole thing, firstly the substring could be the cause, have you looked at other methods like LEFT, CONVERT to do the trimming?

I might also be temepted to do something like this,
Declare @StartRow int, @EndRow Int

Set @StartRow=@page*20
Set @EndRow=@StartRow+19

With CTE as(
SELECT
ROW_NUMBER() OVER(ORDER BY sites.siteUrl) Rn
,Sites.SiteUrl
,Sites.Title
,LEFT(Sites.Description,500)
-- ,CONVERT(Nvarchar(500),Sites.Description)
from Sites
INNER JOIN SiteTags t ON Sites.SiteID = t.SiteID
)
Select * from CTE
Where Rn Between @StartRow and @EndRow

It would be interesting to see if this works and is more performant, I have doubts about the CTE and the Order by, you might want to change that to SiteId rather than the SiteUrl it depends on the order you want to use.



_________________________________________________________________________
SSC Guide to Posting and Best Practices
Post #1394930
Posted Tuesday, December 11, 2012 8:09 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, January 06, 2013 3:06 PM
Points: 7, Visits: 30
Well SUBSTRING is not important too much so I removed it.
I think it's better to share table DDL.

CREATE TABLE [dbo].[Sites](
[SiteID] [bigint] IDENTITY(1,1) NOT NULL,
[SiteUrl] [varchar](120) NOT NULL,
[SiteFirstCheckDate] [datetime] NOT NULL,
[SiteLastCheckDate] [datetime] NOT NULL,
[SitePoint] [int] NOT NULL,
[Title] [nvarchar](500) NOT NULL,
[TitleLength] [int] NOT NULL,
[Description] [nvarchar](2000) NOT NULL,
[SiteWorth] [int] NULL,
[Encoding] [nvarchar](200) NULL,
[Author] [nvarchar](200) NULL,
CONSTRAINT [PK_Sites] PRIMARY KEY CLUSTERED
(
[SiteID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SiteFirstCheckDate] DEFAULT (getdate()) FOR [SiteFirstCheckDate]
GO
ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SiteLastCheckDate] DEFAULT (getdate()) FOR [SiteLastCheckDate]
GO
ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SitePoint] DEFAULT ((0)) FOR [SitePoint]
GO


Description column doesn't have any index. SiteUrl,SiteLastCheckDate and SitePoint have nonclustered index.
Post #1395145
Posted Tuesday, December 11, 2012 9:41 AM
UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Wednesday, May 15, 2013 8:20 AM
Points: 1,446, Visits: 1,883
Just adding the Description field to the index on SiteID would probably help quite a bit, although at up to 2K per row, the size of that index would be potentially large, so keep that in mind for disk space usage. Also, there will be a hit on INSERT activity, as then that much more data has to hit the index for inserts and updates.



Steve
(aka sgmunson)

Weight Loss Tips
Post #1395202
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse