Optimization advice for huge tables

  • Hello,

    I need tuning advice for query performance on large tables.

    I have 3 big tables, and some queries need to join all 3 of them. The queries are getting slower every day as the row counts increase.

    I believe I have already done the index tuning.

    I'd be glad for any suggestions on better performance tuning.

    It is not easy to split the tables. I tried creating additional data files, but it didn't make a difference.

    Each table is 5-6 GB in size, by the way.

    Thanks in advance.

    Performance tuning is generally done for a specific performance goal, and without some detail on what you already have in the way of tables, indexes, row sizes, row counts, and the query (or queries) that need help, general advice might not be terribly useful. There are a number of possible techniques to help out, such as partitioning, sparse indexes (typically for fields with large numbers of NULL rows, or for situations where just one value out of all the possible values for a field is of interest), index query hints, query optimization, temp tables, table variables, etc...

    You haven't provided enough information to go into any detail, however... Care to elaborate?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • When working with large tables:

    Retrieve just the data that you absolutely need.

    Understand that no one is going to read several million rows, hence do not query several million rows and return them to the client. It is a waste.

    Make sure your indexes support the queries

    Make sure the queries are written to be able to use indexes (see the sketch after this list)

    Partitioning is not primarily a performance tuning mechanism, it's mostly about maintainability.

    Avoid index hints unless you are absolutely, completely sure that you know better than the optimiser what the best plan for a query is
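
    For instance, a predicate that wraps the indexed column in a function cannot use an index seek. A minimal sketch, assuming a hypothetical Tags table with a nonclustered index on its Tag column:

    -- Non-SARGable: the function on the column forces a scan of the index on Tag
    SELECT TagID FROM dbo.Tags WHERE LEFT(Tag, 10) = N'SQL Server'

    -- SARGable: the bare column comparison lets the optimiser seek that same index
    SELECT TagID FROM dbo.Tags WHERE Tag LIKE N'SQL Server%'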

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hello Steve,

    You're right. Let me give you more detail.

    My website is http://www.websiteanalyzer.info

    It analyzes websites and gives webmasters information for better SEO.

    Websites have keyword meta tags (a one-to-many relationship), and the same keyword may appear on other sites too. For example, sqlservercentral.com has the "SQL Server" tag, and there are lots of websites with the same tag.

    So I have 3 big tables.

    1 - Sites table - 20,000,000 rows

    SiteID - Primary Key

    SiteUrl - Nonclustered Index

    2 - Tags table - 30,000,000 rows

    TagID - PK

    Tag - Nonclustered Index

    3 - SiteTags table - 120,000,000 rows

    SiteID - Nonclustered Index, FK

    TagID - Nonclustered Index, FK

    If you check the following URLs you will see the problem. On the first page there is no performance problem, but as you navigate through the pager it gets slower (you will see why if you check the stored procedure).

    http://www.websiteanalyzer.info/Tag.aspx?tag=SQL%20Server&page=1

    http://www.websiteanalyzer.info/Tag.aspx?tag=SQL%20Server&page=100

    The following is the stored procedure running on these pages.

    ALTER PROCEDURE [dbo].[spGetSiteListFromTagPager]
        @Tag nvarchar(64),
        @page int = 0
    AS
    DECLARE @TagID int

    -- Resolve the tag text to its ID
    SET @TagID = (SELECT TOP 1 TagID FROM Tags WITH (NOLOCK) WHERE Tag = @Tag)

    IF (@page = 1)
    BEGIN
        SELECT TOP 20 Sites.SiteUrl, Sites.Title, SUBSTRING(Sites.Description, 1, 500) AS 'Description'
        FROM Sites WITH (NOLOCK)
        INNER JOIN SiteTags WITH (NOLOCK) ON Sites.SiteID = SiteTags.SiteID
        WHERE SiteTags.TagID = @TagID
    END
    ELSE IF (@page > 1)
    BEGIN
        -- Collect every site carrying the tag
        CREATE TABLE #t (SiteID bigint NOT NULL)
        ALTER TABLE #t ADD PRIMARY KEY CLUSTERED (SiteID)

        INSERT #t
        SELECT DISTINCT SiteID FROM SiteTags WITH (NOLOCK) WHERE TagID = @TagID
        --CREATE INDEX #t_IDX_SiteID1 ON #t(SiteID)

        -- Number the first @page*20 matching rows, then keep only the requested page
        CREATE TABLE #t2 (UID int IDENTITY(1,1) NOT NULL, SiteUrl varchar(120), Title varchar(500), Description nvarchar(500))
        ALTER TABLE #t2 ADD PRIMARY KEY CLUSTERED (UID)

        INSERT #t2
        SELECT TOP (@page*20) Sites.SiteUrl, Sites.Title, SUBSTRING(Sites.Description, 1, 500)
        FROM Sites WITH (NOLOCK)
        INNER JOIN #t WITH (NOLOCK) ON Sites.SiteID = #t.SiteID

        SELECT --distinct
            SiteUrl, Title, Description
        FROM #t2
        WHERE UID > ((@page-1)*20) AND UID <= (@page*20)  -- page N covers UID (N-1)*20+1 through N*20
    END

    By the way, my web server configuration is an Intel Core i5-2400 3.10 GHz (6 MB cache), a 7200 RPM SATA disk (16 MB cache), and 16 GB RAM, and I'm looking for a new server with an SSD.

    If you need more information, please let me know.

    Thanks

    I'm wondering if your SUBSTRING function is causing performance grief. How many bytes is the Description field? Is that field contained in the index? If not, you'll end up with a table scan, so you might want to do something about that. Also, why do the SUBSTRING at the same time as the TOP n? Why not join to the relevant TOP n records and pull the SUBSTRING only for the records that qualify?

    Here's the thought:

    ;WITH CTE AS (
        SELECT TOP (@page*20) Sites.SiteUrl, Sites.Title, Sites.SiteID
        FROM Sites WITH (NOLOCK)
        INNER JOIN #t WITH (NOLOCK) ON Sites.SiteID = #t.SiteID
    )
    INSERT #t2
    SELECT C.SiteUrl, C.Title, SUBSTRING(S.Description, 1, 500)
    FROM CTE AS C
    INNER JOIN Sites AS S
        ON C.SiteID = S.SiteID

    Not sure if that will help, but it might avoid yanking that Description field for a ton of records when you only need it for the qualifying ones. If the CTE doesn't make that happen, you can use a temp table instead and hopefully force it that way (see the sketch below).
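
    Something along these lines, if you go the temp table route (just a sketch; #keys is a name I'm making up here, while #t and #t2 are the temp tables from your procedure):

    -- Materialize the TOP n keys first, without touching Description
    SELECT TOP (@page*20) Sites.SiteUrl, Sites.Title, Sites.SiteID
    INTO #keys
    FROM Sites WITH (NOLOCK)
    INNER JOIN #t ON Sites.SiteID = #t.SiteID

    -- Then pull Description only for the qualifying rows
    INSERT #t2
    SELECT k.SiteUrl, k.Title, SUBSTRING(s.Description, 1, 500)
    FROM #keys AS k
    INNER JOIN Sites AS s ON k.SiteID = s.SiteID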

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

    Thank you for your advice. I changed it as you wrote, but it didn't help; it still took 38 seconds.

    The Description field type is nvarchar(2000).

    There are other queries that take huge execution times.

    I believe I have already spent a lot of time on query optimization across lots of stored procedures, and that I should change the database design instead.

    Splitting tables, or partitioning, or something else. I don't know.

    When I run this stored procedure with "Include Actual Execution Plan", I see 100% of the cost on the following statement.

    WITH CTE AS (
        SELECT TOP (@page*20) Sites.SiteUrl, Sites.Title, Sites.SiteID
        FROM Sites WITH (NOLOCK)
        INNER JOIN #t WITH (NOLOCK) ON Sites.SiteID = #t.SiteID
    )
    INSERT #t2
    SELECT C.SiteUrl, C.Title, SUBSTRING(S.Description, 1, 500)
    FROM CTE AS C
    INNER JOIN Sites AS S
        ON C.SiteID = S.SiteID

    When I run the old script, I see in the execution plan that 100% of the time is spent on the following:

    Seek Keys[1]: Prefix: [OSA].[dbo].[Sites].SiteID = Scalar Operator([#t].[SiteID])

    So as I understand it, because the Sites table has lots of rows, the clustered index seek on the SiteID column takes time.

    I didn't test it, but if I split the Sites table into 10 parts, maybe queries would run faster.

    If I split it into Sites1, Sites2, ... Sites10 and then create a view like the following,

    CREATE VIEW [dbo].[vwSites]
    AS
    SELECT * FROM Sites1 WITH (NOLOCK)
    UNION   -- note: UNION without ALL adds a de-duplication step across all 10 tables
    SELECT * FROM Sites2 WITH (NOLOCK)
    ...
    SELECT * FROM Sites10 WITH (NOLOCK)

    and then replace "Sites" with "vwSites" in all the stored procedures and functions, do you think everything will be faster?

    Just splitting the table doesn't necessarily make things faster. The problem might be what to use as the criteria for the split. If, after all, you just have to put them all back together again, nothing is really gained.

    The question is whether you need to do the SUBSTRING (can the application handle and toss away any excess characters past 500?), or more importantly, is the Description field a part of the index?

    Another question, related to splitting, is the overall design of the webpage. Do you really need to be able to handle a request for ANY given page number? It seems to me that anything beyond the top 100 websites for any given category might be serious overkill for most practical purposes, but then, I know nothing about what drives revenue for your company, so I can only ask the question as opposed to provide the answer. If it turns out that some fixed number of sites is "good enough" at any given time, then maybe you can design the insert / update process around maintaining the TOP N rows in an additional table as opposed to in a view, and then you might well have a lightning fast site.

    Again, just tossing around ideas (IOW, grasping at what might even be a straw here or there - LOL).

    Anyway, let me know what you have in the index. There might even be a way to use a sparse index if we can flag any given record with an additional field to indicate it's in the TOP N, with a NULL value if it's not, and then create a SPARSE INDEX on that field, including the other fields. You'll have to tell me...
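
    To make that concrete (SQL Server's name for this feature is a filtered index), here is a minimal sketch; the TopNFlag column is hypothetical, and your insert / update process would have to maintain it:

    -- Hypothetical flag: non-NULL only while the site is in the TOP N
    ALTER TABLE dbo.Sites ADD TopNFlag tinyint NULL
    GO

    -- Filtered ("sparse") index covering just the flagged rows
    CREATE NONCLUSTERED INDEX IX_Sites_TopNFlag
        ON dbo.Sites (TopNFlag)
        INCLUDE (SiteUrl, Title)
        WHERE TopNFlag IS NOT NULL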

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

    Looking at the way you've done the paging, you should be able to simplify the whole thing. Firstly, the SUBSTRING could be the cause; have you looked at other methods like LEFT or CONVERT to do the trimming?

    I might also be tempted to do something like this:

    DECLARE @StartRow int, @EndRow int

    SET @StartRow = ((@page-1)*20)+1   -- page 1 starts at row 1
    SET @EndRow = @StartRow + 19

    ;WITH CTE AS (
        SELECT
            ROW_NUMBER() OVER (ORDER BY Sites.SiteUrl) AS Rn
            ,Sites.SiteUrl
            ,Sites.Title
            ,LEFT(Sites.Description, 500) AS Description
            --,CONVERT(nvarchar(500), Sites.Description) AS Description
        FROM Sites
        INNER JOIN SiteTags t ON Sites.SiteID = t.SiteID
        WHERE t.TagID = @TagID   -- @TagID resolved from @Tag as in your procedure
    )
    SELECT * FROM CTE
    WHERE Rn BETWEEN @StartRow AND @EndRow

    It would be interesting to see if this works and is more performant. I have doubts about the CTE and the ORDER BY; you might want to change that to SiteID rather than SiteUrl, depending on the order you want to use.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

    Well, the SUBSTRING is not too important, so I removed it.

    I think it's better to share the table DDL.

    CREATE TABLE [dbo].[Sites](
        [SiteID] [bigint] IDENTITY(1,1) NOT NULL,
        [SiteUrl] [varchar](120) NOT NULL,
        [SiteFirstCheckDate] [datetime] NOT NULL,
        [SiteLastCheckDate] [datetime] NOT NULL,
        [SitePoint] [int] NOT NULL,
        [Title] [nvarchar](500) NOT NULL,
        [TitleLength] [int] NOT NULL,
        [Description] [nvarchar](2000) NOT NULL,
        [SiteWorth] [int] NULL,
        [Encoding] [nvarchar](200) NULL,
        [Author] [nvarchar](200) NULL,
        CONSTRAINT [PK_Sites] PRIMARY KEY CLUSTERED
        (
            [SiteID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    SET ANSI_PADDING OFF
    GO

    ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SiteFirstCheckDate] DEFAULT (getdate()) FOR [SiteFirstCheckDate]
    GO
    ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SiteLastCheckDate] DEFAULT (getdate()) FOR [SiteLastCheckDate]
    GO
    ALTER TABLE [dbo].[Sites] ADD CONSTRAINT [DF_Sites_SitePoint] DEFAULT ((0)) FOR [SitePoint]
    GO

    The Description column doesn't have any index. SiteUrl, SiteLastCheckDate and SitePoint have nonclustered indexes.

  • Just adding the Description field to the index on SiteID would probably help quite a bit, although at up to 2K per row, the size of that index would be potentially large, so keep that in mind for disk space usage. Also, there will be a hit on INSERT activity, as then that much more data has to hit the index for inserts and updates.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson (12/11/2012)


    Just adding the Description field to the index on SiteID would probably help quite a bit, although at up to 2K per row, the size of that index would be potentially large, so keep that in mind for disk space usage. Also, there will be a hit on INSERT activity, as then that much more data has to hit the index for inserts and updates.

    Surely you shouldn't need to add the Description, as SiteID is the clustered index anyway, so at the leaf level it's already included; and as Description isn't part of any filter here, surely it doesn't make any difference.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

    I didn't understand why you asked to add an index on the Description field; it is not used in the WHERE clause.

    I modified it as follows, and I think it has changed a little bit. I removed the temp tables and used built-in paging.

    ALTER PROCEDURE [dbo].[spGetSiteListFromTagPager]
        @Tag nvarchar(64),
        @page int = 0
    AS
    DECLARE @TagID int

    SET @TagID = (SELECT TOP 1 TagID FROM Tags WITH (NOLOCK) WHERE Tag = @Tag)

    IF (@page = 1)
    BEGIN
        SELECT TOP 20 Sites.SiteUrl, Sites.Title, Sites.Description
        FROM Sites WITH (NOLOCK)
        INNER JOIN SiteTags WITH (NOLOCK) ON Sites.SiteID = SiteTags.SiteID
        WHERE SiteTags.TagID = @TagID
    END
    ELSE IF (@page > 1)
    BEGIN
        SELECT Sites.SiteUrl, Sites.Title, Sites.Description
        FROM Sites WITH (NOLOCK)
        INNER JOIN (SELECT DISTINCT SiteID FROM SiteTags WITH (NOLOCK) WHERE TagID = @TagID) AS t
            ON Sites.SiteID = t.SiteID
        ORDER BY t.SiteID
        OFFSET (@page-1)*20 ROWS
        FETCH NEXT 20 ROWS ONLY
    END

  • What's the distinct for? If you have messy data, clean it up. Distinct is not a cheap operation.
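
    If SiteTags can never hold duplicate (SiteID, TagID) rows, the distinct is pure overhead, and even if it can, an EXISTS returns each site once without a separate de-duplication step. A sketch of the paged query written that way (@TagID and @page as in your procedure):

    SELECT s.SiteUrl, s.Title, s.Description
    FROM dbo.Sites AS s
    WHERE EXISTS (SELECT 1 FROM dbo.SiteTags AS st
                  WHERE st.SiteID = s.SiteID AND st.TagID = @TagID)
    ORDER BY s.SiteID
    OFFSET (@page-1)*20 ROWS
    FETCH NEXT 20 ROWS ONLY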

    Watch the nolocks.

    See - http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx
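
    If blocking is what pushed you to NOLOCK, the usual alternative is to enable read committed snapshot once at the database level and drop the hints; readers then see a consistent snapshot instead of possibly reading rows twice or not at all. A sketch, assuming the database is named OSA (going by the execution plan output you posted):

    -- Kicks other sessions out while switching; run in a maintenance window
    ALTER DATABASE OSA SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE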

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jason-299789 (12/11/2012)


    sgmunson (12/11/2012)


    Just adding the Description field to the index on SiteID would probably help quite a bit, although at up to 2K per row, the size of that index would be potentially large, so keep that in mind for disk space usage. Also, there will be a hit on INSERT activity, as then that much more data has to hit the index for inserts and updates.

    Surely you shouldn't need to add the Description, as SiteID is the clustered index anyway, so at the leaf level it's already included; and as Description isn't part of any filter here, surely it doesn't make any difference.

    Missed that... oops...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)
