Large OLTP table, Index common field, Slow INSERT performance

  • If dropping this index is out of the question, you may have to rebuild/reorganize your indexes more frequently.

    I recommend the script from Lara Rubbelke http://blogs.digineer.com/blogs/larar/archive/2006/08/16/smart-index-defrag-reindex-for-a-consolidated-sql-server-2005-environment.aspx
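
    In case that link ever goes stale: the heart of any such "smart" defrag script is to check fragmentation first and only touch the indexes that need it. A minimal sketch of the idea (the 10/30 thresholds are illustrative only):

    -- find fragmented indexes in the current database
    SELECT object_id, index_id, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED')
    WHERE avg_fragmentation_in_percent > 10;
    -- then REORGANIZE those between 10% and 30%, and REBUILD those above 30%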

    Wilfred
    The best things in life are the simple things

  • Just a quick thought. Are you setting a fill factor on your indexes or leaving them at 100%?
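
    For reference, you can see the current setting per index like this (a fill_factor of 0 means the default, i.e. 100%; the table name is taken from the OP's query):

    SELECT name, fill_factor
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('dbo.Transactions');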

  • Also consider: 1) changing the recovery model

    2) using TABLOCK

    http://www.ITjobfeed.com

  • Are you aware that "TIME" is a reserved word and that there is a "TIME" data type in SQL Server 2008? Please consider renaming the column. The same caution applies to "DATE".
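
    If a rename is feasible, sp_rename does it; a sketch (the table name is assumed from the OP's query, and any code that references the old column name must be updated as well):

    EXEC sp_rename 'dbo.Transactions.Time', 'TransactionTime', 'COLUMN';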

    Regarding your throughput problem, have you considered using partitioning?

    For the partitioning design you need to decide whether to optimize for inserts or for selects. For the common search cases, do you really mean the last 100, or is the real requirement "all transactions that have occurred within a historical period"? E.g., is this a more appropriate query:

    SELECT *
    from Transactions
    where CustId = '010101037'
    AND TIME >= (some value)
    order by Time desc

    If a historical period, then you need to decide the granularity of time and any "rounding". Granularity could be any of hours, days, months, years, or centuries. Rounding could be exact (last 60 days) or have a cut-off (since the start of the previous month).
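
    For example, "since the start of the previous month" can be computed like this (a sketch, assuming a datetime representation rather than the OP's bigint encoding):

    SELECT DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 1, 0); -- first day of the previous month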

    If within a historical period, then partitioning by TIME ranges would seem to make the most sense; apply the same partitioning scheme to the secondary indexes.
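
    A minimal sketch of what that could look like (the boundary values are hypothetical, and the bigint Time column is assumed from the OP's description):

    CREATE PARTITION FUNCTION pf_TransTime (bigint)
    AS RANGE RIGHT FOR VALUES (20080101000000, 20080401000000, 20080701000000);

    CREATE PARTITION SCHEME ps_TransTime
    AS PARTITION pf_TransTime ALL TO ([PRIMARY]);

    -- then build the clustered index (and the aligned secondary indexes) on the scheme:
    -- CREATE CLUSTERED INDEX CIX_Transactions_Time ON dbo.Transactions ([Time]) ON ps_TransTime ([Time]);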

    SQL = Scarcely Qualifies as a Language

  • My guess is that you need a lower fill factor on the NC indexes, coupled with routine maintenance to get back to that fill factor. This will minimize the page splits that occur during inserts, which are the likely culprit of the slowness. Also use the PAD_INDEX and SORT_IN_TEMPDB settings. If you have Enterprise Edition, you can use the online index rebuild feature if you require 24/7 insert ability.

    You can monitor the Access Methods: Page Splits/sec perfmon counter to see if page splits are occurring.
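
    Putting those settings together, something along these lines (a sketch; the index and table names are hypothetical, and ONLINE = ON requires Enterprise Edition):

    ALTER INDEX IX_Transactions_CustId ON dbo.Transactions
    REBUILD WITH (FILLFACTOR = 90, PAD_INDEX = ON, SORT_IN_TEMPDB = ON, ONLINE = ON);

    -- the same counter perfmon exposes, queried from T-SQL (the value is cumulative since startup):
    SELECT cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Access Methods%'
      AND counter_name = 'Page Splits/sec';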

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • You must definitely "limit" the Time column with a "range"; once you do that you should be fine.


    * Noel

  • noeld (6/20/2008)


    You must definitely "limit" the Time column with a "range"; once you do that you should be fine.

    This is being done: the combination of TOP 100 and ORDER BY. Since the clustering key (Time) is carried in the NC index he will place on the filter column, it should be just spiffy quick to access the data.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • jvamvas (6/20/2008)


    Also consider: 1) changing the recovery model

    2) using TABLOCK

    What????

    Do the two queries in question need to be SELECT *?

    Have you checked the execution plans of the select queries to ensure that the indexes are being used?
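
    A quick way to check (a sketch): turn on the IO and time statistics and request the actual execution plan while running the problem queries.

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    -- run the SELECTs here and inspect the actual plan in SSMS (Ctrl+M toggles "Include Actual Execution Plan")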

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Indexing is a good strategy; try to use covering indexes to improve query performance.

    Also look into table partitioning and multiple filegroups.
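
    For example, something like this (a sketch: the key columns come from the query earlier in the thread, the INCLUDE list is hypothetical, and covering only helps if the queries select a narrow column list rather than *):

    CREATE NONCLUSTERED INDEX IX_Transactions_CustId_Time
    ON dbo.Transactions (CustId, [Time] DESC)
    INCLUDE (Amount); -- include whatever columns the query actually selects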

  • TheSQLGuru (6/20/2008)


    noeld (6/20/2008)


    You must definitely "limit" the Time column with a "range"; once you do that you should be fine.

    This is being done: the combination of TOP 100 and ORDER BY. Since the clustering key (Time) is carried in the NC index he will place on the filter column, it should be just spiffy quick to access the data.

    I have to disagree. TOP 100 and ORDER BY on a very large table could cause a SCAN! A WHERE clause with upper and lower bounds will turn it into a SEEK. Now that will really do the job.


    * Noel

  • noeld (6/23/2008)


    I have to disagree. TOP 100 and ORDER BY on a very large table could cause a SCAN! A WHERE clause with upper and lower bounds will turn it into a SEEK. Now that will really do the job.

    You can disagree all you want, but there are two problems:

    1) Your requirement is not in line with the OP's need. How can you be sure of getting the necessary 100 rows for a specific WHERE clause value if you limit the time range?

    2) A scan of the clustered index in this case can be a good thing, because the OP was using SELECT TOP 100 *, which would require a bookmark lookup to get the data even if an NC index seek on a WHERE clause column were to occur. Consider AdventureWorks.Production.TransactionHistory: it has a clustered PK on TransactionID (an int identity, likely a fair representation of using bigint for a time series) and an NC index on ProductID, so it is a good approximation of this issue.

    select top 100 *
    from Production.TransactionHistory
    where ProductID = 712 --2348 out of 113443 values
    order by TransactionID desc

    0.0357 cost, CI scan
    logical reads 37

    select top 10 * --using a smaller value to better simulate the ratios given by the OP
    from Production.TransactionHistory
    where ProductID = 712 --2348 out of 113443 values
    order by TransactionID desc

    0.0065 cost, CI scan
    logical reads 13

    --forcing the index seek
    select top 10 *
    from Production.TransactionHistory with (index = IX_TransactionHistory_ProductID)
    where ProductID = 712 --2348 out of 113443 values
    order by TransactionID desc

    0.036 cost (gets worse if you use top 100)
    logical reads 37

    Thus for larger-percentage values such as 712 (2348/113443), the optimizer predicts (correctly in this case) that a CI scan results in a lower cost, due to the lower number of pages necessary to hit 100 ProductID = 712 rows in descending TransactionID order. Certainly a situation could exist where ProductID values were not distributed (semi-)uniformly, and this plan could then be less efficient than an index seek on ProductID.

    --now for a more selective value
    select top 100 *
    from Production.TransactionHistory
    where ProductID = 521 --only 200 out of 113443 values
    order by TransactionID desc

    0.312 cost, ProductID index seek, bookmark lookup
    logical reads 317

    --due to the higher specificity, the optimizer realizes it would take too many CI page reads to hit the required 100 rows with ProductID = 521
    --force a CI scan to prove this
    select top 100 *
    from Production.TransactionHistory WITH (index = 0)
    where ProductID = 521 --only 200 out of 113443 values
    order by TransactionID desc

    0.745 cost
    logical reads 792 (entire table)

    Perhaps someone else could do some testing with larger numbers of rows (IIRC the OP mentioned 20M+, with distinct value counts given for the two filter columns). See the sketch below.
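
    An untested sketch of how such a repro could be set up (names, row count, and value distribution are illustrative only):

    -- build a ~20M-row time series with a skewed filter column
    SELECT TOP (20000000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS bigint) AS [Time],
           ABS(CHECKSUM(NEWID())) % 50000 AS CustId
    INTO dbo.TransactionsTest
    FROM sys.all_columns a CROSS JOIN sys.all_columns b CROSS JOIN sys.all_columns c;

    CREATE UNIQUE CLUSTERED INDEX CIX_TransactionsTest_Time ON dbo.TransactionsTest ([Time]);
    CREATE NONCLUSTERED INDEX IX_TransactionsTest_CustId ON dbo.TransactionsTest (CustId);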

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/23/2008)


    You can disagree all you want, but there are two problems: [...]

    All I am saying is that SELECT TOP 100 * ... ORDER BY without a WHERE clause is not a good thing, especially when you cluster by TIME. Limiting the range (IF possible) would provide fast response. I do understand, though, that it may not be possible.


    * Noel

  • All I am saying is that SELECT TOP 100 * ... ORDER BY without a WHERE clause is not a good thing, especially when you cluster by TIME. Limiting the range (IF possible) would provide fast response. I do understand, though, that it may not be possible.

    I disagree with this statement as well.

    1) The OP DID have a WHERE clause on both of his sample queries.

    2) TOP 100 * ... ORDER BY is a VERY efficient plan as long as the column ORDERed by is indexed (either clustered or nonclustered will suffice). That was the case for the OP as well, but I believe this is a generically valid statement.

    3) Limiting the range of a CI value (compared to having no WHERE clause at all) does provide some savings on estimated query cost, but none on IO or actual server effort.

    select top 100 *
    from Production.TransactionHistory
    order by TransactionID desc

    0.0039 cost, CI scan
    logical reads 3

    select top 100 *
    from Production.TransactionHistory
    --filter to produce the exact same result as the no-where-clause query above
    where TransactionID between 213343 and 213442
    order by TransactionID desc

    0.0034 cost
    logical reads 3

    The unfiltered query's cost shows as 14.7% higher (a clear win on paper), but the page reads and the amount of data put through the CPUs are identical. The only difference in the plans is seek versus scan. The cost difference is probably due to the built-in mathematical values assigned to the seek/scan operators. Both queries also show a 1ms execution time.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The unfiltered query's cost shows as 14.7% higher (a clear win on paper), but the page reads and the amount of data put through the CPUs are identical. The only difference in the plans is seek versus scan. The cost difference is probably due to the built-in mathematical values assigned to the seek/scan operators. Both queries also show a 1ms execution time.

    This is not a fair comparison. TransactionID is "very" selective. Try using a time-based column and the scans for the "TOP" will be a lot less efficient.


    * Noel

  • noeld (6/23/2008)


    The unfiltered query's cost shows as 14.7% higher (a clear win on paper), but the page reads and the amount of data put through the CPUs are identical. The only difference in the plans is seek versus scan. The cost difference is probably due to the built-in mathematical values assigned to the seek/scan operators. Both queries also show a 1ms execution time.

    This is not a fair comparison. TransactionID is "very" selective. Try using a time-based column and the scans for the "TOP" will be a lot less efficient.

    Sorry Noel, again an incorrect statement. It would seem that you may be a bit unclear on how the engine returns data when TOP/ORDER BY and a clustered index are in play for the ORDER BY: it will either seek or scan to the appropriate starting page and then read data in page order until sufficient rows have been read. The starting value (if filtered), or the fact that no filter is present at all, is essentially irrelevant. Even disregarding the OP's statement that "BIGINT is a high resolution timestamp" (which IMHO makes an identity on 113K rows a reasonable approximation of 20M+ rows with that design), a non-specific value for a clustered index has essentially NO bearing on the cost and IO when TOP and ORDER BY are used.

    --Here the OrderDateKey represents a time series field that isn't very specific
    --let's make a bigger table this time too
    select *
    into #frs
    from (
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
    ) as t
    go
    --(425985 row(s) affected)

    backup log tempdb with truncate_only
    go

    create clustered index idx1 on #frs (OrderDateKey)
    go

    backup log tempdb with truncate_only
    go

    select OrderDateKey, count(*)
    from #frs
    group by OrderDateKey

    793 has 26320 rows out of 425985 --> very non-specific
    1067 has 21014 rows and is the highest key

    select top 100 *
    from #frs
    order by OrderDateKey desc

    0.0038661 cost, CI scan
    logical reads 4

    select top 100 *
    from #frs
    WHERE OrderDateKey = 1067 --highest key, so it matches the first rows returned anyway
    order by OrderDateKey desc

    0.0038628 cost, CI seek
    logical reads 3

    select top 100 *
    from #frs
    WHERE OrderDateKey = 793 --non-specific value
    order by OrderDateKey desc

    0.0038635 cost, CI seek
    logical reads 3

    select top 100 *
    from #frs
    WHERE OrderDateKey between 700 and 800
    order by OrderDateKey desc

    0.0038662 cost, CI seek
    logical reads 3

    NOTE: this is actually 0.0000001 MORE than the unfiltered query above, despite one fewer logical read

    select top 100 *
    from #frs
    WHERE OrderDateKey between 1030 and 1035
    order by OrderDateKey desc

    14 rows only
    0.0032988 cost
    logical reads 3

    even a very specific filter that results in < 100 total rows doesn't save anything on reads, and only a tad on the cost

    drop table #frs
    go

    backup log tempdb with truncate_only
    go

    1ms execution time for all. The seek does save 25% IO, but it is ONE 8K page; I suppose you could call that "a lot less efficient", but I do not in the big picture of server activity. Also, the less specific your time values are, the less effective the range filter you proposed becomes: 1030 to 1035 matches a total of 14 rows, but 793 to 793 matches 26320 rows.

    You could actually create a table with EVERY value the same for the clustered index and it would be the same result.

    select *
    into #frs
    from (
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
    ) as t
    go
    --(425985 row(s) affected)

    backup log tempdb with truncate_only
    go

    create clustered index idx1 on #frs (OrderDateKey)
    go

    backup log tempdb with truncate_only
    go

    select top 100 *
    from #frs
    order by OrderDateKey desc

    0.0038661 cost, CI scan
    logical reads 4

    select top 100 *
    from #frs
    WHERE OrderDateKey = 1
    order by OrderDateKey desc

    0.0038661 cost, CI seek
    logical reads 3

    drop table #frs
    go

    backup log tempdb with truncate_only
    go

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
