Large OLTP table, Index common field, Slow INSERT performance

  • If dropping this index is out of the question, you may have to rebuild/reorganize your indexes more frequently.

    I recommend the script from Lara Rubbelke http://blogs.digineer.com/blogs/larar/archive/2006/08/16/smart-index-defrag-reindex-for-a-consolidated-sql-server-2005-environment.aspx
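
    In case that link ever goes stale: the heart of any such "smart" defrag script is to check fragmentation first and only touch the indexes that need it. A minimal sketch of the idea (the 10/30 thresholds are illustrative only):

    -- find fragmented indexes in the current database
    SELECT object_id, index_id, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED')
    WHERE avg_fragmentation_in_percent > 10;
    -- then REORGANIZE those between 10% and 30%, and REBUILD those above 30%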

    Wilfred
    The best things in life are the simple things

  • Just a quick thought. Are you setting a fill factor on your indexes or leaving them at 100%?
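
    For reference, you can see the current setting per index like this (a fill_factor of 0 means the default, i.e. 100%; the table name is taken from the OP's query):

    SELECT name, fill_factor
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('dbo.Transactions');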

  • Also consider: 1) changing the recovery model

    2) using TABLOCK

    http://www.ITjobfeed.com

  • Are you aware that "TIME" is a reserved word and that there is a "TIME" data type in SQL Server 2008? Please consider renaming the column. The same caution applies to "DATE".
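
    If a rename is feasible, sp_rename does it; a sketch (the table name is assumed from the OP's query, and any code that references the old column name must be updated as well):

    EXEC sp_rename 'dbo.Transactions.Time', 'TransactionTime', 'COLUMN';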

    Regarding your throughput problem, have you considered using partitioning?

    For the partitioning design you need to decide whether to optimize for inserts or for selects. For the common search cases, do you really mean the last 100, or is the real requirement "all transactions that have occurred within a historical period"? E.g., is this a more appropriate query:

    SELECT *
    from Transactions
    where CustId = '010101037'
    AND TIME >= (some value)
    order by Time desc

    If a historical period, then you need to decide the granularity of time and any "rounding". Granularity could be any of hours, days, months, years, or centuries. Rounding could be exact (last 60 days) or have a cut-off (since the start of the previous month).
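
    For example, "since the start of the previous month" can be computed like this (a sketch, assuming a datetime representation rather than the OP's bigint encoding):

    SELECT DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 1, 0); -- first day of the previous month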

    If within a historical period, then partitioning by TIME ranges would seem to make the most sense; apply the same partitioning scheme to the secondary indexes.
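
    A minimal sketch of what that could look like (the boundary values are hypothetical, and the bigint Time column is assumed from the OP's description):

    CREATE PARTITION FUNCTION pf_TransTime (bigint)
    AS RANGE RIGHT FOR VALUES (20080101000000, 20080401000000, 20080701000000);

    CREATE PARTITION SCHEME ps_TransTime
    AS PARTITION pf_TransTime ALL TO ([PRIMARY]);

    -- then build the clustered index (and the aligned secondary indexes) on the scheme:
    -- CREATE CLUSTERED INDEX CIX_Transactions_Time ON dbo.Transactions ([Time]) ON ps_TransTime ([Time]);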

    SQL = Scarcely Qualifies as a Language

  • My guess is that you need a lower fill factor on the NC indexes, coupled with routine maintenance to get back to that fill factor. This will minimize the page splits that occur during inserts, which are the likely culprit of the slowness. Also use the PAD_INDEX and SORT_IN_TEMPDB settings. If you have Enterprise Edition, you can use the online index rebuild feature if you require 24/7 insert ability.

    You can monitor the Access Methods: Page Splits/sec perfmon counter to see if page splits are occurring.
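
    Putting those settings together, something along these lines (a sketch; the index and table names are hypothetical, and ONLINE = ON requires Enterprise Edition):

    ALTER INDEX IX_Transactions_CustId ON dbo.Transactions
    REBUILD WITH (FILLFACTOR = 90, PAD_INDEX = ON, SORT_IN_TEMPDB = ON, ONLINE = ON);

    -- the same counter perfmon exposes, queried from T-SQL (the value is cumulative since startup):
    SELECT cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Access Methods%'
      AND counter_name = 'Page Splits/sec';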

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • You must definitely "limit" the Time column with a "range"; once you do that you should be fine.


    * Noel

  • noeld (6/20/2008)


    You must definitely "limit" the Time column with a "range"; once you do that you should be fine.

    This is being done: the combination of TOP 100 and ORDER BY. Since the clustering key (Time) is carried in the NC index he will place on the filter column, it should be just spiffy quick to access the data.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • jvamvas (6/20/2008)


    Also consider: 1) changing the recovery model

    2) using TABLOCK

    What????

    Do the two queries in question need to be SELECT *?

    Have you checked the execution plans of the select queries to ensure that the indexes are being used?
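
    A quick way to check (a sketch): turn on the IO and time statistics and request the actual execution plan while running the problem queries.

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    -- run the SELECTs here and inspect the actual plan in SSMS (Ctrl+M toggles "Include Actual Execution Plan")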

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Indexing is a good strategy; try to use covering indexes to improve query performance.

    Also look into table partitioning and multiple filegroups.
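
    For example, something like this (a sketch: the key columns come from the query earlier in the thread, the INCLUDE list is hypothetical, and covering only helps if the queries select a narrow column list rather than *):

    CREATE NONCLUSTERED INDEX IX_Transactions_CustId_Time
    ON dbo.Transactions (CustId, [Time] DESC)
    INCLUDE (Amount); -- include whatever columns the query actually selects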

  • TheSQLGuru (6/20/2008)


    noeld (6/20/2008)


    You must definitely "limit" the Time column with a "range"; once you do that you should be fine.

    This is being done: the combination of TOP 100 and ORDER BY. Since the clustering key (Time) is carried in the NC index he will place on the filter column, it should be just spiffy quick to access the data.

    I have to disagree. TOP 100 and ORDER BY on a very large table could cause a SCAN! A WHERE clause with upper and lower bounds will turn it into a SEEK. Now that will really do the job.


    * Noel

  • noeld (6/23/2008)


    I have to disagree. TOP 100 and ORDER BY on a very large table could cause a SCAN! A WHERE clause with upper and lower bounds will turn it into a SEEK. Now that will really do the job.

    You can disagree all you want, but there are two problems:

    1) Your requirement is not in line with the OP's need. How can you be sure of getting the necessary 100 rows for a specific WHERE clause value if you limit the time range?

    2) A scan of the clustered index in this case can be a good thing, because the OP was using SELECT TOP 100 *, which would require a bookmark lookup to get the data even if an NC index seek on a WHERE clause column were to occur. Consider AdventureWorks.Production.TransactionHistory: it has a clustered PK on TransactionID (an int identity, likely a fair representation of using bigint for a time series) and an NC index on ProductID, so it is a good approximation of this issue.

    select top 100 *
    from Production.TransactionHistory
    where ProductID = 712 --2348 out of 113443 values
    order by TransactionID desc

    0.0357 cost, CI scan
    logical reads 37

    select top 10 * --using a smaller value to better simulate the ratios given by the OP
    from Production.TransactionHistory
    where ProductID = 712 --2348 out of 113443 values
    order by TransactionID desc

    0.0065 cost, CI scan
    logical reads 13

    --forcing the index seek
    select top 10 *
    from Production.TransactionHistory with (index = IX_TransactionHistory_ProductID)
    where ProductID = 712 --2348 out of 113443 values
    order by TransactionID desc

    0.036 cost (gets worse if you use top 100)
    logical reads 37

    Thus for larger-percentage values such as 712 (2348/113443), the optimizer predicts (correctly in this case) that a CI scan results in a lower cost, due to the lower number of pages necessary to hit 100 ProductID = 712 rows in descending TransactionID order. Certainly a situation could exist where ProductID values were not distributed (semi-)uniformly, and this plan could then be less efficient than an index seek on ProductID.

    --now for a more selective value
    select top 100 *
    from Production.TransactionHistory
    where ProductID = 521 --only 200 out of 113443 values
    order by TransactionID desc

    0.312 cost, ProductID index seek, bookmark lookup
    logical reads 317

    --due to the higher specificity, the optimizer realizes it would take too many CI page reads to hit the required 100 rows with ProductID = 521
    --force a CI scan to prove this
    select top 100 *
    from Production.TransactionHistory WITH (index = 0)
    where ProductID = 521 --only 200 out of 113443 values
    order by TransactionID desc

    0.745 cost
    logical reads 792 (entire table)

    Perhaps someone else could do some testing with larger numbers of rows (IIRC the OP mentioned 20M+, with distinct value counts given for the two filter columns). See the sketch below.
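
    An untested sketch of how such a repro could be set up (names, row count, and value distribution are illustrative only):

    -- build a ~20M-row time series with a skewed filter column
    SELECT TOP (20000000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS bigint) AS [Time],
           ABS(CHECKSUM(NEWID())) % 50000 AS CustId
    INTO dbo.TransactionsTest
    FROM sys.all_columns a CROSS JOIN sys.all_columns b CROSS JOIN sys.all_columns c;

    CREATE UNIQUE CLUSTERED INDEX CIX_TransactionsTest_Time ON dbo.TransactionsTest ([Time]);
    CREATE NONCLUSTERED INDEX IX_TransactionsTest_CustId ON dbo.TransactionsTest (CustId);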

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (6/23/2008)


    You can disagree all you want, but there are two problems: [...]

    All I am saying is that SELECT TOP 100 * ... ORDER BY without a WHERE clause is not a good thing, especially when you cluster by TIME. Limiting the range (IF possible) would provide fast response. I do understand, though, that it may not be possible.


    * Noel

  • All I am saying is that SELECT TOP 100 * ... ORDER BY without a WHERE clause is not a good thing, especially when you cluster by TIME. Limiting the range (IF possible) would provide fast response. I do understand, though, that it may not be possible.

    I disagree with this statement as well.

    1) The OP DID have a WHERE clause on both of his sample queries.

    2) TOP 100 * ... ORDER BY is a VERY efficient plan as long as the column ORDERed by is indexed (either clustered or nonclustered will suffice). That was the case for the OP as well, but I believe this is a generically valid statement.

    3) Limiting the range of a CI value (compared to having no WHERE clause at all) does provide some savings on estimated query cost, but none on IO or actual server effort.

    select top 100 *
    from Production.TransactionHistory
    order by TransactionID desc

    0.0039 cost, CI scan
    logical reads 3

    select top 100 *
    from Production.TransactionHistory
    --filter to produce the exact same result as the no-where-clause query above
    where TransactionID between 213343 and 213442
    order by TransactionID desc

    0.0034 cost
    logical reads 3

    The unfiltered query's cost shows as 14.7% higher (a clear win on paper), but the page reads and the amount of data put through the CPUs are identical. The only difference in the plans is seek versus scan. The cost difference is probably due to the built-in mathematical values assigned to the seek/scan operators. Both queries also show a 1ms execution time.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The unfiltered query's cost shows as 14.7% higher (a clear win on paper), but the page reads and the amount of data put through the CPUs are identical. The only difference in the plans is seek versus scan. The cost difference is probably due to the built-in mathematical values assigned to the seek/scan operators. Both queries also show a 1ms execution time.

    This is not a fair comparison. TransactionID is "very" selective. Try using a time-based column and the scans for the "TOP" will be a lot less efficient.


    * Noel

  • noeld (6/23/2008)


    The unfiltered query's cost shows as 14.7% higher (a clear win on paper), but the page reads and the amount of data put through the CPUs are identical. The only difference in the plans is seek versus scan. The cost difference is probably due to the built-in mathematical values assigned to the seek/scan operators. Both queries also show a 1ms execution time.

    This is not a fair comparison. TransactionID is "very" selective. Try using a time-based column and the scans for the "TOP" will be a lot less efficient.

    Sorry Noel, again an incorrect statement. It would seem that you may be a bit unclear on how the engine returns data when TOP/ORDER BY and a clustered index are in play for the ORDER BY: it will either seek or scan to the appropriate starting page and then read data in page order until sufficient rows have been read. The starting value (if filtered), or the fact that no filter is present at all, is essentially irrelevant. Even disregarding the OP's statement that "BIGINT is a high resolution timestamp" (which IMHO makes an identity on 113K rows a reasonable approximation of 20M+ rows with that design), a non-specific value for a clustered index has essentially NO bearing on the cost and IO when TOP and ORDER BY are used.

    --Here the OrderDateKey represents a time series field that isn't very specific
    --let's make a bigger table this time too
    select *
    into #frs
    from (
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
    ) as t
    go
    --(425985 row(s) affected)

    backup log tempdb with truncate_only
    go

    create clustered index idx1 on #frs (OrderDateKey)
    go

    backup log tempdb with truncate_only
    go

    select OrderDateKey, count(*)
    from #frs
    group by OrderDateKey

    793 has 26320 rows out of 425985 --> very non-specific
    1067 has 21014 rows and is the highest key

    select top 100 *
    from #frs
    order by OrderDateKey desc

    0.0038661 cost, CI scan
    logical reads 4

    select top 100 *
    from #frs
    WHERE OrderDateKey = 1067 --highest key, so it matches the first rows returned anyway
    order by OrderDateKey desc

    0.0038628 cost, CI seek
    logical reads 3

    select top 100 *
    from #frs
    WHERE OrderDateKey = 793 --non-specific value
    order by OrderDateKey desc

    0.0038635 cost, CI seek
    logical reads 3

    select top 100 *
    from #frs
    WHERE OrderDateKey between 700 and 800
    order by OrderDateKey desc

    0.0038662 cost, CI seek
    logical reads 3

    NOTE: this is actually 0.0000001 MORE than the unfiltered query above, despite one fewer logical read

    select top 100 *
    from #frs
    WHERE OrderDateKey between 1030 and 1035
    order by OrderDateKey desc

    14 rows only
    0.0032988 cost
    logical reads 3

    even a very specific filter that results in < 100 total rows doesn't save anything on reads, and only a tad on the cost

    drop table #frs
    go

    backup log tempdb with truncate_only
    go

    1ms execution time for all. The seek does save 25% IO, but it is ONE 8K page; I suppose you could call that "a lot less efficient", but I do not in the big picture of server activity. Also, the less specific your time values are, the less effective the range filter you proposed becomes: 1030 to 1035 matches a total of 14 rows, but 793 to 793 matches 26320 rows.

    You could actually create a table with EVERY value the same for the clustered index and it would be the same result.

    select *
    into #frs
    from (
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
        union all
        select SalesOrderNumber, OrderQuantity, SalesAmount, ProductKey, 1 as OrderDateKey
        from AdventureWorksDW.dbo.FactResellerSales
    ) as t
    go
    --(425985 row(s) affected)

    backup log tempdb with truncate_only
    go

    create clustered index idx1 on #frs (OrderDateKey)
    go

    backup log tempdb with truncate_only
    go

    select top 100 *
    from #frs
    order by OrderDateKey desc

    0.0038661 cost, CI scan
    logical reads 4

    select top 100 *
    from #frs
    WHERE OrderDateKey = 1
    order by OrderDateKey desc

    0.0038661 cost, CI seek
    logical reads 3

    drop table #frs
    go

    backup log tempdb with truncate_only
    go

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
