Table partitioning and MAX queries performance problem

  • Hi All,

    Just implemented partitioning in SQL Server 2005. Which should be a great tool for better flexibility and performance. However I'm pretty disappointed running into performance problems while running simple queries like these:

    select max(id) from tbl_test

    If the table contains 50 miljons of rows. An index scan of 50 miljon rows occurs despite the existence of the index. When using an unpartitioned table an index seek is used as one would expect.

    Then I found out this bug is also confirmed by Microsoft:

    http://social.msdn.microsoft.com/forums/en-US/sqldatabaseengine/thread/ac2f13d7-875a-4f0c-b119-c654e4990ea7/

    I'm pretty disappointed in Microsoft especially the fact they are not able to solve this problem.

    I was wondering if some of you also ran into this problem and what workarounds you have used.

    Best Regards, Peter (SQL Server DBA)

    __________________
    MS-SQL / SSIS / SSRS junkie
    Visit my blog at dba60k.net

  • peterjonk (3/26/2010)


    select max(id) from tbl_test

    first of all this query never go for index seek as it doesnt contain any where clause

    and second, if this table has only one column (id) then above query should go for index scan

    and if it has more then one column it will go for table scan for these cases i am assuming that we have non clustered index on id column

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Please read the link I posted. A max query does in fact use an index seek if the column is a primary key or indexed.

    The query I provided was just to use a sample. In the real world the table has a clustered partitioned index. There is huge difference between the execution plans of a partitioned table (Index scan) and an unpartitioned table (Index seek).

    __________________
    MS-SQL / SSIS / SSRS junkie
    Visit my blog at dba60k.net

  • Have you tried a query like this?:

    SELECT MAX(CAST(LTRIM(CAST(subID AS VARCHAR(MAX))) AS bigint)) AS id

    FROM

    (

    SELECT

    $PARTITION.YourPatitionFunction(YourPartitionColumn) AS [partitionNum]

    , MAX(id) AS [subID]

    FROM tbl_test

    GROUP BY $PARTITION.YourPatitionFunction(YourPartitionColumn)

    ) AS a

    I don't have partitioned tables to test this against, so it may have more opaqueness in the outer MAX than it needs.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks for you reply and suggestion. It seems that rewriting the queries is the only way to work around this problem.

    Some of our .NET applications using ad-hoc queries (via OR mappers) or SP's to execute such queries receive time-outs. So we decided to rollback partitioning because we first need to investigate the impact of this problem. That's very time consuming if you have dozens of applications and stored procedures which need to be reviewed for query rewriting.

    The problem is confirmed by Microsoft and still exists in SQL 2008

    You've provided a very nice analysis that has shown several limitations in the optimizer related to queries on partitioned tables that use MIN and TOP. We're not in a position to fix these issues as a bug fix for SQL Server 2008 because they require some significant effort.

    A very disappointing answer from Microsoft in my opinion.

    This is the workaround I used which includes an extra where clause to limit the number of rows which must be scanned:

    declare @numrows int

    declare @return_value int

    -- Get number of rows in table

    SELECT @numrows=sum(p.rows) FROM sys.partitions p

    inner join sys.indexes i on p.object_id=i.object_id and p.index_id=i.index_id

    WHERE p.object_id=OBJECT_ID('TBL_TEST')

    and name='PK_TBL_TEST_IDENTITY_ID'

    -- Get max value of identity column

    select max(IDENTITY_ID) from TBL_TEST

    where IDENTITY_ID >= @numrows

    __________________
    MS-SQL / SSIS / SSRS junkie
    Visit my blog at dba60k.net

  • Ah, nice idea. I like that one.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • this has been improved in 2008. Seek happens at partition level and not at total table level.[Partition-Aware Seek Operation]

  • vidya_pande (3/27/2010)


    this has been improved in 2008. Seek happens at partition level and not at total table level.[Partition-Aware Seek Operation]

    Doesn't help at all. Read the Connect item linked to previously in this thread.

  • This is an annoying issue. I used to work around this by creating an extra non-partitioned index on the partitioned table.

    The extra index had to be dropped before, and re-created after, partition SWITCHing, of course - but partitions typically aren't switched around very frequently, and usually during a maintenance window.

    Slightly inconvenient, but worth it for the other benefits of partitioning.

  • This was removed by the editor as SPAM

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply