Improving Performance of Cross-Partition Queries

  • Comments posted to this topic are about the item Improving Performance of Cross-Partition Queries

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Have you tried changing the order of the clustered index. You say you did try this index:

    (IntDate ASC, CustomerID ASC, PartitionID ASC) ON[PartitionedTables](PartitionID)

    But have you tried putting the PartitionID fist? :

    (PartitionID ASC, IntDate ASC, CustomerID ASC) ON[PartitionedTables](PartitionID)

    I'm very curious what happens when you try this, so please post the result....

  • For SQL Server magazine subscribers, Itzik Ben-Gan has an article on exactly this issue at:

    http://www.sqlmag.com/article/microsoft-products/max-and-min-aggregates-against-partitioned-tables.aspx

    That was published March 16 2010.

    Anyway, about this article. I'm afraid I did not rate it highly for many reasons. One shortcoming is the lack of easily consumable sample code either in-line with the text or as a download.

    It would also have been good to mention that a non-partitioned index on a partitioned table does not suffer from this optimiser limitation:

    -- Everyone has one of these, feel free to use another test database

    USE tempdb;

    GO

    -- Example partition function and scheme

    CREATE PARTITION FUNCTION PF(TINYINT) AS RANGE RIGHT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY]);

    GO

    -- Test table (partitioned heap)

    CREATE TABLE dbo.ExampleTable

    (

    [ExampleTableID] [bigint] NOT NULL IDENTITY(1, 1),

    [CustomerID] [int] NOT NULL,

    [PartitionID] [tinyint] NOT NULL,

    [IntDate] [int] NOT NULL,

    )

    ON PS (PartitionID);

    GO

    -- Partitioned clustered index

    -- (note: clustering key should always be unique)

    CREATE CLUSTERED INDEX [PCX dbo.ExampleTable IntDate, CustomerID]

    ON dbo.ExampleTable

    (IntDate, CustomerID)

    ON PS (PartitionID);

    GO

    -- Add 100,000 randomish rows

    INSERT dbo.ExampleTable WITH (TABLOCK)

    (CustomerID, PartitionID, IntDate)

    SELECT TOP (100000)

    RAND(CHECKSUM(NEWID())) * 1000 + 1,

    RAND(CHECKSUM(NEWID())) * 10 + 1,

    CONVERT(INTEGER, CONVERT(CHAR(8), DATEADD(DAY, 0 - RAND(CHECKSUM(NEWID())) * 365 + 1, GETDATE()), 112))

    FROM master.sys.columns C1,

    master.sys.columns C2,

    master.sys.columns C3;

    GO

    -- Add partitioned primary key

    ALTER TABLE dbo.ExampleTable

    ADD CONSTRAINT [PK dbo.ExampleTable ExampleTableID, CustomerID, PartitionID]

    PRIMARY KEY NONCLUSTERED (ExampleTableID, CustomerID, PartitionID) ON PS (PartitionID);

    GO

    -- Add partitioned index on CustomerID

    CREATE NONCLUSTERED INDEX [PIX dbo.ExampleTable CustomerID]

    ON dbo.ExampleTable (CustomerID)

    ON PS (PartitionID);

    GO

    -- Add non-partitioned index on IntDate

    CREATE NONCLUSTERED INDEX [IX dbo.ExampleTable IntDate]

    ON dbo.ExampleTable (IntDate)

    ON [PRIMARY];

    GO

    -- +++++++++++++++++++++++++++++++++++++++++++++

    -- Efficient plan based on non-partitioned index

    -- +++++++++++++++++++++++++++++++++++++++++++++

    SELECT minimum_date = MIN(IntDate)

    FROM dbo.ExampleTable;

    -- ++++++++++++++++++++++++++++++++++++++++++++++

    -- Alternative method to create an efficient plan

    -- based partly on Itzik Ben-Gan's approach

    -- +++++++++++++++++++++++++++++++++++++++++++++

    SELECT min_customer_id = MIN(iTVF.partition_minimum)

    FROM sys.partitions AS P

    CROSS

    APPLY (

    -- Minimum per partition

    SELECT partition_minimum = MIN(ET.CustomerID)

    FROM dbo.ExampleTable ET

    WHERE $PARTITION.PF(ET.PartitionID) = P.partition_number

    ) iTVF

    WHERE P.[object_id] = OBJECT_ID(N'dbo.ExampleTable', N'U')

    AND P.index_id IN (0, 1);

    GO

    -- Tidy up

    DROP TABLE dbo.ExampleTable;

    DROP PARTITION SCHEME PS;

    DROP PARTITION FUNCTION PF;

    Of course it is fair to say that the non-aligned index would have to be dropped and re-created to perform SWITCH operations. In mitigation, I would just say that this likely occurs in a maintenance window anyway, and indexes are pretty quick to build. One other option is to maintain both partitioned and non-partitioned indexes - the optimiser is smart enough to choose the efficient plan based on the non-partitioned index for MIN and MAX queries.

    Paul

  • Numerous SQL Server MVPs (and lots of other users as well) have been screaming at Microsoft for years to improve partitioning. I hope improvements will be coming in later versions of the engine. Connect has some entries for these things I think. It is a sin we need to jump through hoops for things like this!!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • nice article

    we have a 2TB database we're going to be moving to new hardware soon and i wanted to try to partition it because people seem to like to select an entire year's worth of data or more in one query. but there is no way to test it since we don't have 2TB of empty space anywhere

  • Sander A. (5/4/2010)


    But have you tried putting the PartitionID fist? :

    (PartitionID ASC, IntDate ASC, CustomerID ASC) ON[PartitionedTables](PartitionID)

    Hello Sander. I did just try that with the same table and data that I used before and it performed basically the same as with the PartitionID at the end. Essentially 10 - 11 seconds for the generic/simple MIN(IntDate) query. The row order should be the same in either case since the values for PartitionID can only be of one value per partition.

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • What happens if you use a sub query to get the minimum value for each partition, and then get the min of that?

    i.e. Something like this:

    SELECT MIN(MinIntDate)

    FROM (SELECT PartitionID, MIN(IntDate) MinIntDate

    FROM dbo.ExampleTablePartitioned

    GROUP BY PartitionID) a

    or

    SELECT MIN(MinIntDate)

    FROM (SELECT PartitionID FROM dbo.ExampleTable GROUP BY PartitionID) a

    CROSS APPLY (SELECT MIN(IntDate) MinIntDate

    FROM dbo.ExampleTable b

    WHERE B.PartitionID = a.PartitionID) c

  • Hi alen,

    Be careful before you decide to employ partitioning here. Do whatever you can do test your queries before you decide to implement. It could be a life-saver. 🙂

    I agree w/ you SQL Guru...I've spoke w/ the partitioning team @ MS. The idea for partitioning is not that it makes queries faster (believe it or not), but that it eases moving data in and out and along with some maintenance advantages.

    Tim

    SQL Server MVP

  • Paul White NZ (5/4/2010)


    It would also have been good to mention that a non-partitioned index on a partitioned table does not suffer from this optimiser limitation

    Paul, thanks for the feedback. Interesting about the Ben-Gan article; I did not know about it.

    And, for the sake of being thorough, you make a good point about the non-partitioned index. However, it is not that applicable in my situation. The index comes with two costs: disk space (in this case over 1 GB!) and it needs to be maintained across INSERT and DELETE operations. Given that the piece of code in my article is in about five Stored Procs, that is negligible disk space and still fairly maintainable. True, it does not help ad-hoc queries but we generally don't do those in Production, and if we do then rarely would they be across all partitions. I am sure others might find themselves in a similar situation.

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • alen teplitsky (5/4/2010)


    nice article

    we have a 2TB database we're going to be moving to new hardware soon ... but there is no way to test it since we don't have 2TB of empty space anywhere

    Hello Alen and thanks. I am surprised that you cannot find 2 TB of available space given that I keep hearing how "disk is cheap" ;-). Good luck!

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • UMG Developer (5/4/2010)


    What happens if you use a sub query to get the minimum value for each partition, and then get the min of that?

    SELECT MIN(MinIntDate)

    FROM (SELECT PartitionID, MIN(IntDate) MinIntDate

    FROM dbo.ExampleTablePartitioned

    GROUP BY PartitionID) a

    Hello. I just tried your code and it had the same effect as the regular SELECT MIN(IntDate). It took 10 - 11 seconds each time. There is really no getting around using the Partition Key in the WHERE clause (outside of doing a non-partitioned Index like Paul mentioned).

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky (5/4/2010)


    Hello. I just tried your code and it had the same effect as the regular SELECT MIN(IntDate). It took 10 - 11 seconds each time. There is really no getting around using the Partition Key in the WHERE clause (outside of doing a non-partitioned Index like Paul mentioned).

    What about the other version I edited in:

    SELECT MIN(MinIntDate)

    FROM (SELECT PartitionID FROM dbo.ExampleTable GROUP BY PartitionID) a

    CROSS APPLY (SELECT MIN(IntDate) MinIntDate

    FROM dbo.ExampleTable b

    WHERE B.PartitionID = a.PartitionID) c

    I would hope it was similar to the other cross apply that Paul supplied. (I guess the real problem there is that you would have to have an index with the PartitionID as the first column to get the best performance and using Sys.Partitions avoids that.)

  • UMG Developer (5/4/2010)


    I would hope it was similar to the other cross apply that Paul supplied. (I guess the real problem there is that you would have to have an index with the PartitionID as the first column to get the best performance and using Sys.Partitions avoids that.)

    Ah, sadly no - that APPLY results in two full scans of the partitioned index. You can use the test rig code to test for yourself - just don't create the non-partitioned index. You have to include the $PARTITION function to help the optimiser along.

  • Solomon Rutzky (5/4/2010)


    ...you make a good point about the non-partitioned index. However, it is not that applicable in my situation. The index comes with two costs: disk space (in this case over 1 GB!) and it needs to be maintained across INSERT and DELETE operations.

    I guess it comes down to circumstances and personal preference.

    Not looking to start a debate here, but I just want to respond briefly:

    The non-partitioned index could replace the partitioned index, so no additional space or maintenance would be required. I'm not sure the space or maintenance overhead arguments are terribly strong ones.

    Your precise circumstances may differ, but I think that in general, I would prefer the solution that just works every time, without having to rewrite queries, or remember to do the trick whenever new code is written containing MIN or MAX. I dislike the $PARTITION approach for similar reasons, but it does strike me as preferable to the WHILE loop + hard coded partition count.

    Paul

  • Solomon Rutzky (5/4/2010)


    alen teplitsky (5/4/2010)


    nice article

    we have a 2TB database we're going to be moving to new hardware soon ... but there is no way to test it since we don't have 2TB of empty space anywhere

    Hello Alen and thanks. I am surprised that you cannot find 2 TB of available space given that I keep hearing how "disk is cheap" ;-). Good luck!

    Take care,

    Solomon...

    its finding 2TB of usable space after you create separate RAID1 volumes for all the partitions, logs and tempdb

    $7500 or so to buy the disk and jbod for a soon to be free server

    $8000 to buy a new server with just enough disk

Viewing 15 posts - 1 through 15 (of 40 total)

You must be logged in to reply to this topic. Login to reply