April 10, 2008 at 9:28 am
While I, and I am sure others, have benefited from the detour into how to efficiently sample tables with identity columns, that doesn't address the needs of the OP.
After reviewing BOL here, ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/8868e8fd-6c42-4171-9eab-a0e38cb1bfd3.htm, I believe the TABLESAMPLE option would be both effective and efficient for the OP's request. The engine generates a random value for each 8K page, which requires nothing more than a read of the allocation map - essentially instantaneous even for extremely large tables. It then retrieves only the pages needed to return the requested number of rows. The clustered index on the table is key here: since location is the first column in the CI, the sample is effectively spread across locations. Note that TABLESAMPLE can also be made 'semi-repeatable' using the REPEATABLE clause. I say semi because changes to the table will affect the output. A sketch follows.
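Here is a minimal sketch of what I mean. dbo.Locations is a hypothetical stand-in for the OP's table (assumed to have its clustered index leading on the location column); the percentages, row counts, and seed are just placeholders:

-- Page-level sample of roughly 10 percent of the table;
-- the engine reads the allocation map and fetches only the sampled pages.
SELECT *
FROM dbo.Locations TABLESAMPLE (10 PERCENT);

-- Sample by approximate row count instead of percent.
SELECT *
FROM dbo.Locations TABLESAMPLE (10000 ROWS);

-- Semi-repeatable: the same seed returns the same pages,
-- provided the table's data has not changed in the meantime.
SELECT *
FROM dbo.Locations TABLESAMPLE (10 PERCENT) REPEATABLE (42);

Keep in mind the row counts are approximate, since sampling happens at the page level rather than the row level.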
Can anyone who did some benchmarking with the identity data do a comparative analysis against TABLESAMPLE? Please test small, medium, and large values for both percentages and actual row counts - something along the lines of the sketch below. I would be very grateful for the effort and the results!
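For anyone willing to run it, here is the kind of side-by-side test I have in mind. This is only a rough sketch: dbo.Locations again stands in for the actual test table, and I've used the classic NEWID() row-level sample as the comparison baseline - slot in the identity-based method from earlier in this thread where appropriate:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate 1: page-level sampling; should touch only the sampled pages.
SELECT COUNT(*) FROM dbo.Locations TABLESAMPLE (1 PERCENT);
SELECT COUNT(*) FROM dbo.Locations TABLESAMPLE (10000 ROWS);

-- Candidate 2: row-level random sample as a baseline;
-- forces a full scan plus a sort on NEWID().
SELECT TOP (10000) * FROM dbo.Locations ORDER BY NEWID();

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Repeat at small, medium, and large sample sizes and compare the logical reads and elapsed times reported.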
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service