Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


TABLESAMPLE


TABLESAMPLE

Author
Message
kevriley
kevriley
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2751 Visits: 2606
Comments posted to this topic are about the item TABLESAMPLE
er.soundararajan-800293
er.soundararajan-800293
SSC Rookie
SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)

Group: General Forum Members
Points: 33 Visits: 30
The tablesample returns a random number every time as the seed used to generate the random number varies marginally everytime, eventhough we say tablesample(10 percent) it will reutrn +1000 or -1000 rows approximatly,

example : 1. select * from results TABLESAMPLE SYSTEM(100 ROWS)
2. select * from results TABLESAMPLE SYSTEM(10 PERCENT)

The number of rows will vary everytime. To get a constant number of rows everytime we can use the keyword REPEATABLE.
er.soundararajan-800293
er.soundararajan-800293
SSC Rookie
SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)SSC Rookie (33 reputation)

Group: General Forum Members
Points: 33 Visits: 30
The tablesample returns a random number every time as the seed used to generate the random number varies marginally everytime, eventhough we say tablesample(10 percent) it will reutrn +1000 or -1000 rows approximatly,

example : 1. select * from results TABLESAMPLE SYSTEM(100 ROWS)
2. select * from results TABLESAMPLE SYSTEM(10 PERCENT)

The number of rows will vary everytime. To get a constant number of rows everytime we can use the keyword REPEATABLE.

select * from results TABLESAMPLE SYSTEM(10 PERCENT) REPEATABLE(1000).

Given the same seed, you will get the same rows back. One thing to note here: this is not like the
repeatable read isolation level. If another user makes changes to the data in the table, you will not get back the exact same rows. It is only true for a given “version” of the table.
kevriley
kevriley
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2751 Visits: 2606
just to clarify, it's not that a random number of rows is returned, a random value is generated for each page of the table, and then this is used with the percentage to generate the sample

From http://technet.microsoft.com/en-us/library/ms189108.aspx
TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is either included in the sample or excluded. Each page that is included returns all rows in the sample result set. For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table. If the rows are evenly distributed on the pages of the table, and if there is a sufficient number of pages in the table, the number of rows returned should approximate the sample size that is requested. However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned.

skyline666
skyline666
Old Hand
Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)

Group: General Forum Members
Points: 350 Visits: 229
I got this right, but the explanation, to me, doesn't really sound right:

"TABLESAMPLE returns an approximate percentage of rows, even if a number of rows is specified. This is used to get a sample of data from large rows and does not guarentee a number of rows or a random sample"

From http://technet.microsoft.com/en-us/library/ms189108.aspx, it says TABLESAMPLE returns a sample number of rows from the result set, so if 10 percent is used, then 10 percent is returned. Quote from the link says:
The TABLESAMPLE clause limits the number of rows returned from a table in the FROM clause to a sample number or PERCENT of rows.


However, if SYSTEM is used, then if 10 percent is specified, then around 10 percent of the result set is returned. Quote from the link says:
TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table.....For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table.....However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned.


Am I reading that right?
kevriley
kevriley
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2751 Visits: 2606
It doesn't matter whether you specify SYSTEM or not

From the same link

SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but this option is the only sampling method available in SQL Server 2005 and is applied by default.

skyline666
skyline666
Old Hand
Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)Old Hand (350 reputation)

Group: General Forum Members
Points: 350 Visits: 229
kevriley (7/1/2008)
It doesn't matter whether you specify SYSTEM or not

From the same link

SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but this option is the only sampling method available in SQL Server 2005 and is applied by default.


Oh right ok, so it doesn't matter then if you specify SYSTEM or not.
Steve Jones
Steve Jones
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: Administrators
Points: 35894 Visits: 18721
It's likely you would get 10% back, but there's no guarantee.

Follow me on Twitter: @way0utwestForum Etiquette: How to post data/code on a forum to get the best help
SanjayAttray
SanjayAttray
Hall of Fame
Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)

Group: General Forum Members
Points: 3953 Visits: 1619
Steve Jones - Editor (7/1/2008)
It's likely you would get 10% back, but there's no guarantee.


I tested it more than 10 times and it never returned 10% back approx. It mostly got 476 rows effected or 1454 or 1904 rows.

But, I doubt, does any body uses this in day to day programming? If no, what's the actual use of this function -- "tablesample system".

SQL DBA.
seanoregan
seanoregan
SSC-Enthusiastic
SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)SSC-Enthusiastic (191 reputation)

Group: General Forum Members
Points: 191 Visits: 207
It's used whenever you need to quickly get a representative sample of data from a large data set for testing purposes.

It's especially useful when you're developing data warehouses as you can develop and test quickly without having to process a very large database every time you make an amendment to a report or cube.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search