Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

TABLESAMPLE Expand / Collapse
Author
Message
Posted Monday, June 30, 2008 9:29 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, December 16, 2014 8:26 AM
Points: 2,716, Visits: 2,484
Comments posted to this topic are about the item TABLESAMPLE
Post #526304
Posted Tuesday, July 1, 2008 1:03 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Sunday, May 10, 2009 10:10 AM
Points: 33, Visits: 30
The tablesample returns a random number every time as the seed used to generate the random number varies marginally everytime, eventhough we say tablesample(10 percent) it will reutrn +1000 or -1000 rows approximatly,

example : 1. select * from results TABLESAMPLE SYSTEM(100 ROWS)
2. select * from results TABLESAMPLE SYSTEM(10 PERCENT)

The number of rows will vary everytime. To get a constant number of rows everytime we can use the keyword REPEATABLE.
Post #526346
Posted Tuesday, July 1, 2008 1:05 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Sunday, May 10, 2009 10:10 AM
Points: 33, Visits: 30
The tablesample returns a random number every time as the seed used to generate the random number varies marginally everytime, eventhough we say tablesample(10 percent) it will reutrn +1000 or -1000 rows approximatly,

example : 1. select * from results TABLESAMPLE SYSTEM(100 ROWS)
2. select * from results TABLESAMPLE SYSTEM(10 PERCENT)

The number of rows will vary everytime. To get a constant number of rows everytime we can use the keyword REPEATABLE.

select * from results TABLESAMPLE SYSTEM(10 PERCENT) REPEATABLE(1000).

Given the same seed, you will get the same rows back. One thing to note here: this is not like the
repeatable read isolation level. If another user makes changes to the data in the table, you will not get back the exact same rows. It is only true for a given “version” of the table.
Post #526347
Posted Tuesday, July 1, 2008 2:06 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, December 16, 2014 8:26 AM
Points: 2,716, Visits: 2,484
just to clarify, it's not that a random number of rows is returned, a random value is generated for each page of the table, and then this is used with the percentage to generate the sample

From http://technet.microsoft.com/en-us/library/ms189108.aspx
TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is either included in the sample or excluded. Each page that is included returns all rows in the sample result set. For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table. If the rows are evenly distributed on the pages of the table, and if there is a sufficient number of pages in the table, the number of rows returned should approximate the sample size that is requested. However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned.
Post #526383
Posted Tuesday, July 1, 2008 2:48 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 24, 2008 1:27 PM
Points: 350, Visits: 229
I got this right, but the explanation, to me, doesn't really sound right:

"TABLESAMPLE returns an approximate percentage of rows, even if a number of rows is specified. This is used to get a sample of data from large rows and does not guarentee a number of rows or a random sample"

From http://technet.microsoft.com/en-us/library/ms189108.aspx, it says TABLESAMPLE returns a sample number of rows from the result set, so if 10 percent is used, then 10 percent is returned. Quote from the link says:
The TABLESAMPLE clause limits the number of rows returned from a table in the FROM clause to a sample number or PERCENT of rows.


However, if SYSTEM is used, then if 10 percent is specified, then around 10 percent of the result set is returned. Quote from the link says:
TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table.....For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table.....However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned.


Am I reading that right?
Post #526410
Posted Tuesday, July 1, 2008 3:17 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, December 16, 2014 8:26 AM
Points: 2,716, Visits: 2,484
It doesn't matter whether you specify SYSTEM or not

From the same link

SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but this option is the only sampling method available in SQL Server 2005 and is applied by default.
Post #526421
Posted Tuesday, July 1, 2008 4:13 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 24, 2008 1:27 PM
Points: 350, Visits: 229
kevriley (7/1/2008)
It doesn't matter whether you specify SYSTEM or not

From the same link

SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but this option is the only sampling method available in SQL Server 2005 and is applied by default.


Oh right ok, so it doesn't matter then if you specify SYSTEM or not.
Post #526446
Posted Tuesday, July 1, 2008 10:43 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Today @ 1:03 PM
Points: 31,365, Visits: 15,829
It's likely you would get 10% back, but there's no guarantee.






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #526787
Posted Tuesday, July 1, 2008 11:54 AM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Thursday, July 17, 2014 10:56 AM
Points: 3,924, Visits: 1,607
Steve Jones - Editor (7/1/2008)
It's likely you would get 10% back, but there's no guarantee.


I tested it more than 10 times and it never returned 10% back approx. It mostly got 476 rows effected or 1454 or 1904 rows.

But, I doubt, does any body uses this in day to day programming? If no, what's the actual use of this function -- "tablesample system".


SQL DBA.
Post #526840
Posted Tuesday, July 1, 2008 3:49 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, December 12, 2014 5:45 AM
Points: 191, Visits: 191
It's used whenever you need to quickly get a representative sample of data from a large data set for testing purposes.

It's especially useful when you're developing data warehouses as you can develop and test quickly without having to process a very large database every time you make an amendment to a report or cube.
Post #526939
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse