|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Yesterday @ 7:28 AM
Points: 795,
Visits: 1,984
|
|
Neil Franken (3/1/2010) Hi There
Cool article but here is what I have found with newid().
...
Right lets say the profile(salary) matches 1,5 million rows in the database. SQL server will return all 1,5 million rows then ORDER BY. Now I mentioned that a GUID is very very random. This causes a high cpu load on the server as the poor Server now has to first sort the 1,5 million rows then it can return the top 1000. Think about it. It first has to sort before it can return. I have tested this and it does not matter if I return 1 row or 750 000 out of the 1,5 million rows that matches the query it constantly runs at the same speed. The top can only be applied once the sorting is done. Granted for small tables and non mission critical queries this technique can work well I would not use it on large tables as you will create a bottle neck.
...
One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table.
create table #lookup_table(row_num int) declare @ctr int, @samplesize int set @ctr = 0 set @samplesize = 1000 -- for example, a sample size of 1,000 is needed while @ctr < @samplesize BEGIN insert into #lookup_table select abs(checksum(newid())) set @ctr = @ctr + 1 END create clustered index idxc on #lookup_table(row_num)
Do a join on this table and it should go much more quickly, so the entire original table would not be loaded. Not much chance there will be duplicate rows with this method as INT can be up to 2,147,483,647.
Gaby ________________________________________________________________ "In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 1:49 AM
Points: 10,990,
Visits: 10,540
|
|
adish (3/1/2010) I've been using GUID as PKs, but this is a novel way of using it. Great. Not a clustered Primary Key, I trust?
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 1:49 AM
Points: 10,990,
Visits: 10,540
|
|
GabyYYZ (3/1/2010) [quote]One option, especially if you have an indexed identity column on your source table, is to generate a separate table of random row numbers, create a clustered index on it, and join with the original table. Nice idea. Of course, the 'random' numbers are then a bit, er, 'fixed' aren't they? Can't believe you used a RBAR method to populate your table.  For smallish numbers of random rows, I prefer an approach very similar to the one posted by Gary earlier. It does require a table with a sequential ID, but that's pretty common - excepting those that like GUIDs as a PK *shudder*
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Sunday, May 12, 2013 4:26 PM
Points: 1,696,
Visits: 1,742
|
|
|
|
|
|
SSChasing Mays
      
Group: General Forum Members
Last Login: Monday, February 18, 2013 3:22 PM
Points: 626,
Visits: 835
|
|
Paul White (3/1/2010)
adish (3/1/2010) I've been using GUID as PKs, but this is a novel way of using it. Great.Not a clustered Primary Key, I trust?
obviously the super-best data type for a PK is one that increments by 1 each time so the index is appended on each insert and there is no fragmentation,
Any other data type is just as likely to fragment as a guid. I left-handedly proved this in one of my shamefully RBAR riddled articles: http://www.sqlservercentral.com/articles/Indexing/64424/
It's the random insertion, not the datatype that causes the problem. What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids? Under the hood, there's not a difference.
The only way you're not going to frag your index at an equal rate as a guid is if the data is inserted in the same order as the clustered index key. In OLTP, that's pretty unlikely except with and identity (which has it's own set of issues).
~Craig O.
Craig Outcalt
Tips for new DBAs: http://www.sqlservercentral.com/articles/Career/64632 My other articles: http://www.sqlservercentral.com/Authors/Articles/Craig_Outcalt/560258
|
|
|
|
|
SSChasing Mays
      
Group: General Forum Members
Last Login: Monday, February 18, 2013 3:22 PM
Points: 626,
Visits: 835
|
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 1:49 AM
Points: 10,990,
Visits: 10,540
|
|
Jonathan Kehayias (3/1/2010) It probably was clustered, its common for App Developers to do this kind of thing. It happened at Microsoft around the Windows 7 RC downloads... Ah yes, I remember that one  Highly amusing at the time...
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Sunday, May 12, 2013 4:26 PM
Points: 1,696,
Visits: 1,742
|
|
SQLBOT (3/1/2010)
It's the random insertion, not the datatype that causes the problem. What's the differece if the data inserted is Johnson, Jonsonn, Johnsen or three guids? Under the hood, there's not a difference.
Craig,
Respectfully, the rate of fragmenation partially depends on what the datatype is for the column. If you are inserting random values into a varchar(8) column, the end impact for fragmentation would be different than a char(8), nchar(8) or nvarchar(8) column because the storage size is different for each so fragmentation rates would be different. A GUID is 16 bytes so it takes more space = fuller pages faster = more page splits = faster fragmentation rates.
Your point is accurate, just playing semantics with you is all.
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008 My Blog | Twitter | MVP Profile Training | Consulting | Become a SQLskills Insider Troubleshooting SQL Server: A Guide for Accidental DBAs
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Today @ 2:32 AM
Points: 228,
Visits: 398
|
|
Paul White (3/1/2010)
Jonathan Kehayias (3/1/2010) It probably was clustered, its common for App Developers to do this kind of thing. It happened at Microsoft around the Windows 7 RC downloads...Ah yes, I remember that one  Highly amusing at the time...
It is clustered. Am I doing something terribly wrong? Some light on this please, I am not an expert in SQL and would appreciate guidance from SQL gurus. Thanks
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Sunday, May 12, 2013 4:26 PM
Points: 1,696,
Visits: 1,742
|
|
|
|
|