|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Thursday, November 13, 2008 9:13 AM
Points: 499,
Visits: 76
|
|
| Why would anyone have a clustered index on an NVARCHAR(341) column? Just curious because this is a main part of the problem to begin with. Hopefully this does not get inserted by a user on a table this size.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Friday, April 26, 2013 6:18 AM
Points: 129,
Visits: 205
|
|
If DBA is not in total control of table indexes then ....... I can understand that dba is not always in control with database design but nobody tells dba where and how to use indexes but dba. It means that smart way to do the task was 1. analyze indexes 2. use standard way for the update.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Monday, December 17, 2012 3:53 AM
Points: 135,
Visits: 77
|
|
Remeber very carefully. Don't ever set a default value when you add a new column to a table with such a large number of rows. Becasue it is no different to addding the column as null and then issue one transaction to update all the 35 million records with the default value. It's the worst way of doing this. That'll load the whole server and block the table for a very long time as well.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Monday, December 17, 2012 3:53 AM
Points: 135,
Visits: 77
|
|
Yes, I agree with you guys. the Design is crap here, but my solution is for people who cannot change the whole design to add a new column to a table like this and to explain how the best way to traverse through a table to do this type of an update. I am sure there are lot of people who were trying to do this type of thing and waited so long it to finish saying to themlesves "This is a very large table, no wonder it takes so long". This will help them to understand the underline operation bit better and approach it more efficiently next time. And others who are designing the tables, now you know the troubles ahead if you didn't design it properly at the first time.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Monday, December 17, 2012 3:53 AM
Points: 135,
Visits: 77
|
|
OK, firstly sorry if my writing is bit confusing. i'll try to clear it up with this explanation. My first example which updates the newzip column, is actually to demonstrate the method widely suggested to follow when you add a new column to a very large table. That's just an example, and i first used that method to do the update on my own table called [LargeTable] and that's the one took 20+ hours. And i haven't posted it in the article because the provided example (with NewZip column) is good enough to understand what it does. But for the solution i am suggesting, i have given the exact code i ran on my table [LargeTable]. Hope this clears the confusion.
|
|
|
|
|
Old Hand
      
Group: General Forum Members
Last Login: Thursday, May 02, 2013 2:13 AM
Points: 362,
Visits: 324
|
|
I was going to keep out of this until the statement about the design being crap. The clustered index is there - on that specific column - because it was designed that way for the critical operations on that DB. The performance-critical activities are searches on data, where the search criteria (usually) translate quite nicely into range searches on the column implemented by the clustered index. There is no other column on that table which would give the slightest benefit in a range search. As to length... yes, that does look a bit silly. It is in fact accommodating the largest value that could be constructed as the search key without truncation. Does it really need to be that big? No. Nearly 98% of values are 25 characters or less, and the maximum is 56. As to inserting / page splits. Yep, the effect would be rather worse than if (say) the identity column was used, but (a) the column is never updated, (b) inserts are never done online and (c) the usual insert rate is very low (well, there is one occasional batch process that's a bit heavy, and a bit of a dog, but that's a separate thread entirely). What are my motives in responding? Well, yes, I had some involvement in this design, but more important is the principle. Don't automatically assume character columns are bad in clustered indexes. A proper design needs to understand ALL the intended uses of the table, and make indexing decisions accordingly. An interesting article from Chris Hedgate is at: http://www.sqlservercentral.com/columnists/chedgate/clusterthatindex.asp
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
I have to agree with SDM. Just because, in general, a clustered index on an alpha field (char, nchar, varchar, nvarchar) is usually a bad idea, doesn't mean it's always a bad idea. Keep in mind that these things are guidelines, not laws.
A simple example is a list of US states. I have the clustered index (and primary key) on the 2-letter abbreviation in my databases. Why wouldn't I?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Saturday, November 14, 2009 5:07 PM
Points: 15,
Visits: 41
|
|
I saw the range table creation script and think that we can create it with another twist by utilizing Izik-Ben-Gan "Sequential Number table function" without using loop.
Result expected
RangeStart RangeEnd AA AB AB AC ....
The final solution be like one line of query.
WITH L0 AS ( SELECT CHAR(n) AS Alphabet FROM dbo.[fn_nums](90) WHERE n >=65) SELECT A.Alphabet + B.Alphabet AS RangeStart, A.Alphabet + char(ascii(B.Alphabet)+1) AS RangeEnd FROM L0 AS A, L0 AS B WHERE A.Alphabet + B.Alphabet <> 'ZZ' ORDER BY A.Alphabet + B.Alphabet
Basically the first CTE
L0 AS (SELECT CHAR(n) AS Alphabet FROM dbo.[fn_nums](90) WHERE n >=65)
Will give us 'A' to 'Z' and then we cross join itself with the second column is the rangeEnd with second alphabet is the next alphabet of the first column. (It's easier just try to match the result to the query)
But before that can run we'll need to create THE function
CREATE FUNCTION [dbo].[fn_nums](@n AS BIGINT) RETURNS TABLE AS RETURN WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A, L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A, L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A, L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A, L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A, L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L5) SELECT n FROM Nums WHERE n <= @n;
This function will return a table result with a single column n and start from 1 to the parameter @n super fast. Very useful function check out Inside TSQL querying book by Izik Ben Gan.
James
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Monday, December 17, 2012 3:53 AM
Points: 135,
Visits: 77
|
|
GSquared (4/7/2008) I have to agree with SDM. Just because, in general, a clustered index on an alpha field (char, nchar, varchar, nvarchar) is usually a bad idea, doesn't mean it's always a bad idea. Keep in mind that these things are guidelines, not laws.
A simple example is a list of US states. I have the clustered index (and primary key) on the 2-letter abbreviation in my databases. Why wouldn't I?
I agree. But please note that I meant to say NVARCHAR(341) is not a good choice for a clustered key and I was talking about very large table having around 30-40 million rows where defragmentation and page splits (reorganizing of pages) can be a real issue.
|
|
|
|