Recently I started a new piece of work moving legacy data into a newly developed bespoke CRM system. I solemnly promised not to criticise the already designed half-implemented new database and concentrate solely on migrating the data, but some things just can’t be overlooked. I noticed an NTEXT field in the first table I was trying to pipe data into and a quick check of the INFORMATION_SCHEMA data showed that there were 83 of these in the new database schema.
The NTEXT data type has been marked for deprecation since the release of SQL Server 2005 along with the other Large Object data types TEXT and IMAGE. To quote Books Online’s doleful message: “Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.”
I had a quick chat with the application developer and we agreed that these columns should all be converted; I thought it would be a reasonably involved task, but at the end of that day I gave it a few minutes thought and found that a command such as:
ALTER TABLE Client.Enquiry ALTER COLUMN EnquiryDescription varchar(max);
This was all that was needed. SQL Server quite happily converted the column without the table needing to be dropped/recreated even though it contained data. I also realised that I could get a list of all the offending columns by querying the INFORMATION_SCHEMA data. The following code takes advantage of this to build a batch of SQL commands stored in the @convert variable (83 of them in this case) using what is hopefully a familiar method of string concatenation that avoids the use of a cursor - the batch is then executed in the final line:
USE MyDatabase GO DECLARE @convert varchar(max) SET @convert = '' SELECT @convert = @convert + 'ALTER TABLE [' + t.TABLE_SCHEMA + '].[' + t.TABLE_NAME + '] ALTER COLUMN [' + c.COLUMN_NAME + '] ' + CASE c.DATA_TYPE WHEN 'text' THEN 'varchar(max)' WHEN 'ntext' THEN 'nvarchar(max)' WHEN 'image' THEN 'varbinary(max)' END + ';' + CHAR(13) + CHAR(10) FROM INFORMATION_SCHEMA.TABLES t JOIN INFORMATION_SCHEMA.COLUMNS c ON c.TABLE_NAME = t.TABLE_NAME WHERE t.TABLE_TYPE = 'BASE TABLE' AND c.DATA_TYPE IN('text','ntext','image') EXEC (@convert)
Note that along the way I had realised that, although the only large object type in use in the current database was NTEXT, it was a simple matter to cater for the other types so that I ended up with a generic piece of code that will work with any database on SQL Server 2005+, hence the reason for sharing.
I sat back revelling in what could be achieved in 10 minutes with a little knowledge and experience - but then a couple of points occurred to me that ought to be looked into.
Firstly, I ran:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN('text','ntext','image')
I expected to be met with the resounding success of 0 rows returned, but to my horror found that all views that referred to the offending columns were still listed as using the old-style data type. This must be a minor issue but wishing to be thorough I decided to eradicate these by calling sp_refreshview on each view still listed.
Secondly, it’s all very well to simply convert the data types and move on but what about the supposed performance benefits of using the new data types. I had read that NVARCHAR(MAX) was superior to NTEXT not only because it wasn’t being deprecated but also because up to 8kb per row can be stored “in row” by SQL Server. This means they can be stored alongside the other column values as opposed to the old-style data types, which always store a pointer to another page that holds the data. In this situation disk reads require more effort to retrieve the data. When SQL Server converts the NTEXT columns to NVARCHAR(MAX) columns it leaves the data exactly as it was before, and it is not until data is added or updated that it takes advantage of the new layout. It turns out that the easy way to achieve this is simply to issue a statement like:
UPDATE [Client].[Enquiry] SET [EnquiryDescription] = [EnquiryDescription]
A brief proof of this seems reasonable. Run the following:
CREATE DATABASE LOB GO USE LOB GO CREATE TABLE HasLob( Id int identity(1,1) primary key, LobData ntext ) INSERT HasLob(LobData) SELECT REPLICATE('cat', 50) DBCC IND(LOB, HasLob, -1)
Check the output of DBCC IND for PageTypes 1 (data) and 3 (LOB data), noting the PagePID values for those rows and use those same numbers in the following uses of DBCC Page. I found pages 78 and 55 respectively; you will need to substitute your own values. DBCC Page is easily investigated elsewhere on the net; the parameters are the database name (or Id), the file number (the value of PageFID from DBCC IND, but always 1 unless another data file has been created), the page number, and finally a print option which dictates the format of the output. I am using option 3 here to get the detailed per-row interpretation.
DBCC TRACEON(3604) -- to enable output from DBCC Page DBCC Page(LOB, 1, 78, 3) DBCC Page(LOB, 1, 55, 3)
Looking at the output for page 78 I see that the data is not present and “LobData = [Textpointer]”.
Looking at page 55 the data (“cat” over and over again) is clearly visible.
Now we run:
ALTER TABLE HasLob ALTER COLUMN LobData nvarchar(max)
We examine the situation again using the same DBCC IND and DBCC Page commands. It may be a surprise but everything is pretty much as it was. Now:
UPDATE HasLob SET LobData = LobData
After running DBCC IND and DBCC Page we find drastic changes. DBCC IND now has no row with a pagetype of 3 anymore - and instead we clearly see the data now on page 78, proving that the humble UPDATE statement has made the changes we seek.
Of course it’s possible to create and run a batch of the needed statements in the same way as with the initial conversion - so here is the final script:
USE MyDatabase GO DECLARE @convert varchar(max) DECLARE @update varchar(max) DECLARE @refreshviews varchar(max) SET @convert = '' SET @update = '' SET @refreshviews = '' SELECT @convert = @convert + 'ALTER TABLE [' + t.TABLE_SCHEMA + '].[' + t.TABLE_NAME + '] ALTER COLUMN [' + c.COLUMN_NAME + '] ' + CASE c.DATA_TYPE WHEN 'text' THEN 'varchar(max)' WHEN 'ntext' THEN 'nvarchar(max)' WHEN 'image' THEN 'varbinary(max)' END + ';' + CHAR(13) + CHAR(10) FROM INFORMATION_SCHEMA.TABLES t JOIN INFORMATION_SCHEMA.COLUMNS c ON c.TABLE_NAME = t.TABLE_NAME WHERE t.TABLE_TYPE = 'BASE TABLE' AND c.DATA_TYPE IN('text','ntext','image') SELECT @update = @update + 'UPDATE [' + t.TABLE_SCHEMA + '].[' + t.TABLE_NAME + '] SET [' + c.COLUMN_NAME + '] = [' + c.COLUMN_NAME + '];' + CHAR(13) + CHAR(10) FROM INFORMATION_SCHEMA.TABLES t JOIN INFORMATION_SCHEMA.COLUMNS c ON c.TABLE_NAME = t.TABLE_NAME WHERE t.TABLE_TYPE = 'BASE TABLE' AND c.DATA_TYPE IN('text','ntext','image') SELECT @refreshviews = @refreshviews + Cmd FROM ( SELECT DISTINCT 'EXEC sp_refreshview ''[' + v.TABLE_SCHEMA + '].[' + v.TABLE_NAME + ']'';' + CHAR(13) + CHAR(10) Cmd FROM INFORMATION_SCHEMA.COLUMNS c JOIN INFORMATION_SCHEMA.VIEWS v ON v.TABLE_SCHEMA = c.TABLE_SCHEMA AND v.TABLE_NAME = c.TABLE_NAME WHERE DATA_TYPE IN('text','ntext','image') )d EXEC (@convert) PRINT 'Columns were successfully converted' EXEC (@update) PRINT 'Tables were successfully updated' EXEC (@refreshviews) PRINT 'Views were refreshed'