How can I efficiently downsize some Unicode fields?

  • We have a SQL Server 2012 Enterprise live transactional database that is now growing by over 1 GB per month and is becoming a size problem for us. It is currently at 23 GB. All of the character-type fields are Unicode, and I have calculated a savings of 5 GB by converting only 2 such fields, averaging 206 characters each, to non-Unicode, and almost 10 GB if we convert a few more of them from nchar and nvarchar to char and varchar types. These fields will never have a requirement to hold Unicode characters outside the code page of the SQL_Latin1_General_CP1_CI_AS collation, as they come in as plain ASCII originally and always will per the protocol standard.

    I'm the software architect and chief C# developer, but only a DBA hack; otherwise I would not have designed the database, when it was created 3 years ago, with Unicode fields in high-volume tables that did not need them. I want to correct this mistake now, before we finalize converting to an AlwaysOn environment to help with various performance and backup issues.

    After downsizing these two or more fields, we would like to shrink the database one time to take advantage of the space savings for full backups, and for seeding an AlwaysOn environment.

    Questions are –

    1. What is the safest and most efficient conversion technique for downsizing columns from nchar/nvarchar to char/varchar types, especially when there are multiple fields in the same table to be converted? I tested an "add new column, set new = old, drop old column, rename the new column to the old name" approach for the main two fields I want to convert from nvarchar(max) to varchar(max); a rough sketch of what I ran appears below the questions. It took 81 minutes on our test server (4 virtual cores, 8 GB memory) before running out of disk space, even though there was 8 GB left on the disk and the database has unlimited size set (Could not allocate space for object 'dbo.abc'.'PK_xyz' in database 'xxx' because the 'PRIMARY' filegroup is full). I did delete an old database before it finished, after getting a disk warning, so maybe it did not count that new space. Regardless, it was too slow. And this was on just the two largest of these columns (12.6M rows), and CPU stayed at only 2 to 3% busy, so it seemed not very efficient and indicated unacceptable downtime if we were to convert even these two fields, much less any additional ones. Average field size for these two fields was only 206 characters, or 412 bytes each. Another technique I plan to try is to create the new table definition in a new schema, SELECT INTO it from the old table, then move the tables between schemas and drop the old table. I have an FK and indexes to contend with on the table.

    2. If I figure out how to do #1 efficiently within an acceptable maintenance window, what is the safest practice for doing a one-time shrink and ending up with reorganized/rebuilt indexes and updated statistics? I understand the logic of not doing regular shrinks and that sometimes it can actually increase the size.

    3. Is there any third-party tool that could take a backup and restore it into a new database with the modified field definitions, or otherwise convert certain field types?
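    For reference, the add/copy/drop/rename sequence I tested looked roughly like this (the column name is a placeholder; dbo.abc is the table from the error message above):

    -- 1. Add the new non-Unicode column
    ALTER TABLE dbo.abc ADD BigText_new varchar(max) NULL;
    GO
    -- 2. Copy the data across (this is the slow, log-heavy part)
    UPDATE dbo.abc SET BigText_new = BigText;
    GO
    -- 3. Drop the old column and give the new one the old name
    ALTER TABLE dbo.abc DROP COLUMN BigText;
    GO
    EXEC sp_rename 'dbo.abc.BigText_new', 'BigText', 'COLUMN';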

    Any suggestions and best practices most welcome.

    Thanks, Dave

  • Quick question: have you considered turning on compression on those tables and indexes? It's most likely more effective, and there's almost no risk involved.

    😎

  • Dave

    Shrinking your database won't have any effect on the size of the backup, only the size of the database files.

    1. Have you tried ALTER TABLE MyTable ALTER COLUMN MyColumn varchar(n) NOT NULL? It might be worth considering, if you haven't already done so, whether you actually need varchar(max).

    2. Figure out the size your database needs to be, allowing for growth and index maintenance and so on, then shrink to that size. Rebuild indexes after that.

    3. Not that I'm aware of. A restore of a backup restores the database as it was when it was backed up.
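    For point 2, once the conversion is done, the one-time sequence might look something like this (the logical file name and target size are placeholders; check sys.database_files and your own growth estimate for the real values):

    -- Shrink the data file to a size that leaves headroom for growth and index rebuilds
    DBCC SHRINKFILE (N'xxx_Data', 16000);   -- target size in MB; placeholder values
    GO
    -- Shrinking fragments indexes badly, so rebuild them afterwards
    ALTER INDEX ALL ON dbo.abc REBUILD;     -- repeat per table, or use your maintenance solution
    GO
    -- Rebuilds refresh statistics on the rebuilt indexes; update the rest
    EXEC sp_updatestats;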

    John

  • If you have several columns in the same table (ALTER COLUMN doesn't support multiple columns in a single statement), and given that some NVARCHAR columns might be part of a primary key etc. (which cannot be changed with ALTER TABLE ... ALTER COLUMN in this way), then it might be more efficient to do all the column changes in one pass. That will involve creating a new temporary table (in the new, revised structure), inserting the data into it, and then dropping the old table and renaming the new one. This will also require dropping and recreating all indexes, FKeys, etc., so it will take care of sorting out anything in that regard.

    If you use SSMS and Table Design to change the NVARCHAR to VARCHAR, you can then press the "Generate script" button, copy and paste the script, cancel the Table Design, and then use/modify the script for testing, release to production, etc.

    I don't know the ins and outs of how SQL changes an NVARCHAR column to VARCHAR when using ALTER TABLE ... ALTER COLUMN, but personally I'd be more comfortable with the temporary table route, as I'd be sure that the INSERT INTO MyTempTable SELECT * FROM MyOriginalTable then created the most optimal new table. Requires as much free space as your largest table, though.
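    In outline, something like this (all names are placeholders; the real CREATE TABLE, index and FK scripts would come from scripting your actual table in SSMS):

    -- Build the replacement table with the varchar definitions (script the real one from SSMS)
    CREATE TABLE dbo.abc_new (Id int NOT NULL, BigText varchar(max) NULL /* ...other columns... */);
    GO
    -- Copy everything across in one pass
    INSERT INTO dbo.abc_new WITH (TABLOCK) (Id, BigText /* ... */)
    SELECT Id, BigText /* ... */ FROM dbo.abc;
    GO
    -- Swap the tables, then recreate the PK, indexes and FKs on the new one
    EXEC sp_rename 'dbo.abc', 'abc_old';
    EXEC sp_rename 'dbo.abc_new', 'abc';
    GO
    -- DROP TABLE dbo.abc_old;  -- once you're happy with the result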

  • Eirikur Eiriksson (10/13/2015)


    Quick question: have you considered turning on compression on those tables and indexes? It's most likely more effective, and there's almost no risk involved.

    Seconded. Row compression has very little overhead, and it does include Unicode compression:

    https://msdn.microsoft.com/en-us/library/ee240835%28v=sql.110%29.aspx

    As for backups, are you compressing your backups? If not, do so. It saves time and storage space and very few SQL Servers are so CPU constrained that they can't handle the extra CPU involved.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Interesting. If the column(s) are NVARCHAR and do not contain, and will never contain, extended characters, wouldn't a one-time change of the database structure to VARCHAR be better than long-term use of compression etc.? Maybe it's just how I approach such things, but my view is "every little helps" (money-saving slogan of one of the huge supermarkets over this side of the pond 😉 ) and I think I'd prefer to have the CPU saving, however small, for the rest of the lifetime of the product.

    Is there also any risk that rebuilds fail to specify compression, going forwards, such that the column(s) revert to Full-Fat? Or once set is that "it"? (barring a deliberate NO_COMPRESSION type act)

  • @SSCarpal Thx. I am going to experiment with compression to see what sort of time/space trade-off the data gives us. It might also be useful for the forthcoming AlwaysOn environment, where it should help with replication times.
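    From the docs, it looks like the experiment is roughly this: estimate first, then rebuild with row compression (dbo.abc is the table from my earlier error message; the procedures are the standard system ones):

    -- Estimate the space saving before committing to anything
    EXEC sp_estimate_data_compression_savings 'dbo', 'abc', NULL, NULL, 'ROW';
    GO
    -- If the numbers look good, rebuild the table and its indexes with ROW compression
    ALTER TABLE dbo.abc REBUILD WITH (DATA_COMPRESSION = ROW);
    ALTER INDEX ALL ON dbo.abc REBUILD WITH (DATA_COMPRESSION = ROW);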

  • John Mitchell-245523 (10/13/2015)


    Dave

    Shrinking your database won't have any effect on the size of the backup, only the size of the database files.

    1. Have you tried ALTER TABLE MyTable ALTER COLUMN MyColumn varchar(n) NOT NULL? It might be worth considering, if you haven't already done so, whether you actually need varchar(max).

    2. Figure out the size your database needs to be, allowing for growth and index maintenance and so on, then shrink to that size. Rebuild indexes after that.

    3. Not that I'm aware of. A restore of a backup restores the database as it was when it was backed up.

    John

    Thx John for confirming about the backup size. That was going to be one of my tests, as well as compressing the backups. 1. My initial test converted to varchar(max), but I am planning to pick varchar(~4000), as I understand max has more overhead. 2. It will continue growing month over month as the business grows, though we plan to implement a data migration strategy to an archive database. 3. That was just a pipe dream.
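    So if the in-place route turns out workable, point 1 would presumably be something like this per column (names are placeholders; the length and NULLability would match the real column):

    -- Change the type in place; this still has to rewrite the column data, so watch the log
    ALTER TABLE dbo.abc ALTER COLUMN BigText varchar(4000) NULL;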

  • Kristen-173977 (10/13/2015)


    If you have several columns in the same table (ALTER COLUMN doesn't support multiple columns in a single statement), and given that some NVARCHAR columns might be part of a primary key etc. (which cannot be changed with ALTER TABLE ... ALTER COLUMN in this way), then it might be more efficient to do all the column changes in one pass. That will involve creating a new temporary table (in the new, revised structure), inserting the data into it, and then dropping the old table and renaming the new one. This will also require dropping and recreating all indexes, FKeys, etc., so it will take care of sorting out anything in that regard.

    If you use SSMS and Table Design to change the NVARCHAR to VARCHAR, you can then press the "Generate script" button, copy and paste the script, cancel the Table Design, and then use/modify the script for testing, release to production, etc.

    Fortunately the two worst columns are not indexed, though some others are. No PKs are character types. Seems like the temp table route is the way to go, rather than the alter/copy/rename of each column, which was really slow. Thx for the tip on Table Design spitting out the new script.

  • You have two problems:

    1. Lack of disk space.

    2. Incorrect constraint on a column (wrong datatype).

    With compression you can solve #1 and turn a blind eye to #2.

    Likely your space problem during conversion was because of having recovery set to FULL instead of SIMPLE during the process.

    Personally I would rename the old table, move 10% of the rows into the new table, and repeat until done. That will get you through the tight disk space period.
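    In rough outline, and with placeholder names (dbo.abc and xxx come from your error message; abc_old is just whatever you rename the old table to):

    -- Keep log growth under control for the duration of the move (take a full backup before and after)
    ALTER DATABASE xxx SET RECOVERY SIMPLE;
    GO
    -- Park the existing table under a new name, then create dbo.abc with the varchar definitions
    EXEC sp_rename 'dbo.abc', 'abc_old';
    GO
    -- Move the rows across in chunks; OUTPUT inserts exactly the rows each DELETE removes
    -- (column order must line up, and dbo.abc must have no FKs/triggers yet; add those at the end)
    DECLARE @batch int = 1000000;  -- roughly 10% of 12.6M rows; tune to your log throughput
    WHILE EXISTS (SELECT 1 FROM dbo.abc_old)
    BEGIN
        DELETE TOP (@batch)
        FROM dbo.abc_old
        OUTPUT deleted.* INTO dbo.abc;
        CHECKPOINT;                -- lets SIMPLE recovery reuse the log between batches
    END
    GO
    ALTER DATABASE xxx SET RECOVERY FULL;  -- then take a full backup to restart the log chain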

  • As for backups, are you compressing your backups? If not, do so. It saves time and storage space and very few SQL Servers are so CPU constrained that they can't handle the extra CPU involved.

    Not currently but probably should be. Thx.
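    Looks like it's just a matter of adding WITH COMPRESSION to the backup jobs, something like (database name and path are placeholders for our real jobs):

    BACKUP DATABASE xxx
    TO DISK = N'X:\Backups\xxx.bak'   -- placeholder path
    WITH COMPRESSION, CHECKSUM, STATS = 10;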

  • Kristen-173977 (10/13/2015)


    Interesting. If the column(s) are NVARCHAR and do not contain, and will never contain, extended characters, wouldn't a one-time change of the database structure to VARCHAR be better than long-term use of compression etc.? Maybe it's just how I approach such things, but my view is "every little helps" (money-saving slogan of one of the huge supermarkets over this side of the pond 😉 ) and I think I'd prefer to have the CPU saving, however small, for the rest of the lifetime of the product.

    Is there also any risk that rebuilds fail to specify compression, going forwards, such that the column(s) revert to Full-Fat? Or once set is that "it"? (barring a deliberate NO_COMPRESSION type act)

    I hope to be able to change a few of the worst columns, and also to test the compression trade-offs, since we are soon moving to an AlwaysOn environment, though hopefully replication already does some compression. In our vDC our virtual cores are free, but we pay for memory and disk.

  • Bill Talada (10/13/2015)


    You have two problems:

    1. Lack of disk space.

    2. Incorrect constraint on a column (wrong datatype).

    With compression you can solve #1 and turn a blind eye to #2.

    Likely your space problem during conversion was because of having recovery set to FULL instead of SIMPLE during the process.

    Personally I would rename the old table, move 10% of the rows into the new table, and repeat until done. That will get you through the tight disk space period.

    1. That was just my test environment, and I deleted an old database as soon as I got the warning, but perhaps not soon enough to allow the test to continue properly; it did run for probably another 10-15 minutes before it told me it failed, so I'm not sure what happened. I can see now why the temp table approach is better anyway, so I will be experimenting with that instead of the copy/rename-column approach. I'll make sure we have enough disk space next time.

    Thx and that's a great suggestion to change to Simple during conversion then back to Full.

  • Several bits of great advice here that I will use, so I marked multiple answers. Thx folks!

  • dave 67553 (10/13/2015)


    As for backups, are you compressing your backups? If not, do so. It saves time and storage space and very few SQL Servers are so CPU constrained that they can't handle the extra CPU involved.

    Not currently but probably should be.

    I ran some tests when compressed backups became available without the Enterprise edition, back in SQL 2008 (IIRC).

    My test figures were:

    Backup Filesize reduction 80-85%

    Backup elapsed time saved 40%

    Restore would also be faster (less Disk IO to restore a compressed backup file)

    We then decided to make COMPRESSED backup the default.

    (There is a CONFIG setting to set COMPRESSED BACKUP as the default, so if you set that you won't have to change anything else WRT your backup jobs)

    USE master;
    GO
    EXEC sp_configure 'backup compression default', 1;
    RECONFIGURE WITH OVERRIDE;
