text -> ntext causes massive size increase...

  • Hi,

    (See also 'Database size issue...')

    We needed to change some text fields to ntext. As that conversion isn't supported, the way we achieved it was to (see the sketch after this list):

    - create a new column;

    - copy all the data across from the text to ntext column;

    - delete the original column;

    - add a new column with the deleted column's name, and type ntext;

    - copy all the data back from the temp col;

    - delete the temp col.
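
    For reference, a rough sketch of those steps in SQL, assuming a hypothetical table MyTable with a text column Notes (the names are illustrative, and the copy step may need a CONVERT or a chunked copy depending on your server version):

        ALTER TABLE MyTable ADD Notes_tmp ntext NULL   -- 1. create the temp ntext column
        GO
        UPDATE MyTable SET Notes_tmp = Notes           -- 2. copy the data across
        GO
        ALTER TABLE MyTable DROP COLUMN Notes          -- 3. delete the original column
        GO
        ALTER TABLE MyTable ADD Notes ntext NULL       -- 4. re-add under the old name, as ntext
        GO
        UPDATE MyTable SET Notes = Notes_tmp           -- 5. copy the data back
        GO
        ALTER TABLE MyTable DROP COLUMN Notes_tmp      -- 6. delete the temp column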

    That seems to have worked OK for the conversion, but it's given us a massive increase in database size, and we're unable to reduce it. We've gone from a database that was approximately 800 MB to one that's over 1800 MB.

    DBCC CHECKDB, DBCC SHRINKDATABASE, DBCC CHECKTABLE & DBCC CHECKALLOC don't help or give any clues.
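
    For reference, these were along the lines of (MyDB being a stand-in for the real database name):

        DBCC CHECKDB ('MyDB')          -- consistency checks: no errors reported
        DBCC CHECKALLOC ('MyDB')       -- allocation checks: no errors reported
        DBCC SHRINKDATABASE ('MyDB')   -- shrink attempt: the size barely moves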

    However, if we export the whole database to another server, it reduces back to approximately the original size.

    Can anyone help out here?

  • Doesn't ntext take twice as much space as text?

    quote:

    The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly.

    One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.

    Each Microsoft® SQL Server™ collation has a code page that defines what patterns of bits represent each character in char, varchar, and text values. Individual columns and character constants can be assigned a different code page. Client computers use the code page associated with the operating system locale to interpret character bit patterns. There are many different code pages. Some characters appear on some code pages, but not on others. Some characters are defined with one bit pattern on some code pages, and with a different bit pattern on other code pages. When you build international systems that must handle different languages, it becomes difficult to pick code pages for all the computers that meet the language requirements of multiple countries. It is also difficult to ensure that every computer performs the correct translations when interfacing with a system using a different code page.

    The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another. You can minimize character conversion issues by using Unicode data types throughout your system.
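
    You can see the two-bytes-per-character difference directly with DATALENGTH (a quick check, not part of the quote above; the same doubling applies to text vs ntext):

        SELECT DATALENGTH('hello')  AS char_bytes,   -- 5  (one byte per character)
               DATALENGTH(N'hello') AS nchar_bytes   -- 10 (two bytes per character)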


  • Yep - you're right - ntext is twice the size of text.

    However!! If the whole database were text columns, and they were all changed to ntext, then I would expect the database to double to 1600 MB.

    But! Text columns only accounted for a tiny proportion of the columns, so I don't think that explains the massive size difference.

    Plus, like I said, "if we export the whole database to another server, it reduces back to approximately the original size..."

    Any further advice would be great!

    Thanks,

    Ollie.

  • Is it purely your text fields or have you converted varchar to nvarchar, char to nchar etc?

    If you used nchar and nvarchar then not only does the base data take extra space but the indices will also take extra space.

    If you increase the storage requirements of your fields won't this also affect the number of pages required to store the information?

    I'm thinking that if you start off with 'n' pages of data then doubling your requirement may require extra pages.

    I may be getting confused on this point.

    I seem to remember you get something similar when it comes to saving files onto disk. You may have a file containing 5 bytes of data, but there is a minimum amount of space that a file can use, therefore your file takes up, say, 1K.
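
    Here's a quick way to see the page effect (hypothetical throwaway tables; char(4000) fits two rows on an 8 KB page, nchar(4000) only one, so the wider table uses roughly twice the pages for the same row count):

        CREATE TABLE demo_char  (id int NOT NULL, payload char(4000)  NOT NULL)
        CREATE TABLE demo_nchar (id int NOT NULL, payload nchar(4000) NOT NULL)
        GO
        DECLARE @i int
        SET @i = 0
        WHILE @i < 100
        BEGIN
            INSERT INTO demo_char  VALUES (@i, 'x')    -- ~4 KB per row: 2 rows/page
            INSERT INTO demo_nchar VALUES (@i, N'x')   -- ~8 KB per row: 1 row/page
            SET @i = @i + 1
        END
        GO
        EXEC sp_spaceused 'demo_char'    -- about half the data pages of...
        EXEC sp_spaceused 'demo_nchar'   -- ...this one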

    ========================

    He was not wholly unaware of the potential lack of insignificance.

  • Regarding the issue of the db size: does the 1800 MB include the transaction log? If so, did you truncate the log?

    Also, since you have converted datatypes from varchar to nvarchar, and we know the Unicode datatypes increase the space used from 1 byte to 2 bytes to store the same data, that will affect not only the table and index sizes: you now also have the page allocation issues Mr. Poole described above, so in theory you have fragmented tables, indexes, extents etc.

    I also noticed that you ran DBCCs, but you didn't mention reindexing or defragging. Have you checked the Scan Density and Extent Fragmentation? Also check the Average Bytes Free per Page: if you are allocating a full 8 KB page to store 1500 bytes, that's not going to be a good thing. (The relevant commands are sketched below.)
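
    A minimal sketch of those checks, with MyTable and MyDB as stand-ins for your own names:

        DBCC SHOWCONTIG ('MyTable')          -- check Scan Density, Extent Fragmentation
                                             -- and Avg. Bytes Free per Page in the output
        DBCC DBREINDEX ('MyTable', '', 90)   -- rebuild all indexes on the table
                                             -- with a 90 percent fill factor
        BACKUP LOG MyDB WITH TRUNCATE_ONLY   -- truncate the log (SQL 2000 syntax);
                                             -- take a full backup afterwards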

    Sorry for the ramblings… Hope this helps!

    Jim

    Jbabington
    Jbabington@hotmail.com

  • Hi ojl96!

    I think your problem concerns the dropping of your TEXT column.

    Use DBCC CLEANTABLE.

    Look in BOL under the keyword: DBCC statements, DBCC CLEANTABLE
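
    For example (MyDB and MyTable are stand-ins; check BOL for the exact syntax on your version):

        DBCC CLEANTABLE ('MyDB', 'MyTable')   -- reclaims space left behind by
                                              -- dropped variable-length and text columns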

    Let me know how it works out!

    Regards, Hans!

     
