CHAR vs VARCHAR

  • I am running SQL 2000 SP4.

    Here's my table:

    CREATE TABLE [BIGHUH] (
        [LOGIN_ID] [char] (10) NULL ,
        [NAME] [char] (40) NULL ,
        [COVERAGE_PATH] [char] (8) NULL ,
        [COVERAGE_PATH_NAME] [char] (40) NULL ,
        [SN1] [char] (8) NULL ,
        [SN1_NAME] [char] (40) NULL ,
        [SN2] [char] (8) NULL ,
        [SN2_NAME] [char] (40) NULL ,
        -- ... SN3 through SN59 and their _NAME columns repeat the same pattern ...
        [SN60] [char] (8) NULL ,
        [SN60_NAME] [char] (40) NULL ,
        [DIRECT_AGENT_SKILL] [char] (8) NULL ,
        [DIRECT_AGENT_SKILL_NAME] [char] (40) NULL ,
        [GROUP_SET] [smallint] NULL ,
        [COL_DAT] [char] (10) NULL
    )

    Maybe I have things completely wrong here, but wouldn't the table be smaller if all of the CHARs were converted to VARCHARs? I say this because most of the time the columns past SN5 are going to be blank. I always thought that VARCHAR worked like the run-length encoding in a JPEG file: if the data is all blank (or one color, for a JPEG), the file will be smaller; if the data is always full (or many colors), the file will be bigger.
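    Here's the mental model I'm working from, in code (just a scratch test, nothing from the real table):

    DECLARE @c CHAR(40), @v VARCHAR(40)
    SELECT @c = 'abc', @v = 'abc'
    SELECT DATALENGTH(@c) AS CharBytes,    -- 40: blank-padded to the full width
           DATALENGTH(@v) AS VarcharBytes  -- 3: just the data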

    There are about 2 million rows in the table, with no indexes and no statistics. The table is about 7.5 GB. I converted the SN50-SN60 columns (and their _NAME columns) to VARCHAR and the file grew by about 90 KB. I ran UPDATE STATISTICS and DBCC UPDATEUSAGE, and even tried putting a clustered index on the table, and the size stayed the same.
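    For what it's worth, I'm checking the size after each change with something like this (sp_spaceused after refreshing the usage counts):

    DBCC UPDATEUSAGE (0, 'BIGHUH')
    EXEC sp_spaceused 'BIGHUH'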

    Do I need to run LTRIM(RTRIM()) on all of the columns that I have converted?
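    That is, something like this on each converted column:

    UPDATE BIGHUH SET SN60_NAME = LTRIM(RTRIM(SN60_NAME))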

    What am I missing here?

    Live to Throw
    Throw to Live
    Will Summers

  • CHAR is fixed length whereas VARCHAR is variable length, so VARCHAR should use less space than CHAR. The only caveat is that each VARCHAR column needs a couple of extra bytes in the row's variable-length column offset array to record where the value ends.

    In addition, there is a NULL bitmap in each row that records which columns actually hold NULLs.

    I suspect that what is happening is something similar to the below.

    You have a record that is 4K.

    You can fit 2 records on an 8K page.

    You increase your record size by 1 byte and now can only fit 1 record per 8K page, so you have doubled your storage requirement for the sake of 1 byte. Roughly 3,999 bytes of each page are now wasted space, and there is nothing you can do about it.
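    To put rough numbers on the posted table (my arithmetic, so treat the exact byte counts as estimates): the fixed-length columns add up to 10 + 40 + 62 x (8 + 40) + 2 + 10 = 3,038 bytes per row (the 62 code/name pairs being COVERAGE_PATH, SN1-SN60 and DIRECT_AGENT_SKILL), plus roughly 20 bytes of row header and NULL bitmap, call it ~3,060 bytes. Only 2 such rows fit in the ~8,096 usable bytes of an 8K page, so 2 million rows come to about 1 million pages, i.e. about 8 GB, which lines up with the 7.5 GB you are seeing.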

    I would seriously recommend normalising your table. I'm not sure what you store in SN1...SN60, but if it is fixed length then your structure would be something like:

    LOGIN_ID CHAR(10) NOT NULL ,
    SN CHAR(8) NOT NULL ,
    SN_NAME VARCHAR(40) NOT NULL

    This would be vastly more efficient, as you wouldn't be storing a vast amount of wasted space in your table.
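    A minimal sketch of that child table (the table and constraint names are just placeholders I made up):

    CREATE TABLE [BIGHUH_SN] (
        [LOGIN_ID] CHAR(10) NOT NULL ,
        [SN] CHAR(8) NOT NULL ,
        [SN_NAME] VARCHAR(40) NOT NULL ,
        CONSTRAINT [PK_BIGHUH_SN] PRIMARY KEY CLUSTERED ([LOGIN_ID], [SN])
    )

    One row per populated SN pair means the blank SN6-SN60 slots simply never get stored.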

  • Completely agree on the normalization thing. That was the first thing I wanted to try, but I don't have control over that.

    I can change the column types, that's about it.

    So, I should take off the NULLs, change all to VARCHAR and then my table should shrink?
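    i.e. something like this for each column (SN60_NAME as the example; it will fail if the column still holds NULLs):

    ALTER TABLE [BIGHUH] ALTER COLUMN [SN60_NAME] VARCHAR(40) NOT NULL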


    Live to Throw
    Throw to Live
    Will Summers

  • I thought I read somewhere once that if you alter the datatype of a column, SQL basically abandons the location of the old column and moves the data to the end of the row?

    I did a quick test and see the same sort of growth.
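    If that's what's happening, a rebuild should force the rows to be physically rewritten. Something along these lines (SQL 2000 syntax; the index name is just an example):

    CREATE CLUSTERED INDEX [IX_BIGHUH] ON [BIGHUH] ([LOGIN_ID])
    DBCC DBREINDEX ('BIGHUH', '', 0)
    DBCC UPDATEUSAGE (0, 'BIGHUH')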

    Brian

  • So I guess that I'd have to create a new table with all VARCHAR and copy all rows from the old table to it?
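    Something like this, I guess (BIGHUH_NEW being the hypothetical all-VARCHAR copy, with the same trim pattern repeated for every column):

    INSERT INTO BIGHUH_NEW (LOGIN_ID, NAME, SN1, SN1_NAME)
    SELECT RTRIM(LOGIN_ID),
           NULLIF(RTRIM(NAME), ''),
           NULLIF(RTRIM(SN1), ''),
           NULLIF(RTRIM(SN1_NAME), '')
    FROM BIGHUH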


    Live to Throw
    Throw to Live
    Will Summers

  • I have created a new table with only VARCHARs and have loaded it. The size didn't change.

    I have tried it with and without NULLs. Same results.


    Live to Throw
    Throw to Live
    Will Summers

  • I also applied RTRIM(LTRIM()) to two of the columns, then ran UPDATE STATISTICS and DBCC UPDATEUSAGE, and the size didn't change.

    This doesn't make any sense to me. I have converted columns from CHAR to VARCHAR before and saw a definite change in size.


    Live to Throw
    Throw to Live
    Will Summers

  • You said most of the repeated columns are empty. Do you mean they are NULL? If so, even fixed-length columns take up no space when they are NULL. So transferring a null char(100) field to a varchar(100) field will not get you any immediate space savings. The space savings will come when you start to populate them--assuming the data is not actually fixed-length itself, which you didn't specify.

    Check these SNxx columns to see what, if anything, they actually contain.
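    Something along these lines would do it (using SN60_NAME as the sample column):

    SELECT COUNT(*) AS TotalRows,
           SUM(CASE WHEN SN60_NAME IS NULL THEN 1 ELSE 0 END) AS NullRows,
           SUM(CASE WHEN DATALENGTH(RTRIM(SN60_NAME)) = 0 THEN 1 ELSE 0 END) AS BlankRows
    FROM BIGHUH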

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • As I said before, I have tried this with NULL and NOT NULL columns and the size didn't change.

    As I said before, the data in the repeated columns is almost always blank or not populated.

    I checked SN60_NAME and it contained nothing in any of the rows. (I guess it's for future use)

    I changed SN60_NAME to VARCHAR(40) instead of CHAR(40), made it NOT NULL, ran LTRIM(RTRIM()) on the column, and ran UPDATE STATISTICS and DBCC UPDATEUSAGE afterwards - no change in table size. There are no NULL values in the SN60_NAME column.


    Live to Throw
    Throw to Live
    Will Summers

  • I set up a clustered index on the table and ran DBCC INDEXDEFRAG on the table - no change in size.

    The output said that zero pages were removed.
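    One more data point I can pull: DBCC SHOWCONTIG reports the average record size and page density, which should show directly whether the rows themselves got any smaller.

    DBCC SHOWCONTIG ('BIGHUH') WITH TABLERESULTS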


    Live to Throw
    Throw to Live
    Will Summers

  • It doesn't matter if the columns are defined NULL or NOT NULL. Do the columns contain NULL values? In your original post you stated that most of the ones after SN5 are "empty." If by "empty" you mean "contains a NULL value," then you will not see any size difference when you convert them to varchar. A NULL char field is the same size as a NULL varchar field. If by "empty" you mean "contains an empty string," then you will also not see a size difference unless you trim each and every SNxx and SNxx_NAME in every row (not just SN60/SN60_NAME) when you transfer the data.

    You also fail to mention the data (if you defined SN60_NAME as NOT NULL, then it must contain something). If the data is at or near 40 characters in length (after trimming), then you will also not see a size difference.
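    One gotcha when testing for "empty" here: T-SQL comparisons ignore trailing spaces, so a blank-padded CHAR compares equal to the empty string; DATALENGTH is the reliable test. A quick demonstration:

    DECLARE @padded CHAR(40)
    SET @padded = ''
    SELECT CASE WHEN @padded = '' THEN 'compares equal' ELSE 'not equal' END AS Comparison,
           DATALENGTH(@padded) AS Bytes -- still 40: the CHAR is stored blank-padded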

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Thank you for your posts, but I did not fail to mention the data for SN60_NAME:

    I checked SN60_NAME and it contained nothing in any of the rows. (I guess it's for future use)

    If the columns had NULL values in them, SQL wouldn't have allowed me to change them to NOT NULL.

    I have checked all of the rows in SN60_NAME with IS NULL, = NULL and ISNULL(SN60_NAME, '') in separate queries; no rows were found to be null.

    I have set all the columns to allow NULLs and then updated with LTRIM(RTRIM()) - no change in size.

    I have set all the columns to NOT NULL and then updated with LTRIM(RTRIM()) - no change in size.
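    One more check that sidesteps the NULL/blank question entirely (if the column were still CHAR(40) it would report 40 bytes for every row; after a real conversion and trim it should report 0):

    SELECT MIN(DATALENGTH(SN60_NAME)) AS MinBytes,
           MAX(DATALENGTH(SN60_NAME)) AS MaxBytes
    FROM BIGHUH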


    Live to Throw
    Throw to Live
    Will Summers

  • Then please post the "create table" and subsequent "insert" so we can see exactly what you did.

    Come to think of it, a "select top 10 *" from the original table probably wouldn't hurt.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Using LTRIM(RTRIM()) will not make NULLs... that may be a part of the problem... here's the proof...

    DECLARE @NullOrBlank VARCHAR(8000)

    SELECT @NullOrBlank = REPLICATE(' ',100)
    SELECT DATALENGTH(@NullOrBlank) AS DataLengthBefore, -- 100: the spaces are stored
           LEN(@NullOrBlank)        AS LenBefore         -- 0: LEN ignores trailing spaces

    SELECT @NullOrBlank = LTRIM(RTRIM(@NullOrBlank))
    SELECT DATALENGTH(@NullOrBlank) AS DataLengthAfterTrim, -- 0: now a zero-length string
           LEN(@NullOrBlank)        AS LenAfterTrim         -- 0

    SELECT CASE WHEN @NullOrBlank IS NULL
                THEN 'IS NULL'
                ELSE 'IS Blank'
           END StatusAfterTrim -- 'IS Blank': trimmed to nothing, but still not NULL

    SELECT @NullOrBlank = NULL
    SELECT CASE WHEN @NullOrBlank IS NULL
                THEN 'IS NULL'
                ELSE 'IS Blank'
           END StatusAfterSetToNull -- 'IS NULL': only an explicit NULL is NULL

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Almost forgot to ask... I know you said you can't normalize the table (would be the greatest space saver of them all here) so I won't bug you about that... but I am curious as to what the "SN" columns are... in other words, what are "SN"s in this case?

    --Jeff Moden


