What is the smallest size datatype you can use for this?

  • I am playing with some data I have and working on restricting it as small as possible.

    I just discovered something that didn't occurr to me but it might have to some.

    Taking the started US Postal zip code nnnnn-nnnn (where n = a number) what is the datatype (or combination of more than 1) you can cram this into? 

    Note: I don't want any compression schema used just straight data in and out of a datatype or two in one or two fields.

    This is just for fun in case there is something elase I didn't see and to see if anyone has every thought of the way I just discovered.

  • It's the first time I see us zip codes of more than 5 numbers. Is this a common occurance?

    If not maybe you could ship that in its own colum using a smallint (2 bytes instead of 3 bytes for chars), Then you could shrink the current zip code column to char(5) or int(4), therefore saving at least 2-3 bytes per line.

    Just my 2 cents.

  • If I am understanding the requirements correctly then a DECIMAL(9) seems efficient, needing 5 bytes storage. But it of course means there will be conversion needed when working with it, but that is the way I understand the reqs.

  • I prefer varchar(10) over 2 int or 1 int and 1 smallint, due to leading zeros in the Zip Code and Plus 4 portions, otherwise when you output two numbers you will have to:

    SELECT NULLIF(RIGHT('00000'+CONVERT(varchar,Zip_int),5),'00000')+ISNULL('-'+RIGHT('0000'+CONVERT(varchar,Plus4_smallint),4),'')

    Instead of:

    SELECT ZipCode

    With varchar(10) you do not waste database storage with 5 spaces for a char(10) when you only have the Zip code portion of the value.

    A plus for varchar(10) over int, is the support for the Canadian and UK PostalCode, which use alpha numeric characters.

    Andy

  • This shows the space usage for a table with only the Zip code:

    CREATE TABLE Zip_1_Num (Zip decimal(9) NULL)

    CREATE TABLE Zip_2_Num (Zip int NULL, Plus4 smallint NULL)

    CREATE TABLE Zip_varchar (Zip varchar(10) NULL)

    TableName    NumCols     RowSize

    ------------ ----------- --------

    Zip_1_Num    1           12

    Zip_2_Num    2           13

    Zip_varchar   1           20

    So for strictly space DECIMAL wins, but try and ORDER BY and you will get a surprise, plus how to deal with a NJ Zip+4:

    Decimal   Int   Smallint  Varchar

    --------- ----- --------- ----------

    80080234  8008  234       08008-0234

    Andy

  • Remi,

    This zip code format is referred to as zip plus 4.  All us addresses have a zip + 4 code but it is only commonly used in commercial applications.  Its purpose it to further narrow down the destination of the mail  and may help to reduce bulk mailing costs.

    I couldn't even tell you what my home address extra 4 digits are without looking at some commercial mail sent to me.

  • Thanx for the info.

    So what's the final (shortest) datatype for this problem?

    Anyones already done this in a real application and can relate the pros and cons of changing from the varchar(10) datatype?

  • INT is smallest, but generally not recommended because of the manipulation. INT is +- 2 billion, roughly, plenty for 9 digits. DECIMAL(9) is 5 bytes. I see no advantage to that over INT, though I suppose you could store the +4 as decimals and extract the ZIP5 with FLOOR(). I'd recommend CHAR(9) or CHAR(10), unless you have a really good reason for saving space. The following works well for measuring row size.

    It looks like SQL 2000 requires a minimum of 11 bytes for a row (smallint, two tinyints, or 8 bit fields, or a null varchar()). Note that bit fields take two bits each, because of the NULL bit, even if NOT NULL. It seems VARCHAR has 4 bytes of overhead for the length. VARCHAR(9) with 5 digits stored and CHAR(9) take the same amount of space.

    I wrote a script to measure this using dynamic SQL:

    use tempdb

    go

    declare @datatype varchar(20)

    declare dtcurs cursor fast_forward for

     SELECT datatype = 'INT'

     UNION ALL SELECT 'DEC(9)'

     UNION ALL SELECT 'CHAR(5)'

     UNION ALL SELECT 'CHAR(9)'

     UNION ALL SELECT 'VARCHAR(9)'

    open dtcurs

    fetch next FROM dtcurs into @datatype

    while @@fetch_status = 0

    begin

     if object_id('tv') is not null

      drop table tv

     if object_id('tv2') is not null

      drop table tv2

     exec ('create table tv (x ' + @datatype + ')')

     set nocount on

     declare @i int set @i=0

     while @i < 32

     begin

       insert tv(x) select 98105

       set @i=@i+1

     end

     select tv1.* into tv2 from tv tv1, tv tv2, tv tv3, tv tv4

     --exec sp_spaceused tv2

     declare @id int

     set @id = object_id('tv2')

     declare @output varchar(80)

     select @output = left(@datatype + space(15), 15) + ' takes '

          + str( ( ( select sum(dpages) from sysindexes

                     where indid < 2 and id = @id)

           +( select isnull(sum(used), 0) from sysindexes

                     where indid = 255 and id = @id )

                  ) * d.low / 1048576., 7, 2) + ' bytes'

     from master.dbo.spt_values d

     where d.number = 1 and d.type = 'E'

     print @output

     fetch next FROM dtcurs into @datatype

    end

    close dtcurs

    deallocate dtcurs

    go

    set nocount off

    go

    --OUTPUT (where data = 98105):

    INT             takes   13.18 bytes

    DEC(9)          takes   14.19 bytes

    CHAR(5)         takes   14.19 bytes

    CHAR(9)         takes   18.26 bytes

    VARCHAR(9)      takes   18.26 bytes

    Measured somewhat more manually:

    --varchar(9) 11.14 ''

    --char(5)    14.18

    --char(9)    18.25

    --varchar(9) 18.26 '98105'

    --char(10)   19.24

    --dec(9)     14.19

    --int        13.20

    Some additional:

    --tinyint    11.14 --set to 1

    --2 tinyints 11.14

    --3 tinyints 12.16

    --4 tinyints 13.18

    --5 tinyints 14.19

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply