Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Varchar or Char? Expand / Collapse
Author
Message
Posted Friday, October 16, 2009 2:40 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Today @ 7:28 AM
Points: 452, Visits: 849
Sorry, I am probably wrong in regards to MS SQL Server and nVarchar. I don't have an MS source.
I was thinking of unicode and the term 'multibyte' rather than double-byte.

Unicode (specifically the UTF-8 subset) can be encoded in 8-bit 'octets' and there can be between 1 and 4 of them (according to the 2003 revision).

See page 3 of the internet RFC 3629
http://www.ietf.org/rfc/rfc3629.txt


Off topic sorry.
Post #804027
Posted Friday, October 16, 2009 2:52 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Friday, July 11, 2014 1:49 AM
Points: 968, Visits: 349
Good one!
Post #804029
Posted Friday, October 16, 2009 2:56 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, January 24, 2013 9:59 PM
Points: 1,354, Visits: 1,299
This one got me again. Another trick question.

The question asked, "Which of these data types will be more efficient?"

Had it asked, "Which of these data types take less hard drive space?" I might have chosen the correct answer. I selected Varchar because it is more efficient when programming because char pads with extra spaces at the end and you need to rtrim the column before you produce your final output.
Post #804031
Posted Friday, October 16, 2009 3:20 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, June 18, 2014 3:57 AM
Points: 141, Visits: 842
diamondgm (10/16/2009)
Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).


That's not an assumption, the even spread is explicitly mentioned in the question. Without it, I agree that the question would have been harder (if not impossible) to answer.


I think our definition of the word 'explicit' differs dramatically. I did not find the question to indicate that the spread was even as a fact.

Secondly, within SQL Server, the efficiency may be best to use char in that instance, but take in to account the code operations of trimming and I think it becomes debatable.


I doubt it. Almost all business applications I have seen have their performance tied to an I/O bottleneck, and have the CPU twiddling it's virtual thumbs while waiting for more data to be read or written. Those few cycles the CPU spends to trim trailing spaces won't affect performance at all, it just means one less thumb to twiddle.


In the cases where this is true (which is most of the time) white space would aversely affect I/O, no? (Maybe I am very wrong here)


I agree with you on a purist level, but practically, I would go with varchar still simply for the programmers' ease.



How, exactly, is varchar easier for a programmer?


Have you never had a programmer complain about having to trim on certain fields and not on others and wishing for a "standard" approach to text data?

I think you have raised excellent objections to my way of thinking about this problem. And on paper (assuming the average is 8.5), I can not disagree. Though in reality, I would still use varchar(10), if only for the convenience to the programmers at a realatively low performace cost.
Post #804038
Posted Friday, October 16, 2009 3:21 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Today @ 1:40 PM
Points: 459, Visits: 260
cengland0 (10/16/2009)
This one got me again. Another trick question.

The question asked, "Which of these data types will be more efficient?"

Had it asked, "Which of these data types take less hard drive space?" I might have chosen the correct answer. I selected Varchar because it is more efficient when programming because char pads with extra spaces at the end and you need to rtrim the column before you produce your final output.


But in terms of I/O we could assume fixed length data types are better than varying ones.


Kindest Regards,

Damian Widera
SQL Server MVP,
MCT, MCSE Data Platform, MCSD.NET
Post #804039
Posted Friday, October 16, 2009 3:42 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 8:59 AM
Points: 5,930, Visits: 8,181
diamondgm (10/16/2009)
diamondgm (10/16/2009)
Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).


That's not an assumption, the even spread is explicitly mentioned in the question. Without it, I agree that the question would have been harder (if not impossible) to answer.


I think our definition of the word 'explicit' differs dramatically. I did not find the question to indicate that the spread was even as a fact.


Then our definition of the word 'explicit' does indeed differ dramatically.

The question reads (direct quote; emphasis added by me):
"Their length will vary from seven to ten characters, all English letters, both upper and lower case mixed with numeric digits, with an even distribution of lengths."
What would it need to satisfy your definition of 'explicit'?

Secondly, within SQL Server, the efficiency may be best to use char in that instance, but take in to account the code operations of trimming and I think it becomes debatable.


I doubt it. Almost all business applications I have seen have their performance tied to an I/O bottleneck, and have the CPU twiddling it's virtual thumbs while waiting for more data to be read or written. Those few cycles the CPU spends to trim trailing spaces won't affect performance at all, it just means one less thumb to twiddle.


In the cases where this is true (which is most of the time) white space would aversely affect I/O, no? (Maybe I am very wrong here)


I/O is affected by number of bytes. Or rather, for SQL Server, I/O is affected by the number of rows per 8K page. If the row length is such that saving a few bytes per row increases the unused space on each page without increasing the number of rows (e.q. when row length goes from 2200 to 2100, you still get only 4 rows per page), I/O will not be affected. When the number of rows increases, less I/O will be required to do the same amount of work.

Whether bytes contain whitespace or character data is not relevant at all. Just the number of bytes - and most of all the integer part of the outcome of dividing 8,060 by that number.

I agree with you on a purist level, but practically, I would go with varchar still simply for the programmers' ease.



How, exactly, is varchar easier for a programmer?


Have you never had a programmer complain about having to trim on certain fields and not on others and wishing for a "standard" approach to text data?


Programmers will complain about everything. Even about the requirement to test.

If a programmer is too lazy to type RTRIM(...) around a column name when coding a report and is prepared to let user performance suffer to satisfy his laziness, I'm done with him or her very quickly.
(And if push comes to shove, I can always create a view to cast the column to varchar and trim the trailing spaces)



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #804048
Posted Friday, October 16, 2009 3:43 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 8:59 AM
Points: 5,930, Visits: 8,181
Damian Widera-396333 (10/16/2009)
in terms of I/O we could assume fixed length data types are better than varying ones.


If, and only if,
a) the average length of data is only 2 characters or less below the maximum length. I'd never dream of storing address information in fixed length columns!
b) there is no need to preserve trailing spaces. If there is, varchar is the only option that won't corrupt your data.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #804049
Posted Friday, October 16, 2009 3:53 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, June 18, 2014 3:57 AM
Points: 141, Visits: 842
diamondgm (10/16/2009)
diamondgm (10/16/2009)Firstly, we assume that the average is the result of an even spread (that the average length is in fact 8.5 - we only need an average of 8 for storage to be a moot point in char vs varchar).


That's not an assumption, the even spread is explicitly mentioned in the question. Without it, I agree that the question would have been harder (if not impossible) to answer.


I think our definition of the word 'explicit' differs dramatically. I did not find the question to indicate that the spread was even as a fact.


Then our definition of the word 'explicit' does indeed differ dramatically.

The question reads (direct quote; emphasis added by me):
"Their length will vary from seven to ten characters, all English letters, both upper and lower case mixed with numeric digits, with an even distribution of lengths."
What would it need to satisfy your definition of 'explicit'?



OMG, I didn't see that, my hublest appologies!

Uncle!
Post #804053
Posted Friday, October 16, 2009 5:02 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, July 3, 2014 2:45 AM
Points: 2,531, Visits: 536
Hugo Kornelis (10/16/2009)

That's reallly a shame, as the only good reasons to use varchar instead of char are a difference of more than 2 between average and maximum length; or a need to preserve trailing spaces.


Of course, if i need the trailing spaces I would go for char, but I have seen statements where the programmer doesn't want the trailing spaces and uses RTRIM to remove then and also uses RTRIM the in the where clause fetch the records regardless to the spaces:

SELECT somecol FROM sometable WHERE RTRIM(anothercol)='somevalue'

And this causes the SQL Server to do a table or index scan (lots of IO) with terrible performance as a result. This issue is not solved completely by using varchar, but I can reduce some of the cases where the programmers are using functions on the left side of the equal sign. The correct way to solve this issue is to teach everyone to be careful of how they use functions in the where clause, and I do that all the time, but... new project, new programmers.


/Håkan Winther
MCITP:Database Developer 2008
MCTS: SQL Server 2008, Implementation and Maintenance
Post #804091
Posted Friday, October 16, 2009 5:13 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 8:59 AM
Points: 5,930, Visits: 8,181
hakan.winther (10/16/2009)
Hugo Kornelis (10/16/2009)

That's reallly a shame, as the only good reasons to use varchar instead of char are a difference of more than 2 between average and maximum length; or a need to preserve trailing spaces.


Of course, if i need the trailing spaces I would go for char


OOPS - misunderstanding alert!

If you need to preserve trailing spaces, you MUST use varchar. Varchar stores the data with the trailing spaces as they were on input (that is, with default settings). Char pads data with spaces to the specified length and there is no way to reconstruct which spaces were already there and which were added by SQL Server to get the string at the proper length.

but I have seen statements where the programmer doesn't want the trailing spaces and uses RTRIM to remove then and also uses RTRIM the in the where clause fetch the records regardless to the spaces:

SELECT somecol FROM sometable WHERE RTRIM(anothercol)='somevalue'


The real problem of these programmers (apart from not understanding how SQL Server indexes work) is a lack of understanding of string comparisons. The RTRIM is completely superfluous, both for char and varchar datatypes. Trailing spaces are compared in these comparisons anyway. (Well, technically the shorter string is padded with spaces to match the length of the longer string before starting a character-by-character comparison, but that has the exact same effect as ignoring trailing spaces).



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #804095
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse