Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Stairway to Data, Step 3: Strings Expand / Collapse
Author
Message
Posted Thursday, May 12, 2011 10:10 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 10:45 AM
Points: 1,945, Visits: 2,996
Comments posted to this topic are about the item Stairway to Data, Step 3: Strings

Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1108183
Posted Wednesday, May 18, 2011 9:07 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: 2 days ago @ 6:29 AM
Points: 1,657, Visits: 4,739
From my experience, the biggest pitfall one can encounter, when context switching between writing SQL for SQL Server versus Oracle, has nothing to do with minor variations in SQL syntax or proprietary functions, but rather we must keep in mind that in Oracle an empty string is equivalent to NULL.
For example, I ran the following queries on Oracle. A SQL Server developer not familiar with Oracle would look at the SQL and expect the opposite result.
select case when 'Smith' > '' then 'True' else 'False' end 
as smith_greater_than_emptystr from dual;

select case when '' = '' then 'True' else 'False' end
as emptystr_equals_emptystr from dual;

select case when '' is null then 'True' else 'False' end
as emptystr_is_null from dual;

select coalesce( '', 'Smith' )
as first_non_null_string from dual;

select count(*) cust_count
from (select '' as cust_name from dual) x where cust_name is not null;

SMITH_GREATER_THAN_EMPTYSTR 
---------------------------
False

EMPTYSTR_EQUALS_EMPTYSTR
------------------------
False

EMPTYSTR_IS_NULL
----------------
True

FIRST_NON_NULL_STRING
---------------------
Smith

CUST_COUNT
----------------------
0
Post #1111074
Posted Tuesday, May 31, 2011 11:36 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Sunday, May 4, 2014 7:48 PM
Points: 369, Visits: 217
"Japanese is the worst written language on Earth. "

And English is the worst spoken language on Earth. Don't get me wrong, I love English and would even support getting rid of all languages other than English even though it is not my first language, but because English does not encode the pronunciation in the spelling it gives rise to numerous unintelligible spoken accents and you will probably need a translator to understand my spoken English, while I will need subtitles to understand your spoken English.

But English is the easiest language to learn and the easiest language to type as it has just 26 letters the lowest among all the languages of the world. No wonder it become the number one language.
Post #1117860
Posted Wednesday, June 1, 2011 1:21 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, July 30, 2014 8:27 AM
Points: 13, Visits: 75
Usual good article, being a bit picky would point out that the EBCDIC acronym means:
"Extended Binary Coded Decimal Interchange Code".


SQL Novice - Here to learn.
Post #1117902
Posted Wednesday, June 1, 2011 1:28 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, September 10, 2014 1:18 AM
Points: 242, Visits: 155
Re: "Japanese is the worst written language on Earth" - I guess this is subjective; Japanese people may differ in their opinion!

I'm interested why you also say "If you store text in a particular language you probably ought to be using a document management system and not SQL." - would multi-lingual countries like Belgium and Switzerland not have a requirement to store multiple versions of information? I recall working in Brussels where my colleagues had Windows installed with at least two languages by default, which I believe is mandatory.
Post #1117903
Posted Wednesday, June 1, 2011 1:55 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, August 25, 2014 6:13 AM
Points: 40, Visits: 89
the script

drop table t17
select * into t17 from (select null s union all select '' union all select ' ' union all select ' ' ) x
select '*'+s+'*' from t17

produces an output line of **
so I think the statement "you cannot store a string of length zero" is not true.
Post #1117911
Posted Wednesday, June 1, 2011 2:04 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Wednesday, May 14, 2014 7:57 AM
Points: 76, Visits: 342
h.tobisch: even though the output is ** for that line, space is still taken up in the database storing the fact that this is an empty string, that is what Joe is saying.

James
MCM [@TheSQLPimp]
Post #1117912
Posted Wednesday, June 1, 2011 2:39 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, May 14, 2012 4:59 AM
Points: 20, Visits: 16
The introduction to the article is a bit confusing stating that almost all character encodings in use are fixed-length. This is in general not true for UTF-16 and certainly not for UTF-8. Also calling Unicode a 16-bit standard is incorrect. If anything it is a 20-bit standard, though almost all useful characters have code points in the 16-bit regime.

Another small point: the nchar and nvarchar types in sql server actually don't use UTF-16 but UCS-2, which are identical except that UCS-2 does not support code points beyond 65535 and is thus a fixed-length encoding, i.e. all characters are encoded with 2 bytes. The next version of sql server will have proper support for Unicode.
Post #1117922
Posted Wednesday, June 1, 2011 9:08 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Thursday, September 4, 2014 1:28 PM
Points: 1,676, Visits: 1,756
I would venture to state the obvious that this is an excellent article, thank you Joe. I have a question about the this statement:

Joe Celco (6/01/2011)

Even a VARCHAR(n) string has to be padded out with blanks to bring it to the minimum length of one. Thus, any attempt to store an empty string in an SQL column will be padded out with a blank and you cannot store an empty string in SQL.

Since it is not possible to store an empty string in SQL then it appears that the only difference between the empty string and the single space for varchar data type is the value stored within 2 bytes overhead, the one responsible for storing the number of characters of the respective varchar. Both len and datalength of the variable defined as varchar(n) return 0 if the variable is assigned a value of empty string and both return 1 if the same variable is assigned a value of a single space. This probably means that the actual value in both cases is the single space (0x20 regardless whether the assigned value is empty string or single space) but the value of the overhead is different, and this is the key to determine which one is which. For empty string the overhead value is 0x0000 and for for single space the overhead value is 0x0001.

declare @empty varchar(10);
declare @space varchar(10);

select @empty = '', @space = ' ';

select
datalength(@empty) empty_len,
datalength(@space) space_len;

-- results are as expected though the actual
-- stored value is 0x20 for both strings:
empty_len space_len
----------- -----------
0 1

Please let me know if my understanding of your statement about storage of the empty string or rather about impossibility of it in SQL Server is correct.

Thank you,

Oleg
Post #1118099
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse