SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Finding special characters within character strings


Finding special characters within character strings

Author
Message
michael.f.morin
michael.f.morin
SSC-Enthusiastic
SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)

Group: General Forum Members
Points: 108 Visits: 131
Comments posted to this topic are about the item Finding special characters within character strings
wes_springob
wes_springob
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 12
You say you run this every morning? Why not just set up a constraint on the column to prevent the problem characters from getting in, in the first place?
halifaxdal
halifaxdal
SSCrazy
SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)

Group: General Forum Members
Points: 2541 Visits: 1744
interesting summary article, worth to save for later use
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (118K reputation)SSC Guru (118K reputation)SSC Guru (118K reputation)SSC Guru (118K reputation)SSC Guru (118K reputation)SSC Guru (118K reputation)SSC Guru (118K reputation)SSC Guru (118K reputation)

Group: General Forum Members
Points: 118523 Visits: 41480
wes_springob (8/19/2014)
You say you run this every morning? Why not just set up a constraint on the column to prevent the problem characters from getting in, in the first place?


Heh... because if someone fails to mail me a check because of a typo in a bulk load, there will be hell to pay! :-D For example, if I live at 5678 Main Street and someone accidently had the SHIFT key depressed when they typed the "5" so that the address came out as %678 Main Street (just as an example), I'd want someone working on the correction rather than just rejecting the row of bulk data. Why not do those types of checks in the front end before it makes it to a bulk file? Heh... yeah... people always do those kinds of checks in the front end, don't they? :-P

It just doesn't hurt to do these types of checks in the database, especially where any form of keyboard input was ever involved. It would be nice if they did it in the front end, but there's no guarantees there. These types of checks might also be good for capturing examples of what people have been typing to design new "validation" code for the front end or whatever source the data came from.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Chris Hurlbut
Chris Hurlbut
SSCommitted
SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)

Group: General Forum Members
Points: 1888 Visits: 540
What gets me most of the time are Carriage Returns, Line Feeds and tabs.
We receive many different file formats from different clients, while we try to standardize feeds as much as possible, these sometimes end up in the data I have to process.
Usually Excel is involved in some way...Analysts love excel but for ETL it makes me cringe.
alexander.oss
alexander.oss
SSC-Enthusiastic
SSC-Enthusiastic (137 reputation)SSC-Enthusiastic (137 reputation)SSC-Enthusiastic (137 reputation)SSC-Enthusiastic (137 reputation)SSC-Enthusiastic (137 reputation)SSC-Enthusiastic (137 reputation)SSC-Enthusiastic (137 reputation)SSC-Enthusiastic (137 reputation)

Group: General Forum Members
Points: 137 Visits: 130
Beware the Unicode character FULLWIDTH APOSTROPHE ' (U+FF07), because we had a problem with it once since it looks like SQL Server will treat it like any other apostrophe, i.e. as a string constant delimiter or apostrophe escape. So if you're building dynamic SQL (which of course you aren't, and of course neither were we) and you're manually looking for characters to escape or otherwise handle before executing your SQL string, you may need to handle it too.
michael.f.morin
michael.f.morin
SSC-Enthusiastic
SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)

Group: General Forum Members
Points: 108 Visits: 131
If you own, or are a member of, the project that developed the system that would be great. Sadly, many times you are administering a system that you cannot change, at least in the short term, or have very little control over at that level.
NBSteve
NBSteve
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2260 Visits: 950
Thanks for the article! One clarification:

Also notice that “Ça va” is missing. If the column is case sensitive, should that not have been captured? Again this is not a course on regular expressions but note: the regular expression does not see the accented characters as special. Meaning (é and e) or (Ä and A) are evaluated as the same, as far as the expression is concerned.


This is incorrect, they are not evaluated the same. When checking ranges in regular expressions, the collation setting is used. When you specify [A-Z], this includes every character alphabetically sorted between 'A' and 'Z' according to the collation. 'Ç' is sorted between 'C' and 'D' alphabetically in this collation, and so is included in [A-Z].


SELECT 'IT WORKS'
WHERE 'C' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[C-C]%');

SELECT 'IT WORKS'
WHERE 'C' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[B-C]%');

SELECT 'IT WORKS'
WHERE 'C' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[C-D]%');

SELECT 'IT WORKS'
WHERE 'Ç' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[C-C]%');

SELECT 'IT WORKS'
WHERE 'Ç' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[B-C]%');

SELECT 'IT WORKS'
WHERE 'Ç' COLLATE SQL_Latin1_General_CP1_CI_AS LIKE ('%[C-D]%');


In 1-3, you can see that a plain 'C' matches all 3 ranges. However, in 4-6, the accented 'Ç' character only matches the range [C-D].
tom.w.brannon
tom.w.brannon
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1669 Visits: 1394
Thanks for the post! I set up a search on all the data in my modest sized database and found that LEN also ignores CRLF characters. I found I had a field that had been loaded with trailing blanks up to the last two characters 0D 0A. The value from LEN varied but DATALEN was always 65. Fixed with LEFT on the LEN.
Robert.Sterbal
Robert.Sterbal
SSC-Addicted
SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)SSC-Addicted (449 reputation)

Group: General Forum Members
Points: 449 Visits: 2000
I'm not sure why we don't check our data as many ways as possible. I have an environment that assumes everything is installed as expected, and a help desk busy with inquiries for when it isn't.

The key is to develop testing that is efficient to run.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search