Search for hyphen in database: it does not find the hyphen?

  • I am doing a data cleansing activity.

    The original Source was in Access Database

    Current Source in SQL Server 2008 R2

    nvarchar(50)

    The reference table is in SQL server 2012.

    varchar(100)

    I used a DT package to convert the reference data to nvarchar(50)

    I placed them in some table

    There is an exact match for the value of lastname which equals: Kirkland‐Wallace

    when I run the query

    select LastName from

    DT5

    where LastName like '%-%'

    no results

    when I run the query

    SELECT LastName FROM DT5

    WHERE CHARINDEX('-', LastName) > 0

    I get everything but Kirkland‐Wallace that has a hyphen

    when I run the query

    select LastName from --

    DT5

    where LastName like '%Wallace%' -- switch these or '%Kirkland%

    I get the correct result

    For some reason it does not see the hyphen as a hyphen but it display's itself as a result set.

    Has anyone come across this situation before?

    I would be appreciate any feedback

  • there's at least two other longer than normal hyphens that has a different ascii code than the "normal hyphen ASCII('-')=45,without even going into nvarchar data sets.

    i'm betting it's one of these others?

    /*Results

    NormalHyphenChar150Char151

    -–—

    */

    SELECT CHAR(45) As NormalHyphen,

    CHAR(150) Char150,

    CHAR(151) As Char151

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (3/24/2016)


    there's at least two other longer than normal hyphens that has a different ascii code than the "normal hyphen ASCII('-')=45,without even going into nvarchar data sets.

    i'm betting it's one of these others?

    /*Results

    NormalHyphenChar150Char151

    -–—

    */

    SELECT CHAR(45) As NormalHyphen,

    CHAR(150) Char150,

    CHAR(151) As Char151

    I'll bet one of them is the annoying one that's pasted in from MS Word.

  • That was a good thought!

    SELECT LastName FROM DT5

    WHERE CHARINDEX(CHAR(151), LastName) > 0

    no results

    SELECT LastName FROM DT5

    WHERE CHARINDEX(CHAR(150), LastName) > 0

    no results

    SELECT LastName FROM DT5

    WHERE CHARINDEX(CHAR(45), LastName) > 0

    same results as hyphen above

    I appreciate the rapid turrnaround

  • bdkdavid (3/24/2016)


    That was a good thought!

    SELECT LastName FROM DT5

    WHERE CHARINDEX(CHAR(151), LastName) > 0

    no results

    SELECT LastName FROM DT5

    WHERE CHARINDEX(CHAR(150), LastName) > 0

    no results

    SELECT LastName FROM DT5

    WHERE CHARINDEX(CHAR(45), LastName) > 0

    same results as hyphen above

    I appreciate the rapid turrnaround

    test for high ascii, like the word hyphen:

    Select LastName FROM DT5 Where LastName Like '%[^0-9a-zA-Z -]%' COLLATE Latin1_General_BIN

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • What do you get if you run this query?

    WITH

    E(n) AS(

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    ),

    E2(n) AS(

    SELECT a.n FROM E a, E b

    ),

    E4(n) AS(

    SELECT a.n FROM E2 a, E2 b

    ),

    cteTally(n) AS(

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n

    FROM E4

    )

    SELECT n, SUBSTRING(LastName, n, 1), ASCII( SUBSTRING(LastName, n, 1))

    FROM cteTally t

    JOIN DT5 ON t.n <= LEN(DT5.LastName)

    WHERE LastName like '%Kirkland%Wallace%'

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • The hyphen might be a non-breaking hyphen. Here's an example of how it behaves:

    declare @test-2 table (a nvarchar(10));

    insert @test-2 values ('abc-def'), ('abc' + nchar(0x02010) + 'def')

    select a, ascii(SUBSTRING(a, 4, 1)), UNICODE(SUBSTRING(a, 4, 1))

    from @test-2

    --where a like '%-%'

    The result without the where clause is:

    abc-def4545

    abc-def458208

    And with the where clause:

    abc-def4545

  • Hi Luis

    Here are the Results:

    1K75

    2i105

    3r114

    4k107

    5l108

    6a97

    7n110

    8d100

    9-45

    10W87

    11a97

    12l108

    13l108

    14a97

    15c99

    16e101

  • I just realized that you're querying the source table. Change the ASCII function for UNICODE. That should show the difference as shown by Stephanie.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi Stephanie,

    SELECT LastName FROM DT5

    WHERE CHARINDEX(nchar(0x02010), LastName) > 0

    or where Charindex(nchar(8208),LastName) > 0

    this get the desired out put

    Kirkland-Wallace

    Is there a cause for this behavior?

    I do not understand the source of the problem.

    Can anyone shed some light on the subject?

  • Hi Luis

    Here is the correction output:

    1K75

    2i105

    3r114

    4k107

    5l108

    6a97

    7n110

    8d100

    9-8208

    10W87

    11a97

    12l108

    13l108

    14a97

    15c99

    16e101

  • bdkdavid (3/24/2016)


    Is there a cause for this behavior?

    I do not understand the source of the problem.

    Can anyone shed some light on the subject?

    The unicode character that looks like a hyphen is not, in fact, a hyphen. It's a "non-breaking hyphen" which can be created fairly easily in word processing. In a program like Word, that means that "Kirkland-Wallace" (which I typed with a hyphen, but you get the point) will not break across lines. The entire phrase will move to the next line if there isn't room at the end of the previous line for all of it.

    For your data cleansing, you need to add the non-breaking hyphen to whatever logic you are currently applying to hyphens.

  • Luis, that is an awesome query! I suspected that was the problem when I first saw this post, but it was already answered and quite well at that.

    I would really like to smack Word upside the head and disable things like smart quotes and their various hyphens. I understand and appreciate that they have a purpose, but it don't do much for a poor li'l code slinger like me.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply