Searching for case sensitive data

  • Hi

    The answer depends on your future requirements. If it is just a temporary failure you can use COLLATE within the WHERE clause to specify the column as case-sensitive. If this is/becomes a standard requirement you should set the collation of the column to case-sensitive.

    Here a little sample for both approaches:

    DECLARE @t TABLE

    (

    Id INT,

    CS_text VARCHAR(30) COLLATE SQL_Latin1_General_CP1_CS_AS,

    CI_text VARCHAR(30)

    )

    INSERT INTO @t

    SELECT 1, 'hello', 'world'

    UNION ALL SELECT 2, 'Hello', 'World'

    SELECT *

    FROM @t

    WHERE CS_text = 'Hello'

    SELECT *

    FROM @t

    WHERE CI_text COLLATE SQL_Latin1_General_CP1_CS_AS = 'World'

    Greets

    Flo

  • Data in co_num column of customer table shows:

    CQS0002509

    CQS0002510

    CQS0002511

    cqs0002512

    cqs0002513

    cqs0002514

    cqs0002515

    CQS0002516

    CQS0002517

    CQS0002518

    CQS0002519

    CQS0002520

    Tried the select statement as follows:

    SELECT *

    FROM co

    WHERE co_num COLLATE SQL_Latin1_General_CP1_CS_AS like 'cqs%'

    Results:

    co_num

    ----------

    cqs0002512

    cqs0002513

    cqs0002514

    cqs0002515

    which is exactly what I want πŸ™‚

    Now I need to fix the data using the following Update command

    update co

    set co_num = 'cqs0002515'

    where co_num = 'CQS0002515'

    Is there a way to update a group of records such as all "cqs%" to become "CQS%"?

  • Hi

    What about UPPER? πŸ˜‰

    DECLARE @t TABLE

    (

    txt VARCHAR(30)

    )

    INSERT INTO @T

    SELECT 'CQS0002509'

    UNION ALL SELECT 'CQS0002510'

    UNION ALL SELECT 'CQS0002511'

    UNION ALL SELECT 'cqs0002512'

    UNION ALL SELECT 'cqs0002513'

    UNION ALL SELECT 'cqs0002514'

    UNION ALL SELECT 'cqs0002515'

    UNION ALL SELECT 'CQS0002516'

    UNION ALL SELECT 'CQS0002517'

    UNION ALL SELECT 'CQS0002518'

    UNION ALL SELECT 'CQS0002519'

    UNION ALL SELECT 'CQS0002520'

    UPDATE t SET txt = UPPER(txt)

    FROM @t t

    WHERE txt COLLATE SQL_Latin1_General_CP1_CS_AS LIKE 'cqs%'

    SELECT * FROM @t

    Greets

    Flo

  • Is there a way to update a group of records such as all "cqs%" to become "CQS%"?

    Keeping it simple....

    UPDATE sometable

    SET someColumn = REPLACE(someColumn,'cqs','CQS')

    WHERE somecolumn LIKE 'cqs%'

    Sorry to butt in, Flo, but you're answering ALL the questions!! πŸ˜›

    Now for the time trial results. You won the first round because of the COLLATE in the WHERE clause. That was shrewd because it changed ONLY those rows with a lower case 'cqs'. I would still avoid doing that as a matter of practice, because there might be some mixed case characters lurking in there as well.

    However, it seems that the REPLACE function is about 10% faster than the UPPER function. I assume that it was quicker to test every character to see if it was a 'c' (and replace only three) faster than it was to run the UPPER conversion on every character.

    By the way, my compliments on the Tally thread. It has been fascinating to follow. I haven't thought of any alternatives worth adding to the thread, but it has been thought provoking and educational all the same. I haven't seen such substantive debate going on in any forum outside SSC. Congratulations! πŸ™‚

    ;with Tally (N) AS (SELECT TOP 10000 ROW_NUMBER() over (order by sc1.id) FROM Master.dbo.SysColumns sc1)

    select 'cqs'+LEFT('00000'+CAST(N as varchar(10)),10) as keyno

    into #testTable

    from tally

    create unique clustered index #pk_TestTable on #testTable ( keyno )

    update #testTable

    set Keyno = UPPER(keyNo)

    where keyNo like '%1%'

    set statistics time on;

    print '--Flo'

    UPDATE #testTable

    SET keyno = UPPER(keyNo)

    FROM #testTable t

    WHERE keyNo COLLATE SQL_Latin1_General_CP1_CS_AS LIKE 'cqs%'

    set statistics time off;

    -- reset

    update #testTable

    set Keyno = lower(keyNo)

    where keyNo not like '%1%'

    set statistics time on;

    print '--Bob'

    UPDATE #testTable

    SET keyno = REPLACE(keyNo,'cqs','CQS')

    WHERE keyNo COLLATE SQL_Latin1_General_CP1_CS_AS LIKE 'cqs%'

    set statistics time off;

    drop table #testTable

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Hi Bob!

    Bob Hovious (4/15/2009)


    Is there a way to update a group of records such as all "cqs%" to become "CQS%"?

    Keeping it simple....

    UPDATE sometable

    SET someColumn = REPLACE(someColumn,'cqs','CQS')

    WHERE somecolumn LIKE 'cqs%'

    Now for the time trial results. You won the first round because of the COLLATE in the WHERE clause. That was shrewd because it changed ONLY those rows with a lower case 'cqs'. I would still avoid doing that as a matter of practice, because there might be some mixed case characters lurking in there as well.

    However, it seems that the REPLACE function is about 10% faster than the UPPER function. I assume that it was quicker to test every character to see if it was a 'c' (and replace only three) faster than it was to run the UPPER conversion on every character.

    Just for fun, a different solution. The test results a bit strange. Sometimes the STUFF seems to be faster than the REPLACE and sometimes other way around. I also tested 'CQS' + SUBSTRING... but seems to be slower. Same with RIGHT...

    --- reset

    update #testTable

    set Keyno = lower(keyNo)

    where keyNo not like '%1%'

    set statistics time on;

    print '--Flo 2'

    UPDATE #testTable

    SET keyno = STUFF(keyNo, 1, 3, 'CQS')

    WHERE keyNo COLLATE SQL_Latin1_General_CP1_CS_AS LIKE 'cqs%'

    set statistics time off;

    Sorry to butt in, Flo, but you're answering ALL the questions!! πŸ˜›

    πŸ˜€

    Sorry for that, I've not been online the most of the day and just have been a bore.

    πŸ˜€

    By the way, my compliments on the Tally thread. It has been fascinating to follow. I haven't thought of any alternatives worth adding to the thread, but it has been thought provoking and educational all the same. I haven't seen such substantive debate going on in any forum outside SSC. Congratulations! πŸ™‚

    Yes, it has been a really great discussion! I never saw so much input by so many professionals in any other thread. I'm still doin' some many tests, also for some other CLR approaches. When I'm done I'll share it with you!

    Greets

    Flo

  • Just share it with the entire thread and I'll read it.

    Like Jeff, I'm now thinking that I will have to learn to code CLR in self defense. πŸ˜‰

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks Flo and Bob:

    The Collate statement allowed me to find the incorrect string and the Update query allowed me to correct the data

    UPDATE sometable

    SET someColumn = REPLACE(someColumn,'cqs','CQS')

    WHERE somecolumn LIKE 'cqs%'

    Thanks again:-)

  • You're welcome, Ron. Hope we didn't overwhelm you with detail. πŸ˜‰

    These sidebar discussions often go into way more minutiae than is required to solve your problem, but sometimes they dredge up some really good information. For one-shot fixes, you may not care whether you fix it with UPPER, REPLACE, STUFF, 'CQS'+SUBSTRING(), or 'CQS'+RIGHT(). But when you get to coding transactions with high volumes, it might be a good thing to know which functions are easier on the CPU. Best of luck to you.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Just a side note. If case is that important to the application then, in my opinion, the application should be upper-casing the string in the first place. Yeah you can do it in the SQL if you need to, but that should be part of the validation.

  • By the way, my compliments on the Tally thread. It has been fascinating to follow. I haven't thought of any alternatives worth adding to the thread, but it has been thought provoking and educational all the same. I haven't seen such substantive debate going on in any forum outside SSC. Congratulations!

    Do you mind posting the link to this thread. I must have missed it.

    Thanks.

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • Somehow users managed to input some of the customer order numbers as lower case

    Looks like they tried and failed, Jack. The counter-argument would be that if that column used an uppercase-only collation (or added UPPER to all procs that do inserts), then it wouldn't matter if the application code was broken. I know scrubbing is more of a load on the SQL server, but enforcing data integrity is a primary role of a database.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Do you mind posting the link to this thread. I must have missed it.

    Here you go. Flo stirred up quite a debate by identifying a situation where RBAR using CLR was outperforming the set-based solution.

    Performance issue with tally solution

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks a bunch.

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • Jack Corbett (4/16/2009)


    Just a side note. If case is that important to the application then, in my opinion, the application should be upper-casing the string in the first place. Yeah you can do it in the SQL if you need to, but that should be part of the validation.

    Never thought I would ever have a different opinion than you, but never say never.

    I strongly advise to move this task to the DB side. Thats why we have constraints within the DB as well. We don't trust the application by definition πŸ˜€

    (It could also be that I misunderstood you and you meant to only validate the UPPER case in the DB, but UPPER the data itself in the application)

    Best Regards,

    Chris BΓΌttner

  • Christian Buettner (4/16/2009)


    Jack Corbett (4/16/2009)


    Just a side note. If case is that important to the application then, in my opinion, the application should be upper-casing the string in the first place. Yeah you can do it in the SQL if you need to, but that should be part of the validation.

    Never thought I would ever have a different opinion than you, but never say never.

    I strongly advise to move this task to the DB side. Thats why we have constraints within the DB as well. We don't trust the application by definition πŸ˜€

    (It could also be that I misunderstood you and you meant to only validate the UPPER case in the DB, but UPPER the data itself in the application)

    Bob Hovious (4/16/2009)


    Somehow users managed to input some of the customer order numbers as lower case

    Looks like they tried and failed, Jack. The counter-argument would be that if that column used an uppercase-only collation (or added UPPER to all procs that do inserts), then it wouldn't matter if the application code was broken. I know scrubbing is more of a load on the SQL server, but enforcing data integrity is a primary role of a database.

    I figured I'd hit both of these in one post.

    Christian,

    I'm not sure we disagree. The point I was trying to make, and obviously not clearly, is that a business rule should be enforced in the application so that you do not "waste" a round-trip to the DB with invalid data. Would I want to put some kind of validation in the database, too? Yes. I'd probably use UPPER in my insert/update procedures. If you aren't using stored procedures, but an ORM tool or ad-hoc SQL, you'd have to use an INSTEAD OF trigger in the database to make sure the data was inserted properly and apply the UPPER, otherwise it still has to be done in the application. Ideally you would use a case-sensitive collation for the column and a check constraint.

    Bob,

    I missed the part of the post you quoted, which causes us to assume that the application IS supposed to be validating the data. As I said above, that doesn't mean I wouldn't validate in the database as well, mainly to protect the database from some ad hoc update by me or another DBA. Is there an upper case only collation? I know you can do case-sensitive, but didn't know you could require one case.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply