REPLACE() and NULL

  • kevin.l.williams (7/12/2010)


    ...Would it have been too difficult to return the string unmodified if there was a NULL detected? ....

    Actually, I think the answer to that would go back to the more general question of handling NULL consistently. Of course it wouldn't have been difficult to have the function do that, but would it have been the right thing to do?

    The QOD not only calls our attention to this specific behavior in the REPLACE() function, but also to the general issue of understanding and programing to handle NULL. The proposal to have REPLACE() return an unmodified string would essentially depend on treating a NULL third parameter as an empty string, which it is not.

  • john.arnott (7/12/2010)The QOD not only calls our attention to this specific behavior in the REPLACE() function, but also to the general issue of understanding and programing to handle NULL.

    I see your point. unknown in = unknown out.

    Harsh reality in some cases.

  • john.arnott (7/12/2010)


    kevin.l.williams (7/12/2010)


    Anyone got any ideas why this function was made to work this way? Makes no sense to me.

    I cannot answer that authoritatively, but consider that it would make no sense to try to replace part of a string with NULL either, so perhaps the decision was made to check the types of all three parameters first before doing the matching.

    --

    edit: fix typo

    I guess my question is why does the REPLACE function replace anything if the string pattern (second parameter) isn't in the string expression (first parameter)?

    For example, this doesn't return 'bbb':

    SELECT REPLACE('Hello world', 'zzz', 'bbb');

    So why should

    SELECT REPLACE('Hello world', 'zzz', NULL);

    return NULL?

    Thanks in advance for any help,

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Good question, and a good reminder. Thanks.

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • Nice question. I only knew it because I've run into it before.

    ---------------------------------------------------------------------
    Use Full Links:
    KB Article from Microsoft on how to ask a question on a Forum

  • webrunner (7/12/2010)


    I guess my question is why does the REPLACE function replace anything if the string pattern (second parameter) isn't in the string expression (first parameter)?

    It doesn't. It checks for NULL before doing any replacing. This is because of the special quality of NULL. As posted before, unknown in -> unknown out.

    The problem comes because we are used to thinking of replace as two operations:

    1. Look for a search string within the given string.

    2. If found, replace with replacement string.

    The conditional there allows us to ignore an unknown or invalid third parameter if there's no replacement to be made.

    But that's procedural thinking. Declarative thinking would phrase it more like* "The given string with replacement string in place of all instances of search string".

    There's no conditional, so all inputs have to be considered in the function. Consequently, since one of the inputs is unknown, the result should be unknown as well.

    * or some such; I'm certainly not an expert on declarative syntax.

  • I personally think the reason the function returns null is because of NULL concatenation math. I think the underlying code uses loops and concatentates the string piece by piece similar to the construct below. Because an empty string has no length the code cant iterate through the string correctly, thus the returned value is NULL. Even if you could determine the length of the string, the NULL concatenation would make the entire string NULL anyway.

    I think it looks something like this under the hood. (Obviously the construct is not TSQL, but the construct should be logically similar to this)

    DECLARE @i INT,

    @LenStr INT,

    @STR VARCHAR(8000),

    @Search VARCHAR(8000),

    @ReplaceWith VARCHAR(8000),

    @NewStr VARCHAR(8000)

    SET @STR = 'Hello World ->>/ this is coolz World<<-'

    SET @LenStr = DATALENGTH(@Str)

    SET @Search = 'World'

    SET @ReplaceWith = 'Worldz'--change to null to see what happens

    SET @i = 1

    SET @NewStr = ''

    WHILE @i < @LenStr+1

    BEGIN

    IF SUBSTRING(@Str,@i,DATALENGTH(@Search)) = @Search

    BEGIN

    SET @NewStr = @NewStr + @ReplaceWith

    SET @i = @i + DATALENGTH(@Search)

    END

    ELSE

    BEGIN

    SET @NewStr = @NewStr + SUBSTRING(@Str,@i,1)

    SET @i = @i + 1

    END

    END

    SELECT @NewStr

  • sknox (7/12/2010)


    webrunner (7/12/2010)


    I guess my question is why does the REPLACE function replace anything if the string pattern (second parameter) isn't in the string expression (first parameter)?

    It doesn't. It checks for NULL before doing any replacing. This is because of the special quality of NULL. As posted before, unknown in -> unknown out.

    The problem comes because we are used to thinking of replace as two operations:

    1. Look for a search string within the given string.

    2. If found, replace with replacement string.

    The conditional there allows us to ignore an unknown or invalid third parameter if there's no replacement to be made.

    But that's procedural thinking. Declarative thinking would phrase it more like* "The given string with replacement string in place of all instances of search string".

    There's no conditional, so all inputs have to be considered in the function. Consequently, since one of the inputs is unknown, the result should be unknown as well.

    * or some such; I'm certainly not an expert on declarative syntax.

    Thanks. I think this code illustrates your explanation, if I understand correctly, that if any parameter passed in is NULL, NULL has to be returned for all rows because unknown in -> unknown out.

    -- Create a temp table to store the values.

    CREATE TABLE #NullTest (pk int, val varchar(15))

    -- Insert the values, including one row with NULL.

    INSERT INTO #NullTest VALUES (1, 'aaa');

    INSERT INTO #NullTest VALUES (2, NULL);

    INSERT INTO #NullTest VALUES (3, 'bbb');

    INSERT INTO #NullTest VALUES (4, 'ccc');

    -- SELECT 1

    SELECT pk, val FROM #NullTest

    -- SELECT 2

    SELECT pk, REPLACE(val, 'zzz', 'ddd')

    FROM #NullTest

    -- SELECT 3

    SELECT pk, REPLACE(val, 'zzz', NULL)

    FROM #NullTest

    -- SELECT 4

    SELECT pk, REPLACE(val, 'zzz', NULL)

    FROM #NullTest

    WHERE val IS NOT NULL

    -- Drop the temp table.

    DROP TABLE #NullTest

    In fact, SELECTs 3 and 4 return all NULLs even if there are no rows with a NULL value for the val column. Change NULL to 'eee' in the INSERT statement for pk = 2, for example.

    What's funny is that even if you filter out the NULL values (as in SELECT 4 above), you still get all NULLs, except in the example above you get 3 NULL rows instead of 4.

    Seems like overkill, but honestly I don't understand it 100% so I can't say that doing it any other way would be better.

    Thanks again,

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Adam Haines (7/12/2010)


    I personally think the reason the function returns null is because of NULL concatenation math. I think the underlying code uses loops and concatentates the string piece by piece similar to the construct below. Because an empty string has no length the code cant iterate through the string correctly, thus the returned value is NULL. Even if you could determine the length of the string, the NULL concatenation would make the entire string NULL anyway.

    But the REPLACE works just fine with empty strings -- those that have zero length. SQL does not consider strings with NULL value as the same thing as empty strings.

    Declare @SearchMe VarChar(50)

    Declare @FindMe VarChar(50)

    Declare @UseMeInstead VarChar(50)

    Set @SearchMe = 'This is a simple sentence.'

    Set @FindMe = 'simple'

    Set @UseMeInstead = 'dull'

    Select #1NormalReplace = Replace(@SearchMe, @FindMe, @UseMeInstead)

    Set @SearchMe = 'This is a simple sentence.'

    Set @FindMe = 'simple '

    Set @UseMeInstead = ''

    Select #2ReplaceWithEmpty = Replace(@SearchMe, @FindMe, @UseMeInstead)

    Set @SearchMe = ''

    Set @FindMe = 'simple '

    Set @UseMeInstead = 'non-existent'

    Select #3SearchTheEmptyString = Replace(@SearchMe, @FindMe, @UseMeInstead)

    Set @SearchMe = 'This is a simple sentence.'

    Set @FindMe = ''

    Set @UseMeInstead = 'will not be used'

    Select #4FindEmpty = Replace(@SearchMe, @FindMe, @UseMeInstead)

    Results:

    #1NormalReplace

    --------------------------

    This is a dull sentence.

    (1 row(s) affected)

    #2ReplaceWithEmpty

    --------------------------

    This is a sentence.

    (1 row(s) affected)

    #3SearchTheEmptyString

    --------------------------

    (1 row(s) affected)

    #4FindEmpty

    --------------------------

    This is a simple sentence.

    (1 row(s) affected)

  • Correct, so my assumption is the code has some control flow logic checks that make the loop exit early if NULL values are detected, but it doesnt make sense to go through the motions, if the end result is going to be NULL.

    I suspect something like this is in place.

    WHILE @i < @LenStr+1 AND DATALENGTH(@Search) > 0 AND DATALENGTH(@ReplaceWith) > 0

    My main point was that even if you were to get past the NULL parameters it would do you no good because NULL cannot concenate with strings to produce a value.

  • Good question, and brings up something I haven't run into before. Thanks!

  • Yet another example of NULL causing unexpected results in a query!

    (I would say we should stop using NULL but I know we're not supposed to suggest that ;-))

    The lesson is, once again, Beware of NULL.

  • Use of the below could be argued as an acceptable solution.

    if (@SearchMe is not null and @FindMe is not null and @ReplaceMe is not null) Select #1NormalReplace = Replace(@SearchMe, @FindMe, @ReplaceMe)

    We have all tried to explain that NULL does not equal 0 or NULL to an end user and often the response I've gotten is "That's STUPID!" This raises the question, What does null really mean? It is more involved than Schrödinger's cat delima. Null brings into question is there a cat or radioactive isotope in the box. My spin is that NULL means "I don't know", or that it has not intentionally been assigned a value by the system or a user. A value of 0 or empty string can be inferred as an intentional assignment. Often we may want to establish a starting point (e.g. shopping cart total). Here a default should be used and not allow nulls. Other times you may have a Yes/No question. Do you use a default? By using a default you have forced an answer to all questions. In fact the answer to the question may be "I don't know" or "I abstain".

    In my opinion NULLs serve a purpose. They are the love child of two people arguing semantics.

  • timothy.bates (7/23/2010)


    We have all tried to explain that NULL does not equal 0 or NULL to an end user and often the response I've gotten is "That's STUPID!" This raises the question, What does null really mean? It is more involved than Schrödinger's cat delima. Null brings into question is there a cat or radioactive isotope in the box. My spin is that NULL means "I don't know", or that it has not intentionally been assigned a value by the system or a user. A value of 0 or empty string can be inferred as an intentional assignment. Often we may want to establish a starting point (e.g. shopping cart total). Here a default should be used and not allow nulls. Other times you may have a Yes/No question. Do you use a default? By using a default you have forced an answer to all questions. In fact the answer to the question may be "I don't know" or "I abstain".

    In my opinion NULLs serve a purpose. They are the love child of two people arguing semantics.

    Null in SQL means only one thing: the absence of a value. In an execution context, local variables can fail to have a value (eg when being assigned in a select statement, or when they have been declared but no value has ever been assigned) - if they do, they are null. NULL occurs in a table when there is no value in the database for this column and row. It doesn't of course indicate why no value is present - just that none is.

    In particular it doesn't mean that the value is unknown - it may be known but bad code failed to enter it when it should, it may be known but there hasn't been time to enter it yet, it may be known but secret and withheld from the database, it may be be that the attribute is not present because for this row it doesn't exist (eg wife's maiden name for a bachelor, that is inapplicable as opposed to unknown - although in SQL, which doesn't have a separate NULL to cater for this case, it is usually bad practise to have a schema with columns that are sometimes inapplicable), and so on.

    It is unfortunate that the ANSI and ISO people persist in using a definition of NULL that claims that it is a value, when it is not a value but rather an indicator that no value is present. I think that that definition has caused some of the confusion that surrounds NULL. It's even more of a pity that some DBMS suppliers (including Microsoft) continue to treat NULLs in expressions by default as values equivalent, depending on context, to a 0 integer or zero length string or whatever seems to fit in the value domain; this has caused most of the confusion over NULL; the one cheering thing on that front is that soon the present default treatment of NULL in expressions in SQL Server will cease to be the default and won't even be able to be selected, only the proper treatment of NULL will be possible.

    Reasonable arguments can be made either way for how REPLACE ought to operate when there second parameter matches nothing in the first and the third parameter is NULL. "NULL in means NULL out" (as seen in several earlier comments) isn't one of those arguments - operators like COALESCE and ISNULL are obvious counterexamples to that argument. Personally I like the simplicity of the current behaviour but dislike the hoops I sometimes have to jump through to programme around it - but despite those hoops (and some severe doubts about the complications the current behavious causes to any functional semantics for T-SQL) I think the current behaviour is probably best - KISS usually wins.

    Tom

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply