Scalar function string replacement

Question

Post reply

Scalar function string replacement

Steve Collins

SSCrazy

Points: 2227
More actions
April 18, 2022 at 12:00 am

#4011744

Comments posted to this topic are about the item Scalar function string replacement
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

IndyMike SSC Enthusiast Points: 168 More actions · Answer 1

That's really very clever. I can see making some code much cleaner using this instead of nested REPLACE operators.

Michael L John One Orange Chip Points: 27079 More actions · Answer 2

IndyMike wrote:

That's really very clever. I can see making some code much cleaner using this instead of nested REPLACE operators.

How?

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Mr. Brian Gale SSC-Insane Points: 24875 More actions · Answer 3

This feels almost like an undocumented "feature" (similar to quirky updates)... I know I would be a bit hesitant in how I used this and would probably continue to stick with my nested REPLACE's as I know how those work today and in the future.

But I did learn something as I expected a different answer on this one than was the actual answer! Hopefully it is consistent across all SQL versions (tested it on 2017 and it worked as described). Would suck to implement this only to find the behavior being different on different versions though or having it change in a future version.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Steve Collins SSCrazy Points: 2227 More actions Follow · Answer 4

Michael L John wrote:

IndyMike wrote:
That's really very clever. I can see making some code much cleaner using this instead of nested REPLACE operators.
How?

The way I came across this functionality was as a way to remove unacceptable words from user entered content. Some words should not be written down or shared because it's just filth. The replacements done are in descending string length order so first 'a--hole' and then 'a--' etc.

drop function if exists dbo.scalar_string_replace;
go
create function dbo.scalar_string_replace(
  @stringnvarchar(max))
returns nvarchar(max) as 
begin
    select @string = replace(@string, v.str_key, v.str_val)
    from (values (N'idiotic', N'not a good decision maker'),
                 (N'idiot', N'fool'),
                 (N'fool', N'foot stool')) v(str_key, str_val);

    return @string;
end
go

select dbo.scalar_string_replace(N'You are idiotic and a fool');

(No column name)
You are not a good decision maker and a foot stool

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeff Moden SSC Guru Points: 1004619 More actions Follow · Answer 5

Mr. Brian Gale wrote:

This feels almost like an undocumented "feature" (similar to quirky updates)... I know I would be a bit hesitant in how I used this and would probably continue to stick with my nested REPLACE's as I know how those work today and in the future.
But I did learn something as I expected a different answer on this one than was the actual answer! Hopefully it is consistent across all SQL versions (tested it on 2017 and it worked as described). Would suck to implement this only to find the behavior being different on different versions though or having it change in a future version.

It's not a quirk although it's not specifically documented in the MS documentation. It's sometimes referred to as "overloading a variable" or just "overloading". This is one of the things that T-SQL does very well where Oracle will give you an error.

I use this method at work to "normalize" addresses according to postal standards in-so-far as abbreviations go and it's nasty fast for what it does even though it's used in a Scalar Function especially if you use a binary collation.

Also, and to clarify, the "Nested Replace" method is even faster and can be used in iTVFs. The reason why I used it in a Scalar Function was to use it in a persisted computed column in the days before I understood that using Scalar Functions in computed columns causes any and all queries to go single threaded even if they don't refer to the column. See Brent Ozar's article on that subject, which I've individually confirmed as being true and am in the process of replacing.

https://www.brentozar.com/archive/2020/10/using-triggers-to-replace-scalar-udfs-on-computed-columns/

It also turns out to be an even bigger problem in other areas, as well... like parallel index rebuilds and DBCC CHECKDB. See Erik Darling's post on that.

https://www.brentozar.com/archive/2016/01/another-reason-why-scalar-functions-in-computed-columns-is-a-bad-idea/

And the "recently patched" thing Eric is talking about is for 2012 and 2014 where online rebuilds could sometime cause corruption of your clustered indexes. Here's the info on that.

https://support.microsoft.com/en-us/topic/fix-data-corruption-occurs-in-clustered-index-when-you-run-online-index-rebuild-in-sql-server-2012-or-sql-server-2014-075b0c04-1260-1716-6b9d-737ada4f8686

And people wonder why migrating to 2019 has me concerned even though it's been out for 3 years.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 6

It's a good solution as long as you recognize that you're creating an internal Cartesian product between rows in the table value constructor (tvc) and the rows in the outer query. Depending on the number of rows in the outer query and the number of replace values in the tvc, this method may perform poorly compared to other solutions.

The fact that the function is written as a scalar will mask this fact in the execution plan.

If you're using SQL Server 2017 or later, the TRANSLATE() function would be a better option.

This reply was modified 3 years, 3 months ago by Jason A. Long.

Jeff Moden SSC Guru Points: 1004619 More actions Follow · Answer 7

Jason A. Long wrote:

It's a good solution as long as you recognize that you're creating an internal Cartesian product between rows in the table value constructor (tvc) and the rows in the outer query. Depending on the number of rows in the outer query and the number of replace values in the tvc, this method may perform poorly compared to other solutions.
The fact that the function is written as a scalar will mask this fact in the execution plan.
If you're using SQL Server 2017 or later, the TRANSLATE() function would be a better option.

Remember that things like DELIMITEDSPLIT8K() and fnTally() also create such Cartesian products not only externally but internally. ')

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004619 More actions Follow · Answer 8

@steve-2 Collins,

I also mean to say, GREAT question!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

IndyMike SSC Enthusiast Points: 168 More actions · Answer 9

Literally just the way it's presented in the question. Either with a scalar function, or in a longer script with a pre-built table of replacements.

Jeff Moden SSC Guru Points: 1004619 More actions Follow · Answer 10

Jason A. Long wrote:

If you're using SQL Server 2017 or later, the TRANSLATE() function would be a better option.

I missed this. Considering what they did to the FORMAT function for performance, have you or do you know of anyone that has done a performance test?

The other thing is that, while TRANSLATE and REPLACE do have a bit of an intersection in individual character replacement, TRANSLATE doesn't seem to have the ability to replace combinations of letters as in words, etc.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Collins SSCrazy Points: 2227 More actions Follow · Answer 11

Jeff Moden wrote:

Mr. Brian Gale wrote:
This feels almost like an undocumented "feature" (similar to quirky updates)... I know I would be a bit hesitant in how I used this and would probably continue to stick with my nested REPLACE's as I know how those work today and in the future.
But I did learn something as I expected a different answer on this one than was the actual answer! Hopefully it is consistent across all SQL versions (tested it on 2017 and it worked as described). Would suck to implement this only to find the behavior being different on different versions though or having it change in a future version.
It's not a quirk although it's not specifically documented in the MS documentation.

It's definitely not documented well. I recall reading a blog post about it years ago but can't find it now. It must be intentional functionality because the replacements happen in reverse string length order. So replacements of short fragments doesn't mess up replacements of full compound words, i.e. 'ass' , 'asshat', and 'asshats' can each be replaced without messing up the longer word with too early of a replacement. In the case of profanity we replaced the strings with empty ''

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Scalar function string replacement

Cookies on SQLServerCentral