Replace-Function

  • Comments posted to this topic are about the item Replace-Function

  • In my opinion this is a very bad script.
    - Scalar functions (as those) are very slow
    - creating a table variable (since you can't use #temp-tables inside a function) adds additional overhead
    - using a manual cursor (WHILE LOOP with SELECT TOP 1 and DELETE) makes it worser - even a common CURSOR over dbo.synonym table would be perform better (would not need a table variable / delete)
    - you have no sort order in your synonym table, so it is impossible to switch two characters (e.g. dot by comma and comma by dot - to do this, you would usually replace comma by e.g. Õ, than dot by comma and Õ by dot)

    If  you want to work with your dbo.synonym (instead of nested REPLACEs) you should create a inline table value function, which uses a self referencing CTE.

    God is real, unless declared integer.

  • If you want to work with your dbo.synonym (instead of nested REPLACEs) you should create a inline table value function, which uses a self referencing CTE.

    This sounds interesting, and I've got some nested Replace functions.  Can you point me to or provide an example.  My google search did not produced anything useful.  Thanks!

  • Here a version, which uses selfreferencing CTEs:

    CREATE TABLE dbo.synonym (synonym VARCHAR(50) NULL, word VARCHAR(50) NULL, sort_order INT PRIMARY KEY);
    INSERT INTO dbo.synonym (synonym, word, sort_order)
    VALUES ('hello', 'hey', 1),
       (',', 'Õ', 2),
       ('.', ',', 3),
       ('Õ', '.', 4);
    GO
    CREATE OR ALTER FUNCTION dbo.udf_ReplaceChars (@string VARCHAR(MAX))
    RETURNS TABLE
    AS
      RETURN WITH syn AS (SELECT s.synonym, s.word, ROW_NUMBER() OVER (ORDER BY s.sort_order) AS step FROM dbo.synonym AS s),
          txt AS (SELECT @string AS org_string, @string AS string, 0 AS step
            UNION ALL
            SELECT txt.org_string,
               CAST(REPLACE(txt.string, s.synonym, s.word) AS VARCHAR(MAX)) AS string,
               txt.step + 1                 AS step
             FROM txt
             INNER JOIN syn AS s
              ON txt.step + 1 = s.step
             WHERE txt.step < 100)
      SELECT TOP (1) * FROM txt ORDER BY txt.step DESC;
    GO
    -- test
    SELECT urc.org_string, urc.string FROM dbo.udf_ReplaceChars('Hello, hello, this ultimate device costs only 1,000,000.00 EUR!') AS urc;

    GO
    -- cleanup (if you don't want to use it)
    DROP FUNCTION IF EXISTS dbo.udf_ReplaceChars
    DROP TABLE IF EXISTS dbo.synonym

    Of course such a function would only make sense, when  you have to  replace a bunch of (always the same) words very often at multiple places (you'll want to add an additional column to have multiple sets of replacements in it; in this case you have to add this as a filter parameter to the function too and of course this category column have to been the first column of the PK).

    It would not make sense to fill the dbo.synonym table everytime, just because you are to dull to write a nested replace. And of course this function would still perform much worser than a nested replace because of the whole overhead with the table and "CTE-Loop" ...

    God is real, unless declared integer.

  • t.franz - Wednesday, August 15, 2018 2:45 AM

    Here a version, which uses selfreferencing CTEs:

    CREATE TABLE dbo.synonym (synonym VARCHAR(50) NULL, word VARCHAR(50) NULL, sort_order INT PRIMARY KEY);
    INSERT INTO dbo.synonym (synonym, word, sort_order)
    VALUES ('hello', 'hey', 1),
       (',', 'Õ', 2),
       ('.', ',', 3),
       ('Õ', '.', 4);
    GO
    CREATE OR ALTER FUNCTION dbo.udf_ReplaceChars (@string VARCHAR(MAX))
    RETURNS TABLE
    AS
      RETURN WITH syn AS (SELECT s.synonym, s.word, ROW_NUMBER() OVER (ORDER BY s.sort_order) AS step FROM dbo.synonym AS s),
          txt AS (SELECT @string AS org_string, @string AS string, 0 AS step
            UNION ALL
            SELECT txt.org_string,
               CAST(REPLACE(txt.string, s.synonym, s.word) AS VARCHAR(MAX)) AS string,
               txt.step + 1                 AS step
             FROM txt
             INNER JOIN syn AS s
              ON txt.step + 1 = s.step
             WHERE txt.step < 100)
      SELECT TOP (1) * FROM txt ORDER BY txt.step DESC;
    GO
    -- test
    SELECT urc.org_string, urc.string FROM dbo.udf_ReplaceChars('Hello, hello, this ultimate device costs only 1,000,000.00 EUR!') AS urc;

    GO
    -- cleanup (if you don't want to use it)
    DROP FUNCTION IF EXISTS dbo.udf_ReplaceChars
    DROP TABLE IF EXISTS dbo.synonym

    Of course such a function would only make sense, when  you have to  replace a bunch of (always the same) words very often at multiple places (you'll want to add an additional column to have multiple sets of replacements in it; in this case you have to add this as a filter parameter to the function too and of course this category column have to been the first column of the PK).

    It would not make sense to fill the dbo.synonym table everytime, just because you are to dull to write a nested replace. And of course this function would still perform much worser than a nested replace because of the whole overhead with the table and "CTE-Loop" ...

    We use the nested replaces to remove non-printable characters that sometimes get into the source data.  I'll review this in a few days and see if this could work.  My initial review makes it seem more geared to printable characters, but could be wrong.

  • RonKyle - Wednesday, August 15, 2018 7:29 AM

    We use the nested replaces to remove non-printable characters that sometimes get into the source data.  I'll review this in a few days and see if this could work.  My initial review makes it seem more geared to printable characters, but could be wrong.

    You should NOT try to replace a nested REPLACE with my (or the original) version. The nested REPLACE will always be faster.

    The only situation where I would consider to use such a function is, when the replacement pairs differ (e.g. from user inputs or because you support x different languages and sometimes it has to replace only 2 strings and in another language 20 strings)

    God is real, unless declared integer.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply