A quick query puzzle:

  • Hi

    I have a table with below data. Requirement is to replace all integers with continous 6 or more occurances with 'x'. Less than 6 occurances should not be replaced.

    create table t1(name varchar (100))

    GO

    INsert into t1

    select '1234ABC123456XYZ1234567890ADS'

    GO

    INsert into t1

    select 'cbv736456XYZ543534534545XLS'

    GO

    EXPECTED RESULT:

    1234ABCxxxxxxXYZxxxxxxxxxxADS

    cbvxxxxxxXYZxxxxxxxxxxxxXLS

    drop table t1

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • Hi

    You could try the following

    CREATE FUNCTION replace6plusint(@s1 varchar(100)) RETURNS TABLE AS RETURN

    WITH

    firstRun AS (

    SELECT replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(@s1,'0','^'),'1','^'),'2','^'),'3','^'),'4','^'),'5','^'),'6','^'),'7','^'),'8','^'),'9','^'),'^^^^^^','------') a

    UNION ALL

    SELECT replace(a,'-^','--')

    FROM firstRun

    WHERE a like '%-^%'

    ),

    secondRun AS (

    SELECT replace(a,'-','x') b, charindex('^', a) p

    FROM firstRun

    WHERE a not like '%-^%'

    UNION ALL

    SELECT stuff(b, p, 1, substring(@s1,p,1)), charindex('^', b)

    FROM secondRun

    WHERE p <> 0

    )

    SELECT b result FROM secondRun WHERE p = 0

  • This is really good but problem is that my table has multiple rows and this is a table function. So it has to be a scalar function for me to use....

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • S_Kumar_S (10/5/2012)


    This is really good but problem is that my table has multiple rows and this is a table function. So it has to be a scalar function for me to use....

    Find some good reading material covering functions. You absolutely don't need a function for this - and most folks would recommend a table-valued function over the alternatives. It's far simpler than you think:

    SELECT name,

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(name,

    '012345','xxxxxx'),

    '123456','xxxxxx'),

    '234567','xxxxxx'),

    '345678','xxxxxx'),

    '456789','xxxxxx')

    FROM t1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris.

    the numbers are not consecutive as you have assumed. And they may be of length greater than 6. I think it is not so straight as it looks.

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • With a large amount of replaces this can be done.

    Problem is that the number of replaces is very large.

    So I'll show the concept but not the code, because the code would be large.

    First of all I assum that there are 'strings' which do not occure in the supplied string.

    Here I'll use the strings '^' and 'x', but if these occure replace then with strings which do not occure.

    First replace alle the numbers in the strings with

    1

  • S_Kumar_S (10/5/2012)


    Hi Chris.

    the numbers are not consecutive as you have assumed. And they may be of length greater than 6. I think it is not so straight as it looks.

    Ah, ok.

    SELECT

    t.name, x.New_Name

    FROM #t1 t

    CROSS APPLY (

    SELECT New_Name =

    (SELECT Newletter + ''

    FROM (

    SELECT n, Newletter = CASE WHEN d.grouper IS NOT NULL

    AND COUNT(*) OVER (PARTITION BY d.grouper) > 5 THEN 'x' ELSE letter END

    FROM (

    SELECT n, letter,

    grouper = ROW_NUMBER() OVER(PARTITION BY t.name ORDER BY ISNUMERIC(letter), n) - N

    FROM (SELECT TOP(LEN(t.name)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM sys.columns a, sys.columns b) tally

    CROSS APPLY (SELECT letter = SUBSTRING(t.name,n,1)) l

    ) d

    ) stri

    ORDER BY n

    FOR XML PATH('') , TYPE).value('.', 'varchar(max)')

    ) x

    ORDER BY t.name

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • With a large amount of replaces this can be done.

    Problem is that the number of replaces is very large.

    So I'll show the concept but not the code, because the code would be large.

    First of all I assume that there are 'strings' which do not occure in the supplied string.

    Here I'll use the strings '^' and 'x', but if these occure replace then with strings which do not occure.

    First replace alle the numbers in the strings with

    1 becomes ^1 (do this for all 10 numeric characters)

    ^1^ becomes ^^1 (do this for all 10 numeric characters and repeat this large number of times).

    All number strings have now the shape ^^^^1234

    X^^^^^^ becomes x^^^^^

    EDIT: ABOVE LINE SHOULD BE (sorry)

    ^^^^^^ becomes x^^^^^

    x^ becomes xx (repeat a number of times)

    Now the number strings all have the shape ^^^^1234 or xxxxxxxx12345678

    x1 becomes x (do this for all 10 numeric characters and repeat this a large number of) times).

    ^ becomes '' (remove all the ^)

    The number of replaces is large, but maybe could be done in a loops.

    Or if the database is very large this can be generated.

    If the database is large and the numbers are very long you need an awfull lot of replaces.

    For large numbers and large number of rows this might not be a handy method.

    An alternate solution might be to copy the table change all numbers into '1' do the above with far less substitutes and than replace the 1111 string with the original numbers. With some masking this is possible.

    thanks for sharing your problem,

    Ben Brugman

  • This is quite impressive but it doesn't work for some scenarios. e.g. this one :

    insert into t1

    select 'AAA22222222222222BB5BB 33CCCCCCCCC342234234'

    EDIT: I meant it for Chris response:

    SELECT

    t.name, x.New_Name

    FROM #t1 t

    CROSS APPLY (

    SELECT New_Name =

    (SELECT Newletter + ''

    FROM (

    SELECT n, Newletter = CASE WHEN d.grouper IS NOT NULL

    AND COUNT(*) OVER (PARTITION BY d.grouper) > 5 THEN 'x' ELSE letter END

    FROM (

    SELECT n, letter,

    grouper = ROW_NUMBER() OVER(PARTITION BY t.name ORDER BY ISNUMERIC(letter), n) - N

    FROM (SELECT TOP(LEN(t.name)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM sys.columns a, sys.columns b) tally

    CROSS APPLY (SELECT letter = SUBSTRING(t.name,n,1)) l

    ) d

    ) stri

    ORDER BY n

    FOR XML PATH('') , TYPE).value('.', 'varchar(max)')

    ) x

    ORDER BY t.name

    Online Trainer For SQL DBA and Developer @RedBushTechnologies with 18 yrs exp.

  • S_Kumar_S (10/5/2012)


    This is quite impressive but it doesn't work for some scenarios. e.g. this one :

    insert into t1

    select 'AAA22222222222222BB5BB 33CCCCCCCCC342234234'

    EDIT: I meant it for Chris response:

    SELECT

    t.name, x.New_Name

    FROM #t1 t

    CROSS APPLY (

    SELECT New_Name =

    (SELECT Newletter + ''

    FROM (

    SELECT n, Newletter = CASE WHEN d.grouper IS NOT NULL

    AND COUNT(*) OVER (PARTITION BY d.grouper) > 5 THEN 'x' ELSE letter END

    FROM (

    SELECT n, letter,

    grouper = ROW_NUMBER() OVER(PARTITION BY t.name ORDER BY ISNUMERIC(letter), n) - N

    FROM (SELECT TOP(LEN(t.name)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM sys.columns a, sys.columns b) tally

    CROSS APPLY (SELECT letter = SUBSTRING(t.name,n,1)) l

    ) d

    ) stri

    ORDER BY n

    FOR XML PATH('') , TYPE).value('.', 'varchar(max)')

    ) x

    ORDER BY t.name

    My apologies, it was a change during coding:

    SELECT

    t.name, x.New_Name

    FROM #t1 t

    CROSS APPLY (

    SELECT New_Name =

    (SELECT Newletter + ''

    FROM (

    SELECT

    n, Newletter = CASE WHEN ISNUMERIC(letter) = 1 -- d.grouper IS NOT NULL

    AND COUNT(*) OVER (PARTITION BY d.grouper) > 5 THEN 'x' ELSE letter END

    FROM (

    SELECT t.name, n, letter,

    grouper = n - ROW_NUMBER() OVER(ORDER BY ISNUMERIC(letter) desc, n)

    FROM

    (SELECT TOP(LEN(t.name)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM sys.columns a, sys.columns b) tally

    CROSS APPLY (SELECT letter = SUBSTRING(t.name,n,1)) l

    ) d

    ) stri

    ORDER BY n

    FOR XML PATH('') , TYPE).value('.', 'varchar(max)')

    ) x

    ORDER BY t.name

    Edit: found an error in an extended sample data set.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hello all,

    Because I goofed a bit in my previous solution here the worked out code.

    It's a solution which only used simple replaces, not efficient (lots of updates).

    ben brugman

    -- Ben Brugman

    -- 20121005

    -- http://www.sqlservercentral.com/Forums/Topic1368367-391-1.aspx?Update=1

    -- A quick query puzzle.

    -- This solution is a rather cumbersome solution only using a large number of 'simple' replaces.

    -- It is also limited to the length defined in @maxlength (this is the maximum length a number can be).

    -- Create the table

    create table #t1(name varchar (100))

    GO

    INsert into #t1

    select '1234ABC123456XYZ1234567890ADS'

    GO

    INsert into #t1

    select 'cbv736456XYZ543534534545XLS'

    GO

    -- Create a replace stored procedure.

    Create Procedure st_replace

    @string1 varchar(8000),

    @string2 varchar(8000)

    as

    begin

    Update #t1 set name = replace(name,@string1,@string2)

    end

    go

    declare @maxlength int = 40

    declare @tel int

    -- Mark all the numeric characters.

    exec st_replace '0', '^0'

    exec st_replace '1', '^1'

    exec st_replace '2', '^2'

    exec st_replace '3', '^3'

    exec st_replace '4', '^4'

    exec st_replace '5', '^5'

    exec st_replace '6', '^6'

    exec st_replace '7', '^7'

    exec st_replace '8', '^8'

    exec st_replace '9', '^9'

    -- Bring the marks in front of the numbers

    set @tel = @maxlength

    WHILE @tel > 0 BEGIN

    exec st_replace '0^','^0'

    exec st_replace '1^','^1'

    exec st_replace '2^','^2'

    exec st_replace '3^','^3'

    exec st_replace '4^','^4'

    exec st_replace '5^','^5'

    exec st_replace '6^','^6'

    exec st_replace '7^','^7'

    exec st_replace '8^','^8'

    exec st_replace '9^','^9'

    set @tel = @tel - 1

    END

    -- Change the marks which are at least 6 long

    exec st_replace '^^^^^^','X^^^^^'

    set @tel = @maxlength

    WHILE @tel > 0 BEGIN

    exec st_replace 'x^','xx'

    set @tel = @tel - 1

    END

    --

    -- Numbers are now of the form ^^^^1234 or XXXXXXXX12345678

    --

    -- Remove the numbers behind the X mark.

    set @tel = @maxlength

    WHILE @tel > 0 BEGIN

    exec st_replace 'X0','X'

    exec st_replace 'X1','X'

    exec st_replace 'X2','X'

    exec st_replace 'X3','X'

    exec st_replace 'X4','X'

    exec st_replace 'X5','X'

    exec st_replace 'X6','X'

    exec st_replace 'X7','X'

    exec st_replace 'X8','X'

    exec st_replace 'X9','X'

    set @tel = @tel - 1

    END

    -- Remove the 'temporary' marks

    exec st_replace '^',''

    SELECT * FROM #T1

    DROP PROCEDURE ST_REPLACE

    DROP TABLE #T1

  • Hi

    Here's another variation that should do the trick. No function this time, just a query

    create table #t1(name varchar (100))

    GO

    INsert into #t1

    select '1234ABC123456XYZ1234567890ADS'

    GO

    INsert into #t1

    select 'cbv736456XYZ543534534545XLS'

    GO

    INsert into #t1

    select 'cbv736456XYZ543534534545XLS2134488'

    GO

    ;with cte AS (

    SELECT

    stuff(name ,

    patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition

    patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)

    replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's

    ) res,

    1 mycount,

    row_number() over (order by name) grouper

    FROM #t1

    UNION ALL

    SELECT

    stuff(res ,

    patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition

    patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length

    replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's

    ) res,

    mycount + 1,

    grouper

    FROM cte

    WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0

    ),

    maxcte as (select grouper, max(mycount) lastres from cte group by grouper)

    SELECT res

    FROM cte c

    inner join maxcte m on mycount = lastres and c.grouper = m.grouper

  • A small change See in the code.

    (My reason for this change change is: The essential part of the code is only written once, so it is smaller, clearer and with less riscs for errors).

    Thanks for submitting this code, I am learning from this.

    Ben Brugman

    mickyT (10/8/2012)


    Hi

    Here's another variation that should do the trick. No function this time, just a query

    create table #t1(name varchar (100))

    GO

    INsert into #t1

    select '1234ABC123456XYZ1234567890ADS'

    GO

    INsert into #t1

    select 'cbv736456XYZ543534534545XLS'

    GO

    INsert into #t1

    select 'cbv736456XYZ543534534545XLS2134488'

    GO

    ;with cte AS (

    SELECT

    --stuff(name ,

    --patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name) , --startposition

    --patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1 , --length (extra char added to ensure length returned)

    --replicate('x',patindex('%[^0-9]%', substring(name + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',name),99)) - 1) --replacement x's

    --) res,

    convert(varchar(max), res) as res,

    1 mycount,

    row_number() over (order by name) grouper

    FROM #t1

    UNION ALL

    SELECT

    stuff(res ,

    patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) , --startposition

    patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1 , --length

    replicate('x',patindex('%[^0-9]%', substring(res + '@',patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res),99)) - 1) --replacement x's

    ) res,

    mycount + 1,

    grouper

    FROM cte

    WHERE patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%',res) > 0

    ),

    maxcte as (select grouper, max(mycount) lastres from cte group by grouper)

    SELECT res

    FROM cte c

    inner join maxcte m on mycount = lastres and c.grouper = m.grouper

  • ben.brugman (10/8/2012)


    A small change See in the code.

    (My reason for this change change is: The essential part of the code is only written once, so it is smaller, clearer and with less riscs for errors).

    Thanks for submitting this code, I am learning from this.

    Ben Brugman

    I agree, that change makes it nicer. Cheers:-)

    ditto on the learning ... and I have enjoyed thinking about this problem 😎

    Normally I would have used a set of CLR functions that I have that replicate the oracle regular expression functions.

  • Just for fun, here's a nasty piece of work:

    create table #t1(name varchar (100))

    INsert into #t1

    select '1234ABC123456XYZ1234567890ADS'

    INsert into #t1

    select 'cbv736456XYZ543534534545XLS'

    ;WITH ChunkIt (RowID, n, str1, str2, str3) AS (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), n=1

    ,CASE WHEN dig < alp THEN SUBSTRING(name, 1, alp-1) ELSE SUBSTRING(name, 1, dig-1) END

    ,CASE WHEN dig < alp THEN SUBSTRING(name, alp, LEN(name)) ELSE SUBSTRING(name, dig, LEN(name)) END

    ,CAST('' AS VARCHAR(100))

    FROM #t1

    CROSS APPLY (SELECT PATINDEX('%[0-9]%', name), PATINDEX('%[A-Za-z]%', name)) a(dig, alp)

    UNION ALL

    SELECT RowID, n+1

    ,b.str1

    ,b.str2

    ,CASE WHEN PATINDEX('%[0-9]%', b.str2) = 0 OR PATINDEX('%[A-Za-z]%', b.str2) = 0 THEN b.str2 ELSE '' END

    FROM ChunkIt

    CROSS APPLY (SELECT PATINDEX('%[0-9]%', str2), PATINDEX('%[A-Za-z]%', str2)) a(dig, alp)

    CROSS APPLY (

    SELECT CASE WHEN a.dig < a.alp THEN SUBSTRING(str2, 1, a.alp-1) ELSE SUBSTRING(str2, 1, a.dig-1) END

    ,CASE WHEN a.dig < a.alp THEN SUBSTRING(str2, a.alp, LEN(str2)) ELSE SUBSTRING(str2, a.dig, LEN(str2)) END

    ) b(str1, str2)

    WHERE a.dig > 0 AND a.alp > 0

    ),

    ForGrouping AS (

    SELECT RowID, n

    ,str1=CASE WHEN LEN(str1) >= 6 AND PATINDEX('%[0-9]%', str1) > 0 THEN REPLICATE('x', LEN(str1)) ELSE str1 END

    ,str2=CASE WHEN LEN(str3) >= 6 AND PATINDEX('%[0-9]%', str3) > 0 THEN REPLICATE('x', LEN(str3)) ELSE str3 END

    FROM ChunkIt

    )

    SELECT name=(

    SELECT str1 + str2

    FROM ForGrouping b

    WHERE a.RowID = b.RowID

    FOR XML PATH(''))

    FROM ForGrouping a

    GROUP BY RowID

    ORDER BY RowID

    DROP TABLE #t1


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 44 total)

You must be logged in to reply to this topic. Login to reply