How to find the first recurring character of a string.

  • TheCTEGuy

    SSC Eights!

    Points: 892

    I have a query that finds the first recurring character in a string.
    For eg : if @STR = 'ABCCDA' then i should C as answer as it recurred first and not A(it also recurred but C recurred before it). If there are no recurring charcters then it should return either NULL or none.

    I am currently using tally table (in other words a number table) and inner join.  
    Is there a bettre way to do the same ?

    DECLARE @STR VARCHAR(50) = 'Adbcdefghijki'

    ;WITH CTE AS(
    SELECT SUBSTRING(@str,n,1) CHR,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RN FROM dbo.tally
    WHERE LEN(@str) >= N
    )
    SELECT TOP 1 CASE WHEN A.CHR IS NOT NULL THEN A.CHR ELSE 'NONE' END AS FIRST_RECURRING_CHARACTER FROM CTE A
    INNER JOIN CTE B
    ON A.CHR = B.CHR
    WHERE A.RN > B.RN
    ORDER BY A.RN
    GO

    First solve the problem then write the code !

  • Kenny Jozi

    SSCrazy

    Points: 2004

    Try this , tweak the script to suit your needs.

    --declare variable, type: table
    DECLARE @MyTable TABLE (Input NVARCHAR(30))

    --insert sample values
    INSERT INTO @MyTable (Input)
    VALUES ('abccda')

    --here CTE begins:
    ;WITH CTE AS
    (
      --initial query
      SELECT Input, CONVERT(VARCHAR(1),LEFT(Input,1)) AS Letter, RIGHT(Input, LEN(Input)-1) AS Remainder
      FROM @MyTable
      WHERE LEN(Input)>1
      --recursive part
      UNION ALL
      --recursive query
      SELECT Input, CONVERT(VARCHAR(1),LEFT(Remainder,1)) AS Letter,
       RIGHT(Remainder, LEN(Remainder)-1) AS Remainder
      FROM CTE
      WHERE LEN(Remainder)>0
    )
    SELECT Input, Letter, COUNT(Letter) AS CountOfLetter
    FROM CTE
    GROUP BY Input, Letter
    HAVING COUNT(Letter)>1
    ORDER by 2

    SQL 2000/2005/2008/2012 DBA - MCTS/MCITP

  • TheCTEGuy

    SSC Eights!

    Points: 892

    Kenny Jozi - Monday, October 30, 2017 5:00 AM

    Try this , tweak the script to suit your needs.

    --declare variable, type: table
    DECLARE @MyTable TABLE (Input NVARCHAR(30))

    --insert sample values
    INSERT INTO @MyTable (Input)
    VALUES ('abccda')

    --here CTE begins:
    ;WITH CTE AS
    (
      --initial query
      SELECT Input, CONVERT(VARCHAR(1),LEFT(Input,1)) AS Letter, RIGHT(Input, LEN(Input)-1) AS Remainder
      FROM @MyTable
      WHERE LEN(Input)>1
      --recursive part
      UNION ALL
      --recursive query
      SELECT Input, CONVERT(VARCHAR(1),LEFT(Remainder,1)) AS Letter,
       RIGHT(Remainder, LEN(Remainder)-1) AS Remainder
      FROM CTE
      WHERE LEN(Remainder)>0
    )
    SELECT Input, Letter, COUNT(Letter) AS CountOfLetter
    FROM CTE
    GROUP BY Input, Letter
    HAVING COUNT(Letter)>1
    ORDER by 2

    Your query is also giving me 2 scan counts. And also its using RECURSIVE CTE which are not a good( Jeff once told me) when we talk about performance.
    Any other way ?

    First solve the problem then write the code !

  • John Mitchell-245523

    SSC Guru

    Points: 148761

    This should perform better than either a self-join or a recursive CTE.  The first CTE isn't part of the solution - I just used it because I don't have a Tally table on my server.  Whichever solution you choose, make sure that if you turn it into a function, you make it an inline table-valued function and not a scalar-valued function.

    DECLARE @STR VARCHAR(50) = 'Adbcdefghijki';
    WITH Tally AS (
        SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
        FROM sys.objects
        )
    , Characters AS (
        SELECT
             SUBSTRING(@str,n,1) AS CHR
        ,    n
        FROM Tally
        WHERE n <= LEN(@str)
        )
    , Repeats AS (
        SELECT
             CHR
        ,    ROW_NUMBER() OVER (PARTITION BY CHR ORDER BY n) AS RowNo
        ,    n AS CharacterSeq
        FROM Characters
        )
    , Recurrences AS (
        SELECT
             CHR
        ,    RowNo
        ,    CharacterSeq
        ,    ROW_NUMBER() OVER (ORDER BY CharacterSeq) AS RecurSeq
        FROM Repeats
        WHERE RowNo = 2 -- first recurrence
        )
    SELECT
         CHR
    ,    CharacterSeq
    FROM Recurrences
    WHERE RecurSeq = 1;

    John

  • TheCTEGuy

    SSC Eights!

    Points: 892

    John Mitchell-245523 - Monday, October 30, 2017 5:22 AM

    This should perform better than either a self-join or a recursive CTE.  The first CTE isn't part of the solution - I just used it because I don't have a Tally table on my server.  Whichever solution you choose, make sure that if you turn it into a function, you make it an inline table-valued function and not a scalar-valued function.

    DECLARE @STR VARCHAR(50) = 'Adbcdefghijki';
    WITH Tally AS (
        SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
        FROM sys.objects
        )
    , Characters AS (
        SELECT
             SUBSTRING(@str,n,1) AS CHR
        ,    n
        FROM Tally
        WHERE n <= LEN(@str)
        )
    , Repeats AS (
        SELECT
             CHR
        ,    ROW_NUMBER() OVER (PARTITION BY CHR ORDER BY n) AS RowNo
        ,    n AS CharacterSeq
        FROM Characters
        )
    , Recurrences AS (
        SELECT
             CHR
        ,    RowNo
        ,    CharacterSeq
        ,    ROW_NUMBER() OVER (ORDER BY CharacterSeq) AS RecurSeq
        FROM Repeats
        WHERE RowNo = 2 -- first recurrence
        )
    SELECT
         CHR
    ,    CharacterSeq
    FROM Recurrences
    WHERE RecurSeq = 1;

    John

    Thats good @john-2 - Actually I dont know why I was after RANK() func .... anyways good one!

    First solve the problem then write the code !

  • ChrisM@Work

    SSC Guru

    Points: 186107

    Here's a compact alternative:

    DECLARE @STR VARCHAR(100) = 'AdbicdA'

    SELECT TOP(1) [CHR] = SUBSTRING(@STR,n,1)

    FROM dbo.Tally t

    WHERE LEN(@STR) >= n

    ORDER BY ISNULL(NULLIF(CHARINDEX(SUBSTRING(@STR,n,1),@STR,n+1),0),99999)

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • TheCTEGuy

    SSC Eights!

    Points: 892

    ChrisM@Work - Monday, October 30, 2017 6:30 AM

    Here's a compact alternative:

    DECLARE @STR VARCHAR(100) = 'AdbicdA'

    SELECT TOP(1) [CHR] = SUBSTRING(@STR,n,1)

    FROM dbo.Tally t

    WHERE LEN(@STR) >= n

    ORDER BY ISNULL(NULLIF(CHARINDEX(SUBSTRING(@STR,n,1),@STR,n+1),0),99999)

    @chris-2 the code you posted seems to be giving wrong result

    First solve the problem then write the code !

  • ChrisM@Work

    SSC Guru

    Points: 186107

    TheCTEGuy - Monday, October 30, 2017 6:35 AM

    ChrisM@Work - Monday, October 30, 2017 6:30 AM

    Here's a compact alternative:

    DECLARE @STR VARCHAR(100) = 'AdbicdA'

    SELECT TOP(1) [CHR] = SUBSTRING(@STR,n,1)

    FROM dbo.Tally t

    WHERE LEN(@STR) >= n

    ORDER BY ISNULL(NULLIF(CHARINDEX(SUBSTRING(@STR,n,1),@STR,n+1),0),99999)

    @chris-2 the code you posted seems to be giving wrong result

    Thanks - not enough testing. Here's a fix:

    SELECT TOP(1) [FIRST_RECURRING_CHARACTER] = SUBSTRING(@STR,n,1)

    FROM dbo.Tally t

    CROSS APPLY (SELECT NextPos = CHARINDEX(SUBSTRING(@STR,n,1),@STR,n+1)) x

    WHERE LEN(@STR) >= n AND x.NextPos > 0

    ORDER BY x.NextPos

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • John Mitchell-245523

    SSC Guru

    Points: 148761

    Chris

    Your code is certainly more compact, but I couldn't find any way to make it faster than mine.  I don't know whether that's to do with all the string manipulations that yours does, or the way I extrapolated it all out in order to return the first duplicated character in the sys.all_columns view of a largeish database.  Anyway, here are the results:
    -- John

    WITH Tally AS (
      SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
      FROM sys.objects
      )
    , Characters AS (
      SELECT
            c.object_id
        ,   c.column_id
        ,   c.name
        ,   SUBSTRING(c.name,n,1) AS CHR
        ,   t.n
      FROM Tally t
      JOIN sys.all_columns c
      ON t.n <= LEN(c.name)
      ) 
    , Repeats AS (
      SELECT
            object_id
        ,   column_id
        ,   name
        ,   CHR
        ,   ROW_NUMBER() OVER (PARTITION BY object_id, column_id, CHR ORDER BY n) AS RowNo
        ,   n AS CharacterSeq
      FROM Characters
      ) 
    , Recurrences AS (
      SELECT
          name
      ,   CHR
      ,   CharacterSeq
      ,   ROW_NUMBER() OVER (PARTITION BY object_id, column_id ORDER BY CharacterSeq) AS RecurSeq
      FROM Repeats
      WHERE RowNo = 2 -- first recurrence
      )
    SELECT
        name
    ,   CHR
    ,   CharacterSeq
    FROM Recurrences
    WHERE RecurSeq = 1;

    --Table 'Worktable'. Scan count 8562, logical reads 17887, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    --Table 'sysschobjs'. Scan count 1, logical reads 46, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    --Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    --Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --SQL Server Execution Times:
    -- CPU time = 391 ms, elapsed time = 872 ms.

    -- Chris

    WITH Tally AS (
      SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS n
      FROM sys.objects
      )
    SELECT
         c.name
    ,    c.object_id
    ,    y.FIRST_RECURRING_CHARACTER
    FROM sys.all_columns c
    CROSS APPLY (
        SELECT TOP(1) [FIRST_RECURRING_CHARACTER] = SUBSTRING(c.name,n,1)
        FROM Tally t
        CROSS APPLY (SELECT NextPos = CHARINDEX(SUBSTRING(c.name,n,1),c.name,n+1)) x
        WHERE LEN(c.name) >= n AND x.NextPos > 0
        ORDER BY x.NextPos
        ) y;

    --Table 'Worktable'. Scan count 17121, logical reads 50128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    --Table 'sysschobjs'. Scan count 3852, logical reads 142524, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    --Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    --Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --SQL Server Execution Times:
    -- CPU time = 2516 ms, elapsed time = 3469 ms.

  • ChrisM@Work

    SSC Guru

    Points: 186107

    John, you're absolutely correct, your code runs about twice as fast as mine. It does, however, generate some strange results, and I've messed with it a bit so that could be my doing:

    IF object_id('tempdb..#ListOfStrings') IS NOT NULL DROP TABLE #ListOfStrings;
    SELECT *
    INTO #ListOfStrings
    FROM (VALUES
     ('Ad'),
     ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs1zxjhxcdkflg,gmdcj1shja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gm2dcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjsh3ja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdc4jshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja5\zhgatwevwnfdm,gflgotitheb'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\6zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjsh7ja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zh8gatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja9\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhga10twevwnfdm,gflgotitheb'),
     ('AdbicdA'),
     ('f1yf Adbicdefghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Ad2bicdefghijkiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf3 Adbicdefghijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbi4cdefghijkiqtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('Adbic5defghijkiqtfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Ad6bicdefghijkiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicde7fghijkiqtfjoen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbi8cdefghijkiqtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefg9hijkiqtfjoen gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhgatwevwnfdm,gflgotitheb'),
     ('Adbicdefgh1ijkiqtfjoen gfviuyhtbnhsghs1zxjhxcdkflg,gmdcj1shja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicde2fghijkiqtfjoen gfviuyhtbnh2sghszxjhxcdkflg,gm2dcjshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghij3kiqtfjoen gfviuyhtbnhsghs3zxjhxcdkflg,gmdcjsh3ja\zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefg4hijkiqtfjoen gfviuyhtbnhs4ghszxjhxcdkflg,gmdc4jshja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijki5qtfjoen gfviuyhtbnhsghszx5jhxcdkflg,gmdcjshja5\zhgatwevwnfdm,gflgotitheb'),
     ('Adbicdefghijkiq6tfjoen gfviuyhtbnhsghszxj6hxcdkflg,gmdcjshja\6zhgatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghij7kiqtfjoen gfviuyhtbnhsghs7zxjhxcdkflg,gmdcjsh7ja\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtf8joen gfviuyhtbnhsghszxjhx7cdkflg,gmdcjshja\zh8gatwevwnfdm,gflgotitheb'),
     ('fyf Adbicdefghijki9qtfjoen gfviuyhtbnhsghszx8jhxcdkflg,gmdcjshja9\zhgatwevwnfdm,gflgotitheb fyf'),
     ('Adbicdefghijkiqtfjo0en gfviuyhtbnhsghszxjhxcd9kflg,gmdcjshja\zhga10twevwnfdm,gflgotitheb')

    ) d ([name]);
    PRINT '== John ========================================================================================================================================================';
    SET STATISTICS IO, TIME ON;
    WITH Tally AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n FROM sys.objects),
    Characters AS (
      SELECT
           c.name
        ,   SUBSTRING(c.name,n,1) AS CHR
        ,   t.n
      FROM Tally t
      JOIN #ListOfStrings c
      ON t.n <= LEN(c.name)
      )
    , Repeats AS (
      SELECT
           name
        ,   CHR
        ,   ROW_NUMBER() OVER (PARTITION BY CHR ORDER BY n) AS RowNo
        ,   n AS CharacterSeq
      FROM Characters
      )
    , Recurrences AS (
      SELECT
          name
      ,   CHR
      ,   ROW_NUMBER() OVER (PARTITION BY CHR ORDER BY CharacterSeq) AS RecurSeq
      FROM Repeats
      WHERE RowNo = 2 -- first recurrence
      )
    SELECT
        name
    ,   CHR
    FROM Recurrences
    WHERE RecurSeq = 1
    ORDER BY NAME;
    SET STATISTICS IO, TIME OFF;
    PRINT '== Chris ========================================================================================================================================================';
    SET STATISTICS IO, TIME ON;
    WITH Tally AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n FROM sys.objects)
    SELECT
         c.name
    ,    y.FIRST_RECURRING_CHARACTER
    FROM #ListOfStrings c
    CROSS APPLY (
        SELECT TOP(1) [FIRST_RECURRING_CHARACTER] = SUBSTRING(c.name,n,1)
        FROM Tally t
        CROSS APPLY (SELECT NextPos = CHARINDEX(SUBSTRING(c.name,n,1),c.name,n+1)) x
        WHERE LEN(c.name) >= n AND x.NextPos > 0
        ORDER BY x.NextPos
        ) y
    ORDER BY NAME;
    SET STATISTICS IO, TIME OFF;

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • John Mitchell-245523

    SSC Guru

    Points: 148761

    Chris

    Yes, you appear to have changed my PARTITION BY clauses.  The first one should be PARTITION BY name, CHR; the second should be PARTITION BY name.  Of course, this relies on names being unique, which they are in your data set.

    John

  • ChrisM@Work

    SSC Guru

    Points: 186107

    John Mitchell-245523 - Monday, October 30, 2017 9:50 AM

    Chris

    Yes, you appear to have changed my PARTITION BY clauses.  The first one should be PARTITION BY name, CHR; the second should be PARTITION BY name.  Of course, this relies on names being unique, which they are in your data set.

    John

    Found some time to get back onto it and instead of using the name as a uniquifier I'm putting in a RowID, it's slendererer fewer characters. Sorry about that.

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • ChrisM@Work

    SSC Guru

    Points: 186107

    Interesting - if you use a tally table derived from sys.objects then John's query is fastest. If you use a hard tally table then mine is fastest (though not by much):
    PRINT '== Chris ========================================================================================================================================================';
    SET STATISTICS IO, TIME ON;
    WITH Tally AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n FROM sys.objects)
    SELECT
         c.name
    ,    y.FIRST_RECURRING_CHARACTER
    FROM #ListOfStrings c
    CROSS APPLY (
        SELECT TOP(1) [FIRST_RECURRING_CHARACTER] = SUBSTRING(c.name,n,1)
        FROM Tally t
        CROSS APPLY (SELECT NextPos = CHARINDEX(SUBSTRING(c.name,n,1),c.name,n+1)) x
        WHERE LEN(c.name) >= n AND x.NextPos > 0
        ORDER BY x.NextPos
        ) y
    ORDER BY NAME;
    SET STATISTICS IO, TIME OFF;

    PRINT '== John ========================================================================================================================================================';
    SET STATISTICS IO, TIME ON;
    WITH Tally AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n FROM sys.objects),
    Characters AS (
      SELECT
           c.RowID
     ,   c.Name
        ,   SUBSTRING(c.name,n,1) AS CHR
        ,   t.n
      FROM Tally t
      JOIN #ListOfStrings c
      ON t.n <= LEN(c.name)
      )
    , Repeats AS (
      SELECT
           RowID
     ,   Name   
     ,   CHR
        ,   ROW_NUMBER() OVER (PARTITION BY RowID, CHR ORDER BY n) AS RowNo
        ,   n AS CharacterSeq
      FROM Characters
      )
    , Recurrences AS (
      SELECT
            Name   
      ,   CHR
      ,   ROW_NUMBER() OVER (PARTITION BY RowID ORDER BY CharacterSeq) AS RecurSeq
      FROM Repeats
      WHERE RowNo = 2 -- first recurrence
      )
    SELECT
        name
    ,   CHR
    FROM Recurrences
    WHERE RecurSeq = 1
    ORDER BY NAME;
    SET STATISTICS IO, TIME OFF;

     

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • John Mitchell-245523

    SSC Guru

    Points: 148761

    Goodness, yes - I tried it with an on-the-fly table.  Mine slowed down significantly and yours sped up a lot.  I can't really explain that.

    I changed the tally CTE to this:
    WITH Tally AS (
      SELECT ROW_NUMBER() OVER (ORDER BY x.m) AS n
      FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x(m)
        CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) y(r)
      )

    and here are the times:
    Table 'Worktable'. Scan count 0, logical reads 217882, physical reads 0, read-ahead reads 187, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 40400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 100, logical reads 4700, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 1687 ms, elapsed time = 2331 ms.

    Table 'Worktable'. Scan count 17121, logical reads 50131, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 485 ms, elapsed time = 747 ms.

    John

  • ChrisM@Work

    SSC Guru

    Points: 186107

    John Mitchell-245523 - Monday, October 30, 2017 10:38 AM

    Goodness, yes - I tried it with an on-the-fly table.  Mine slowed down significantly and yours sped up a lot.  I can't really explain that.

    I changed the tally CTE to this:
    WITH Tally AS (
      SELECT ROW_NUMBER() OVER (ORDER BY x.m) AS n
      FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) x(m)
        CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) y(r)
      )

    and here are the times:
    Table 'Worktable'. Scan count 0, logical reads 217882, physical reads 0, read-ahead reads 187, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 40400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 100, logical reads 4700, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 1687 ms, elapsed time = 2331 ms.

    Table 'Worktable'. Scan count 17121, logical reads 50131, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'syscolpars'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 485 ms, elapsed time = 747 ms.

    John

    I guess the plan of one of the queries (or possibly both) is sensitive to small changes. Might be worth spending more time on this.

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

Viewing 15 posts - 1 through 15 (of 47 total)

You must be logged in to reply to this topic. Login to reply