Compare two varchar arrays one character at time until different

  • I need to compare two varchar character arrays one character at a time until they differ using T-SQL, can someone help?

    For example given the two varchar arrays:

    DECLARE @String1 VARCHAR(100)

    SET @String1 = 'Smith, Alex'

    DECLARE @String2 VARCHAR(100)

    SET @String2 = 'Smith, Alice'

    Compare them one character at a time starting from the left, until they are different.

    When a difference is detected, produce the following output

    String1 = 'Smith, Ale'

    String2 = 'Smith, Ali'

    Thanks!


    Regards,
    Bob Szymanski
    Blue Mule Software

  • Call off the Dogs!!! Never mind. I figured it out myself.

    DECLARE @String1 VARCHAR(100)

    SET @String1 = 'Smith, Alex'

    DECLARE @String2 VARCHAR(100)

    SET @String2 = 'Smith, Alice'

    -- Compare arrays one character at a time starting at the left until different.

    -- When different produce the following results:

    -- String1 = 'Smith, Ale'

    -- String2 = 'Smith, Ali'

    DECLARE @i INT

    SET @i = 1

    while @i <= LEN(@String1) begin

    IF (LEFT(@String1, @i) LEFT(@String2, @i))

    BEGIN

    SET @String1 = SUBSTRING(@String1, 1, @i + 1)

    SET @String2 = SUBSTRING(@String2, 1, @i + 1)

    BREAK

    END

    -- increment the row counter

    SET @i = @i + 1

    END

    PRINT @String1 + '-' + @String2


    Regards,
    Bob Szymanski
    Blue Mule Software

  • while @i <= LEN(@String1) begin

    This is executed every pass so set the length once and evaluate the length variable for performance.

    Set @len = Len(@string1)

    While @i <= @len etc....

    Small things add up!

  • Very nicely done, Robert... but if you need to do this for a whole table, you'll either need to make a function, or some form of double nested While loop... some folks would even resort to using a (yeech....haaaaackk!) Cursor (Patooooiiii!)

    ... or ...

    ... we can use the set-based magic of a "Tally" table to do a whole table at once.  Now, this assumes that you have some sort of Primary Key on the table and most tables should even if it's a "surrogate" key instead of a "natural" key.

    We need some data to test with, so let's create, oh, say, 10,000 rows of identical data with some really large names to check... look for the "8" near the end of string 2 which is the only difference between String1 and String2...

    --===== Create a table to hold the sample test data in to simulate a "real" problem
         -- and populate it with test data, on the fly...
     SELECT TOP 10000
            RowNum = IDENTITY(INT,1,1),
            String1 = 'Smith, Alexabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz',
            String2 = 'Smith, Alexabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs8uvwxyz'
       INTO jbmCompareDemo
       FROM Master.dbo.SysObjects sc1,
            Master.dbo.SysObjects sc2
    --===== Add a Primary Key to the test table
      ALTER TABLE jbmCompareDemo
        ADD PRIMARY KEY CLUSTERED (RowNum)

    Heh... ya just gotta love the power of a nice restricted cross-join for making stuff in a hurry, huh?

    Speaking of cross-joins, let's make a nice little "Tally" or "Numbers" table... all it is is a very well indexed table of consecutive numbers from 1 to some number (I use 11,000).  It has lots and lots of interesting uses and is as fast or faster (usually) than a While loop, so you might want to make it a permanent part of your programming arsenal.  The real key is how easy things become when you use it.  Here's how to make one...

    --===== Create and populate the Tally table on the fly

     SELECT TOP 11000 --equates to more than 30 years of dates by day

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

    --===== Allow the general public to use it

      GRANT SELECT ON dbo.Tally TO PUBLIC

    --===== Display the content of the Tally table

     SELECT * FROM dbo.Tally

    You with me so far?  We have a Tally table containing a bunch of sequential numbers AND we have a test table (jbmCompareDemo) containing 10,000 rows of predictable data to test with.  Hmmm... wonder what would happen if we cross-joined those tables (with restrictions) using substrings in the joins... would it replace a loop for the whole table?  You bet it would...

    --===== Finds first character position that doesn't match

         -- between the String1 and String2 columns of the test table

     SELECT RowNum, MIN(N) AS N

       FROM dbo.Tally WITH (NOLOCK),

            jbmCompareDemo

      WHERE N <= LEN(String1)

        AND N <= LEN(String2)

        AND SUBSTRING(String1,N,1) <> SUBSTRING(String2,N,1)

      GROUP BY RowNum

    So far, so good, eh?  But you wanted to show the left matching part of both strings up to and including the first character that doesn't match... let's use the above query to drive all of that as a "derived table" (one of my other SQL "best friends")...

    --===== Show the left matching part of both strings up to and

         -- including the first character that doesn't match

     SELECT LEFT(t.String1,d.N)+'-'+LEFT(t.String2,d.N)

       FROM jbmCompareDemo t,

            (--==== Derived table "d" finds first character position that doesn't match

                 -- between the String1 and String2 columns of the test table

             SELECT RowNum, MIN(N) AS N

               FROM dbo.Tally WITH (NOLOCK),

                    jbmCompareDemo

              WHERE N <= LEN(String1)

                AND N <= LEN(String2)

                AND SUBSTRING(String1,N,1) <> SUBSTRING(String2,N,1)

              GROUP BY RowNum

            ) d

      WHERE t.RowNum = d.RowNum

    Pretty cool, huh?  And look at how simple the code appears!  Study the code to figure out what's going on (cross-join between object table and Tally table using Substring at character level) and lemme know if you have any questions...

    p.s. SQL Server 2005 has a nifty feature (RowNum) that can be used to replace the Tally table... I don't have SQL Server 2005 to test on, so I won't demo the feature.  But I'll just bet someone else could...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff, Whoa - you go dude!! That's pretty intense.

    It is my intention to create a function for this in addition to some additional code.

    I prefer not to use cursors as I learned to avoid them if possible.

    The table tally example you show is interesting. I would be interested in seeing a version of my original question using the row_number() function you mention (I am using SQL Server 2005).

    Again, thanks for your help.


    Regards,
    Bob Szymanski
    Blue Mule Software

  • Ok, a function it is... still using the Tally table 'cause I don't have 2K5 to test the code with...

     CREATE FUNCTION dbo.fnCharCompare

    /******************************************************************************

     Purpose:

     Compare the two input strings and return the left matching part of both

     strings (separated by a single dash) up to and including the first character

     that doesn't match.

    ******************************************************************************/

    --===== Delare the input parameters

            (

            @pString1 CHAR(8000),

            @pString2 CHAR(8000)

            )

    RETURNS VARCHAR(8000)

         AS

    BEGIN

            --===== Return the left matching part of both strings up to and

                 -- including the first character that doesn't match

             RETURN (SELECT LEFT(@pString1,MIN(N))

                          + '-'

                          + LEFT(@pString2,MIN(N))

                       FROM dbo.Tally WITH (NOLOCK)

                      WHERE SUBSTRING(@pString1,N,1) <> SUBSTRING(@pString2,N,1))

    END

    Wonder if you might answer a question for me, Robert... what are you going to use this for?  Haven't seen this particular request (problem) before...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Ah crud... I've got a fly in the ointment... didn't test the majority of possibilities... I'll be back...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Sorry about that... the other code had a problem when the two strings were different lengths (doh!)... I fixed the function code above...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff, thanks for your interest. Ok, you asked for it so here goes...

    Some background info first. We have an ASP.NET Web Application that serves up a list of names in a dynamic tree control. The nodes of the tree are generated on demand and loaded into the tree dynamically. When this list of names become very big (and indeed it will be), the delay takes too long. So to minimize the delay we will group the names into dynamically created folders. Each folder will contain n number of names based on some passed in limit (In my code below this is the variable @NthRow). This way we can mimimize the number of nodes to fetch yet still support a large number of names.

    The final output of our function must be a table of strings in the form of X - Y where X is the starting name and Y is the ending name of the group.

    For example, given the following names and @NthRow = 5

    Baker, Ms. Jamie

    Espinosa, Ms. Jean

    Gonzales, Mr. Robert

    Holland, Ms. Julia

    Jimenez, Ms. Soni

    Macdonald, Mr. Mickey

    Miller, Ms. Jana

    Noriega, Ms. Michelle

    Owen, Ms. Alma

    Ramirez, Ms. Stephanie

    Salas, Ms. Jeanine

    Sipes, Mr. Tyler

    Tamayo, Ms. Laura

    Timmer, Ms. Julie

    Trevino, Ms. Yvonne

    Young, Ms. Annia

    Produce an output table of:

    Ba-Ji

    Ma-Ra

    Sa-Tr

    Y-Y

    Where:

    Ba-Ji represents the names Baker, Ms. Jamie thru Jimenez, Ms. Soni

    Ma-Ra represents the names Macdonald, Mr. Mickey thru Ramirez, Ms. Stephanie

    Sa-Tr represents the names Salas, Ms. Jeanine thru Trevino, Ms. Yvonne

    Y-Y represents the name Young, Ms. Annia

    Got it? Now for the code that I have so far, which works mind you, but may not be very efficient and/or elegant as I'm a T-SQL newbie. As you can probably tell from my style, I am a developer. This style may or may not lead to very efficient code in T-SQL. Any hoot, please feel free to comment on how I can better follow DBA "best practices" where you see fit.

    The code itself is not in a function yet as I wanted to nail down the agloritm first. So I was just working of a script file. below is the content of what I have thus far.

    I bet you're sorry you asked.

    DECLARE @RowId INT

    DECLARE @maxRowId INT

    DECLARE @StartName VARCHAR(8000)

    DECLARE @EndName VARCHAR(8000)

    DECLARE @LastFolderNodeStartNameID INT

    DECLARE @i INT

    DECLARE @len INT

    DECLARE @NodeText VARCHAR(8000)

    DECLARE @FullNodeTextCount INT

    DECLARE @NthRow INT

    -- Temp table variable to store raw query results with an ID

    DECLARE @raw TABLE

    (

    ID int IDENTITY(1,1),

    RawText VARCHAR(8000)

    )

    -- Temp table variable to store long versions of both the starting text and the ending text

    DECLARE @FullNodeText TABLE

    (

    ID int IDENTITY(1,1),

    StartText VARCHAR(8000),

    EndText VARCHAR(8000)

    )

    -- Temp table variable to store results

    DECLARE @Results TABLE

    (

    ID INT,

    NodeText VARCHAR(8000)

    )

    -- Initialize tables and variables

    SET @NthRow = 100 -- This controls how many nodes will be represented by each folder node. Later this will be passed in as an argument to the function

    -- Populate the Raw table

    INSERT INTO @raw (RawText)

    SELECT RTRIM(LastName) + ', ' + RTRIM(Title) + ' ' + RTRIM(FirstName) + ' ' + RTRIM(MiddleName) + ' ' + RTRIM(Suffix) AS RawText

    FROM KeyHolders

    WHERE KeyHolderID > 0

    ORDER BY RawText

    -- Populate the FullNodeText table with with every nth and nth + 1 raw text

    INSERT INTO @FullNodeText(StartText,EndText)

    SELECT R1.RawText, R2.RawText

    FROM @raw R1

    INNER JOIN @raw R2 ON R1.ID + @NthRow - 1 = R2.ID

    WHERE R1.ID % @NthRow-1 = 0

    -- How many text nodes were found?

    SELECT @MaxRowId = MAX(ID) FROM @raw

    -- Were there any remainders?

    IF @MaxRowId ((SELECT COUNT(*) FROM @FullNodeText) * @NthRow )

    BEGIN

    -- In the case we have a remainder, we still need to add the last folder node start and end names to our final results.

    -- This occurs when there are remaining node(s) after extracting the nth and nth + 1 row (previously).

    -- To do this simply calculate the IDs and insert them into the FullNodeText table.

    SET @LastFolderNodeStartNameID = (SELECT COUNT(*) FROM @FullNodeText) * @NthRow + 1

    SET @StartName = (SELECT RawText FROM @raw WHERE id = @LastFolderNodeStartNameID)

    SET @EndName = (SELECT RawText FROM @raw WHERE id = @MaxRowId)

    INSERT INTO @FullNodeText(StartText,EndText) VALUES(@StartName, @EndName)

    END

    -- Determine the start and end row IDs (@FullNodeText.ID is an Identity column, auto-incrementing starting at 1)

    SELECT @RowId = MIN(ID), @MaxRowId = MAX(ID) FROM @FullNodeText

    SET @StartName = ''

    SET @EndName = ''

    -- Iterate through the FullNodeText table one row at a time.

    -- For each row, compare the Start text with the end text one character at a time starting

    -- from the left until a difference is detected. When the difference is found insert the

    -- strings into another temp table which is our final function results

    WHILE @RowId <= @MaxRowId BEGIN

    DECLARE @StartText VARCHAR(8000)

    DECLARE @EndText VARCHAR(8000)

    -- Fetch the values from the @FullNodeText table one row at a time

    SELECT @StartName = StartText, @EndName = EndText

    FROM @FullNodeText

    WHERE ID = @RowId

    -- It is possible to have a single remaining row in which case the

    -- start name would

    IF @StartName = @EndName

    BEGIN

    SET @StartText = SUBSTRING(@StartName, 1, 1)

    SET @EndText = SUBSTRING(@EndName, 1, 1)

    INSERT INTO @Results (ID, NodeText) VALUES(-1, @StartText + '-' + @EndText)

    END

    ELSE

    BEGIN

    -- compare on character at a time starting at the left until different.

    SET @i = 1

    SET @len = Len(@EndName)

    WHILE @i <= @Len BEGIN

    IF (UPPER(LEFT(@StartName, @i))) UPPER((LEFT(@EndName, @i)))

    BEGIN

    SET @StartText = SUBSTRING(@StartName, 1, @i + 1)

    SET @EndText = SUBSTRING(@EndName, 1, @i + 1)

    SET @NodeText = @StartText + '-' + @EndText

    INSERT INTO @Results (ID, NodeText) VALUES(-1, @NodeText)

    BREAK

    END

    SET @i = @i + 1

    END

    END

    PRINT @NodeText

    -- increment loop counter

    SET @RowId = @RowId + 1

    END

    SELECT * FROM @Results


    Regards,
    Bob Szymanski
    Blue Mule Software

  • There are a number of ways to returns trees in a high speed fashion in SQL Server 2000.... are you simply having performance problems or are you just concerned with the number of items in the tree.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff, I'm aware of many tree schema/strategies. ASP.NET 2.0 and SQL Server 2005 have some very interesting approaches to implementing trees and we have research these in detail. Due to the nature of our current project, we were unable to employ such strategies. Regardless, in my opinion, either would still have issues when presenting in excess of 10,000 dynamically generated nodes. With that many nodes it turns into more of a client side issue rather than server. Besides, changing the strategy at this point is not an option.


    Regards,
    Bob Szymanski
    Blue Mule Software

  • Here is a nice function that works for up to 8 characters!

    CREATE

    FUNCTION dbo.fnFirstDifference

    (

    @FirstString VARCHAR(8),

    @SecondString VARCHAR(8)

    )

    RETURNS VARCHAR(17)

    AS

    BEGIN

    DECLARE @var VARBINARY(8),

    @pos TINYINT

    SELECT @var = CAST(CAST(@FirstString AS BINARY(8)) AS BIGINT) ^ CAST(CAST(@SecondString AS BINARY(8)) AS BIGINT),

    @pos = PATINDEX('%[^0]%', SUBSTRING(master.dbo.fn_sqlvarbasetostr(@var), 3, 16)),

    @pos = (1 + @pos) / 2

    RETURN LEFT(@FirstString, @pos) + '-' + LEFT(@SecondString, @pos)

    END

     


    N 56°04'39.16"
    E 12°55'05.25"

  • So you can't limit the query by level and tree branch on a single click?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Bob,

    For my own purposes I have used your problem as the basis for a blog post. I have modified your original problem to pose a general question concerning the creation of intervals for character data. The solution is in the D4 relational language of Dataphor using sql server as the data repository. It is essentially a non-procedural solution so as to possibly interest t-sql users If anyone is interested (and you should be) here is the link:

     http://beyondsql.blogspot.com/2007/07/dataphor-create-string-intervals.html

    Thanks for the overall problem and if there are questions feel free to comment on the post. I hasten to add that I could have offered a solution using the RAC utility but I refrained. Should anyone (Jeff) like to see the solution in RAC please post

    best,

    http://www.beyondsql.blogspot.com

    http://www.rac4sql.net

     

     

  • Nope... I'm pretty well spammed out on RAC.  How about posting the T-SQL solution that it's supposed to replace?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply