Performance difference between LIKE and CHARINDEX?

  • I have a T-SQL script that looks through the firstname column and modifies any firstname with and '&' anywhere in the value. It then blanks out the part of the name from the '&'.

    I know that LIKE '%&%' results in a table scan. I also know that I can check for a value of > 0 for CHARINDEX but suspect it will also perform a table scan.

    My question is whether there is some outside chance that one of the methods would be more efficient than the other.

    Thanks!

  • The way you are using it, looking for that one character, it probably won't make a measurable difference.

    Unless you need to know the index of the character/string I would use LIKE since it isn't a function. I try to avoid functions in a where clause for obvious reasons.

    The probability of survival is inversely proportional to the angle of arrival.

  • Thanks! That is pretty much what I was thinking. However, I do get surprised once in a while.

  • Like will perform more optimally when used like a LEFT(). Otherwise they're both going to scan. LIKE I believe has a better chance of using Full Text Indexing but I'd have to review the specifics again. I don't use charindex except for string delimiter breaks these days.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • I was curious so I set up a test table with 100,000 rows of random names with some double-names (i.e., with an ampersand) randomly thrown in.

    I tested selecting the rows with ampersands via LIKE and CHARINDEX in the where clause and then doing the actual update. Then, out of curiosity, I also tried using DelimitedSplit8K to split the target column with a CROSS APPLY to get the rows to update.

    I just did a very simple test using client statistics and I know that isn't the most accurate way to measure performance. Based on that, the differences between LIKE and CHARINDEX are negligible with the split string and join method following close behind. However, I'd bet that if the operation required finding more than a single character the split and join would come out ahead.

    FWIW, here's the code I used for testing.

    --Code to generate some random names came from this post

    --http://www.sqlservercentral.com/Forums/FindPost428949.aspx

    IF OBJECT_ID('tempdb..#randomFullname') IS NOT NULL

    DROP TABLE #randomFullname

    IF OBJECT_ID('tempdb..#firstname') IS NOT NULL

    DROP TABLE #firstname

    IF OBJECT_ID('tempdb..#lastName') IS NOT NULL

    DROP TABLE #lastName

    IF OBJECT_ID('tempdb..#NamesToUpdate') IS NOT NULL

    DROP TABLE #NamesToUpdate

    CREATE TABLE #randomFullname (

    rid INT IDENTITY(1,1) NOT NULL,

    fnid INT NOT NULL,

    lnid INT NOT NULL,

    PRIMARY KEY (rid))

    CREATE TABLE #firstname (

    fnID INT IDENTITY(1,1) NOT NULL,

    FName varchar(100) NULL,

    PRIMARY KEY (fnID))

    CREATE TABLE #lastName (

    lnID INT IDENTITY(1,1) NOT NULL,

    LName varchar(100) NULL,

    PRIMARY KEY (lnID))

    CREATE TABLE #NamesToUpdate (

    rid INT NOT NULL,

    LName varchar(100) NOT NULL,

    FName varchar(100) NOT NULL,

    PRIMARY KEY (rid))

    INSERT #firstname (FName)

    SELECT 'Matt' UNION ALL

    SELECT 'Jeff' UNION ALL

    SELECT 'Jason' UNION ALL

    SELECT 'Mark' UNION ALL

    SELECT 'Andrew' UNION ALL

    SELECT 'Bob' UNION ALL

    SELECT 'Joe' UNION ALL

    SELECT 'Steve' UNION ALL

    SELECT 'Kelly' UNION ALL

    SELECT 'Beverly' UNION ALL

    SELECT 'Janet' UNION ALL

    SELECT 'Angela' UNION ALL

    SELECT 'Brandie' UNION ALL

    SELECT 'Josephine'

    INSERT #lastName (LName)

    SELECT 'Miller' UNION ALL

    SELECT 'Moden' UNION ALL

    SELECT 'Smith' UNION ALL

    SELECT 'Jones' UNION ALL

    SELECT 'Tarvin' UNION ALL

    SELECT 'ODoul' UNION ALL

    SELECT 'Dupont' UNION ALL

    SELECT 'Kelley' UNION ALL

    SELECT 'Taylor' UNION ALL

    SELECT 'Barton'

    DECLARE

    @fncount int,

    @lncount int

    SELECT @fncount = count(*) FROM #firstname

    SELECT @lncount = count(*) FROM #lastName

    INSERT INTO #randomFullname (fnid,lnid)

    SELECT TOP 100000

    cast(rand(checksum(newid()))*(@fncount-1) as int)+1,

    cast(rand(checksum(newid()))*(@lncount-1) as int)+1

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2

    INSERT INTO #NamesToUpdate (rid,LName,FName)

    SELECT

    rid

    ,LName

    ,ISNULL((CASE

    WHEN fn.fnid%(cast(rand(checksum(newid()))*(@fncount+1) as int)+1) > 0

    THEN (SELECT TOP(1) FName FROM #firstname WHERE fnID = fn.fnid%(cast(rand(checksum(newid()))*(@fncount+1) as int)+1))

    +' & '

    +(SELECT TOP(1) FName FROM #firstname WHERE fnID = fn.fnid%(cast(rand(checksum(newid()))*(@fncount+1) as int)+1))

    ELSE FName

    END),FName) AS FName

    FROM

    #randomFullname r

    INNER JOIN #firstname fn

    ON r.fnid=fn.fnid

    INNER JOIN

    #lastName ln ON r.lnid=ln.lnid

    SELECT * FROM #NamesToUpdate ORDER BY rid

    The 3 variations of SELECT queries:

    SELECT

    ROW_NUMBER() OVER (ORDER BY rid) AS RowID,

    rid,

    LName,

    FName = RTRIM(SUBSTRING(FName,1,CHARINDEX('&',FName,1)-1))

    FROM

    #NamesToUpdate

    WHERE

    rid > 0

    AND FName LIKE '%&%'

    SELECT

    ROW_NUMBER() OVER (ORDER BY rid) AS RowID,

    rid,

    LName,

    FName = RTRIM(SUBSTRING(FName,1,CHARINDEX('&',FName,1)-1))

    FROM

    #NamesToUpdate

    WHERE

    rid > 0

    AND CHARINDEX('&',FName) > 0

    SELECT

    RowID,

    rid,

    LName,

    FName = RTRIM(SUBSTRING(FName,1,CHARINDEX('&',FName,1)-1))

    FROM

    (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY r.rid ORDER BY r.rid) AS RowID,

    r.rid,

    r.LName,

    r.FName

    FROM

    #NamesToUpdate r

    CROSS APPLY

    dbo.DelimitedSplit8k(r.FName,'&') dsk

    WHERE

    r.rid > 0

    ) r1

    WHERE

    RowID = 2

    And UPDATE query versions:

    UPDATE #NamesToUpdate

    SET FName = RTRIM(SUBSTRING(FName,1,CHARINDEX('&',FName,1)-1))

    WHERE

    FName LIKE '%&%'

    UPDATE #NamesToUpdate

    SET FName = RTRIM(SUBSTRING(FName,1,CHARINDEX('&',FName,1)-1))

    WHERE

    CHARINDEX('&',FName) > 0

    UPDATE #NamesToUpdate

    SET FName = RTRIM(SUBSTRING(FName,1,CHARINDEX('&',FName,1)-1))

    WHERE rid IN

    (

    SELECT

    rid

    FROM

    (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY r.rid ORDER BY r.rid) AS RowID,

    r.rid,

    r.LName,

    r.FName

    FROM

    #NamesToUpdate r

    CROSS APPLY

    dbo.DelimitedSplit8k(r.FName,'&') dsk

    WHERE

    r.rid > 0

    ) r1

    WHERE

    RowID = 2

    )

     

  • Having come across this, as a follow up if anyone's interested in benchmark comparisons, here's an easy to read posting:

    http://cc.davelozinski.com/sql/like-vs-substring-vs-leftright-vs-charindex

    which shows the speed differences between LEFT, RIGHT, LIKE, SUBSTRING, and CHARINDEX.

  • _watching (12/1/2015)


    Having come across this, as a follow up if anyone's interested in benchmark comparisons, here's an easy to read posting:

    http://cc.davelozinski.com/sql/like-vs-substring-vs-leftright-vs-charindex

    which shows the speed differences between LEFT, RIGHT, LIKE, SUBSTRING, and CHARINDEX.

    No it doesn't, it's a confusing mess. It's discussed in detail in this thread.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply