• As far as I can see then, you're stuck with one of four options:

    like
    charindex
    patindex
    replace

    The following shows each approach along with some test results. From the looks of it, there isn't much difference between the first three, although the replace-based approach comes out noticeably slower in the timing test.

    -------------------------------
    -- create temporary tally table
    -------------------------------
    ; with e1(n) as (
        select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all
        select 1
    )
    , e2(n) as (select 1 from e1 a cross join e1 b)
    , e4(n) as (select 1 from e2 a cross join e2 b)
    select n = isnull(cast((row_number() over (order by (select null))-1) as int),0)
    into #Tally
    from e4

    -- add clustered index to the temporary tally table
    alter table #Tally add constraint pk_qt primary key clustered (n)
    -- (10000 row(s) affected)
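
    If you want to sanity-check the tally table at this point, something like the following should show 10000 rows numbered 0 through 9999 (a quick verification query, not part of the test itself):

    -- expect rows = 10000, min_n = 0, max_n = 9999
    select rows = count(*), min_n = min(n), max_n = max(n)
    from #Tally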

    ------------------------------------------
    -- create temporary table with "documents"
    ------------------------------------------
    create table #Documents (DocId int, DocText varchar(4000))

    -- create 10000 "documents"
    insert #Documents select n, 'Have you thought about using a full text index? It might be helpful in identifying records with links.
    You''d need to be careful with which stemming characters you use to avoid breaking the links up though.
    http://technet.microsoft.com/en-us/library/ms345119(v=sql.90).aspx For parsing the links from the data,
    I''d be tempted to handle this outside of sql server.
    You might find that a parsing function built in C# or VB.net and called via CLR performs better.
    http://technet.microsoft.com/en-us/library/ms345136(v=sql.90).aspx
    http://www.pawlowski.cz/2010/09/string-splitting-t-sql-vs-clr/
    Have you thought about using a full text index? It might be helpful in identifying records with links.
    You''d need to be careful with which stemming characters you use to avoid breaking the links up though.
    http://technet.microsoft.com/en-us/library/ms345119(v=sql.90).aspx For parsing the links from the data,
    I''d be tempted to handle this outside of sql server.
    You might find that a parsing function built in C# or VB.net and called via CLR performs better.
    http://technet.microsoft.com/en-us/library/ms345136(v=sql.90).aspx
    http://www.pawlowski.cz/2010/09/string-splitting-t-sql-vs-clr/
    Have you thought about using a full text index? It might be helpful in identifying records with links.
    You''d need to be careful with which stemming characters you use to avoid breaking the links up though.
    http://technet.microsoft.com/en-us/library/ms345119(v=sql.90).aspx For parsing the links from the data,
    I''d be tempted to handle this outside of sql server.
    You might find that a parsing function built in C# or VB.net and called via CLR performs better.
    http://technet.microsoft.com/en-us/library/ms345136(v=sql.90).aspx
    http://www.pawlowski.cz/2010/09/string-splitting-t-sql-vs-clr/
    I''d be tempted to handle this outside of sql server.
    You might find that a parsing function built in C# or VB.net and called via CLR performs better.
    I''d be tempted to handle this outside of sql server.
    You might find that a parsing function built in C# or VB.net and called via CLR performs better.
    I''d be tempted to handle this outside of sql server.
    You might find that a parsing function built in C# or VB.net and called via CLR performs better.
    I''d be tempted to handle this outside of sql server.
    You might find that a parsing function built in C# or VB.net and called via CLR performs better.'
    from #Tally
    -- (10000 row(s) affected)

    set statistics io on

    -- like
    select DocId from #Documents where DocText like '%http%'

    -- charindex
    select DocId from #Documents where charindex('http',DocText) > 0

    -- patindex
    select DocId from #Documents where patindex('%http%',DocText) > 0

    -- substitution
    select DocId from #Documents where len(DocText) <> len(replace(DocText,'http',''))

    /*
    (10000 row(s) affected)
    Table '#Documents'. Scan count 1, logical reads 5000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (10000 row(s) affected)
    Table '#Documents'. Scan count 1, logical reads 5000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (10000 row(s) affected)
    Table '#Documents'. Scan count 1, logical reads 5000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (10000 row(s) affected)
    Table '#Documents'. Scan count 1, logical reads 5000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    set statistics io off

    ------------
    -- time test
    ------------
    create table #TestResults (i int, l int, c int, p int, s int)

    declare @like int
        , @char int
        , @pat int
        , @sub int
        , @i int
        , @start datetime
        , @end datetime

    set @i = 1

    while @i <= 50
    begin

        -- like
        set @start = getdate()
        select DocId from #Documents where DocText like '%http%'
        set @end = getdate()
        set @like = datediff(ms,@start,@end)

        -- charindex
        set @start = getdate()
        select DocId from #Documents where charindex('http',DocText) > 0
        set @end = getdate()
        set @char = datediff(ms,@start,@end)

        -- patindex
        set @start = getdate()
        select DocId from #Documents where patindex('%http%',DocText) > 0
        set @end = getdate()
        set @pat = datediff(ms,@start,@end)

        -- substitution
        set @start = getdate()
        select DocId from #Documents where len(DocText) <> len(replace(DocText,'http',''))
        set @end = getdate()
        set @sub = datediff(ms,@start,@end)

        insert #TestResults select @i, @like, @char, @pat, @sub
        set @i = @i + 1

    end

    -- show results
    select l = avg(l), c = avg(c), p = avg(p), s = avg(s)
    from #TestResults

    /*
    l       c       p       s
    544     557     548     5641
    */
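
    As an aside, the replace trick in the last query can be extended from a simple yes/no test to a per-document count of occurrences, by dividing the length difference by the length of the search string. A minimal sketch against the same #Documents table (not benchmarked above):

    -- count occurrences of 'http' per document:
    -- (original length - length with 'http' removed) / length of 'http'
    select DocId,
           HttpCount = (len(DocText) - len(replace(DocText,'http',''))) / len('http')
    from #Documents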

    Depending on the proportion of documents that have links, I might consider just farming everything out via CLR, regardless of whether or not a document contains links. You could then split the input into an array using http: as the delimiter and use the upper bound of the array to determine whether you have multiple elements (i.e. the document contains links). Of course, this relies on you being able to create CLR functions on your server; if you can't, one of the T-SQL options above will still do the job.
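
    For reference, the T-SQL side of that might look something like the sketch below, assuming a CLR table-valued split function has already been deployed (dbo.clr_Split and its signature are hypothetical here, not a built-in):

    -- dbo.clr_Split is a hypothetical CLR table-valued function returning one row per element;
    -- more than one element means the text contained at least one 'http:' delimiter
    select d.DocId
    from #Documents d
    cross apply (
        select cnt = count(*)
        from dbo.clr_Split(d.DocText, 'http:')
    ) s
    where s.cnt > 1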