Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Trying to iteratively use PATINDEX commands to comb through a large amount of text


Trying to iteratively use PATINDEX commands to comb through a large amount of text

Author
Message
huston.dunlap
huston.dunlap
Valued Member
Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)

Group: General Forum Members
Points: 59 Visits: 155
Here's the scenario.

I have a 10000 record table where each row contains a large amount of text. I want to do a PATINDEX search to determine where a specific word occurs (let's say "dog" as an example). Since the patindex command only finds the first instance, I want to iterate using substrings to keep finding additional hits.

Right now this is the method I know how to do it (there may be a few little details missing, its mainly to show my thought process):

1) Create a field that contains my current patindex position.
2) Run the patindex command, update the table with the current patindex position for each row.
3) Store the patindex position in another table (or wherever, not really important for this exercise).
4) Create a dynamic sql query to do a substring on the text, starting at the current patindex position +1.
5) Run another patindex command on this new substring. Update the current patindex position with the new position.
6) Rinse and Repeat.


This method will work, but I was wondering if anyone had a cleaner and better way to do this (perhaps that didn't involve dynamic sql).
diamondgm
diamondgm
SSC-Enthusiastic
SSC-Enthusiastic (149 reputation)SSC-Enthusiastic (149 reputation)SSC-Enthusiastic (149 reputation)SSC-Enthusiastic (149 reputation)SSC-Enthusiastic (149 reputation)SSC-Enthusiastic (149 reputation)SSC-Enthusiastic (149 reputation)SSC-Enthusiastic (149 reputation)

Group: General Forum Members
Points: 149 Visits: 938
First things that pop in to my head:
1. CTE to identify occurrences of the desired string part
2. Table valued function using the CTE and then using CROSS APPLY

Perhaps others have better ideas?
Jason Norsworthy
Jason Norsworthy
Grasshopper
Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)

Group: General Forum Members
Points: 24 Visits: 255
I'm dreaming up a solution using tally tables... What is your desired output? Number of occurrences, or indexes of occurrences?
Jason Norsworthy
Jason Norsworthy
Grasshopper
Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)

Group: General Forum Members
Points: 24 Visits: 255
Ok here we go...
You'll need to build a tally table. Jeff Moden has an excellent article about tally tables here.

After you have a tally table, create a test table with test data: (or just plug in your own table I suppose)

select top 10000 identity(int,1,1) as id, Dog
into DogTable
from
(
select 'dogcatdogdogbirdape' as Dog
union all
select 'fishbirdnotadogoopsthatwasadog'
union all
select 'caninecanislupisdomesticus'
)x
cross join
(select sc1.name from master.dbo.syscolumns sc1
cross join
master.dbo.syscolumns sc2
) y




Now for the solution:
select id, count(id)
from
(
select d.id, d.dog, substring(d.dog, case when t.N > len(d.dog) then len(d.dog) else t.n end, 3) as test, n as foundindex
from dogtable d
cross join
tally t
where t.n < (select max(len(dog)) from dogtable)
and substring(d.dog, case when t.N > len(d.dog) then len(d.dog) else t.n end, 3) = 'dog'
)x
group by id



Run the inner view if you just want a list of indexes of the tested string.
This could be easily turned into a table valued function, just read the length of your input string into the last parameter of substring
CleverSQLUserID
CleverSQLUserID
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 282
Sorry for necro'ing an old thread, but I'm trying to do something similar. The difference is that I'm looking for multiple strings (matching a pattern) of varying lengths. I've tried to modify the "dog" query, but am falling short of accomplishing what I want to accomplish. Can anyone help shed light?

FYI, I'm trying to search through stored procs and other objects for instances of linked servers, which follow a standard of "LS-%".

select object_name(s.id) Object, substring(s.text, n, charindex(' ', text, n) - n) as Target
from syscomments s
cross join
tally t
where t.n < (select max(len(text)) from syscomments)
and lower(substring(s.text, case when t.N > len(s.text) then len(s.text) else t.n end, case when t.N > len(s.text) then 5 else len(s.text) - n end)) like 'ls-__'
order by object, target


ScottPletcher
ScottPletcher
Hall of Fame
Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)

Group: General Forum Members
Points: 3946 Visits: 6686
That code's written for SQL 2000; I suggest completely ignoring it.

Instead, the code below should give you a list of object names and their full text reasonably quickly.

If you want to add in the part about showing the specific text that matches, let me know. Just be aware that'll be slower -- perhaps much slower?!



IF OBJECT_ID('tempdb..#search_strings') IS NOT NULL
DROP TABLE #search_strings
CREATE TABLE #search_strings (
search_id int IDENTITY(1, 1) NOT NULL,
search_string nvarchar(40) NOT NULL
)
INSERT INTO #search_strings
SELECT 'linked_server_name1'
--UNION ALL SELECT 'linked_server_name2'
--UNION ALL ...

SELECT
OBJECT_NAME(object_id) AS Object_Name,
search_string,
(SELECT definition FROM sys.sql_modules WHERE object_id = derived.object_id) AS Object_Text
FROM (
SELECT DISTINCT
sm.object_id,
ss.search_string
FROM sys.sql_modules sm
CROSS JOIN #search_strings AS ss
WHERE
CAST(OBJECTPROPERTYEX(sm.object_id, 'IsProcedure') AS int) = 1 AND
sm.definition LIKE '%' + ss.search_string + '%'
) AS derived
ORDER BY
Object_Name




SQL DBA,SQL Server MVP('07, '08, '09)

Prosecutor James Blackburn, in closing argument in the "Fatal Vision" murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."
CleverSQLUserID
CleverSQLUserID
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 282
Thanks for the code! It works great if I know what I'm searching for, but unfortunately to be useful in my current situation it would have to return the actual match. So I could search for "ls-" and it would return "ls-app1" and "ls-app2", etc.
ScottPletcher
ScottPletcher
Hall of Fame
Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)

Group: General Forum Members
Points: 3946 Visits: 6686
The code uses LIKE to compare the string, so just construct your string accordingly.

For example, you could use 'ls-' as the search string. Or, if you want a closer match, then:

ls-app[0-9]

or something similar.

I'll add code to see the actual part(s) of the text that match if you want.

SQL DBA,SQL Server MVP('07, '08, '09)

Prosecutor James Blackburn, in closing argument in the "Fatal Vision" murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45143 Visits: 39923
ScottPletcher (8/13/2013)
That code's written for SQL 2000; I suggest completely ignoring it.


I wouldn't ignore it because it was written for SQL Server 2000... I'd ignore it because it solves the wrong problem for the latest request.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
CleverSQLUserID
CleverSQLUserID
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 282
ScottPletcher (8/14/2013)
The code uses LIKE to compare the string, so just construct your string accordingly.
<snip>
I'll add code to see the actual part(s) of the text that match if you want.</snip>


Yes, please!
I don't even know where to begin for that...
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search