SQLServerCentral is supported by Redgate
 
Searching for URL contents using CHARINDEX


SQL33
I need to query the table #URLTbl (created by my code below), looking in the "URL" field for any occurrence of the words stored in a second table, #Badwords (also created below).

I've succeeded in finding the words hidden in the URL strings with this SELECT statement:

select * from #URLTbl
where CHARINDEX('Porn',URL) > 0 or CHARINDEX('naked',URL) > 0


but I can't figure out how to make it use the contents of the #Badwords table, rather than hard-coded words, to return the rows of #URLTbl that contain a match.

Thanks for any thoughts!


--To create #URLTbl and #BadWords temp tables run this entire section

IF OBJECT_ID('TempDB..#URLTbl','U') IS NOT NULL DROP TABLE #URLTbl

CREATE TABLE #URLTbl
(MsgDate Varchar(1024),
Username Varchar (1024),
URL Varchar (1024),
Rating Varchar (1024))

--===== Setup any special required conditions especially where dates are concerned
SET DATEFORMAT YMD

INSERT INTO #URLTbl(MsgDate,Username,URL,Rating)
SELECT '2010-11-10','webuser\patr','http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavaty3398n','http://content.murder.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\matth','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patr','http://view.atdmt.com/action/kill/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mmhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hmlavaty3398n','http://content.porn.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mamtth','http://assets3b.msnbc.msn.com/fight/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patmr','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavamty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-10','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-12','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-12','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-12','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-15','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-15','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-15','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-15','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-17','webuser\patr','http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-17','webuser\hlavaty3398n','http://content.murder.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\matth','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-17','webuser\patr','http://view.atdmt.com/action/kill/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mmhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-21','webuser\hmlavaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-21','webuser\mamtth','http://assets3b.msnbc.msn.com/fight/rendering/img/bing.png','news' union all
SELECT '2010-11-21','webuser\patmr','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavamty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-21','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-21','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-10','webuser\patr','http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavaty3398n','http://content.murder.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\matth','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patr','http://view.atdmt.com/action/kill/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mmhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hmlavaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mamtth','http://assets3b.msnbc.msn.com/fight/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patmr','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-25','webuser\hlavamty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-25','webuser\mattmh','http://assets3bomb.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-25','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-25','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-25','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-25','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-25','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-25','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-25','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-29','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-29','webuser\patr','http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-29','webuser\mhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-29','webuser\hlavaty3398n','http://content.murder.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-29','webuser\matth','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-29','webuser\patr','http://view.atdmt.com/action/kill/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-29','webuser\mmhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-29','webuser\hmlavaty3398n','http://content.yieldmanagergun.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-29','webuser\mamtth','http://assets3b.msnbc.msn.com/fight/gun/rendering/img/bing.png','news' union all
SELECT '2010-11-07','webuser\patmr','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-07','webuser\mhoamg','http://cdn.unicast.msn.com/porn/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-07','webuser\hlavamty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-07','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-07','webuser\patrm','http://view.atdmt.com/action/MSN_Homebombpage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-07','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-07','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-07','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-07','webuser\patrm','http://view.atdmt.com/action/nude/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-07','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C18naked1757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-07','webuser\hlavmaty3398n','http://content.bomb.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/fight/rendering/img/bing.png','news'


--Create Badwords table

IF OBJECT_ID('TempDB..#badwords','U') IS NOT NULL DROP TABLE #badwords

CREATE TABLE #Badwords
(Words nchar(25))

--===== Setup any special required conditions especially where dates are concerned
SET DATEFORMAT YMD

--===== Insert the test data into the test table
INSERT INTO #badwords
(words)
SELECT 'porn ' UNION ALL
SELECT 'nude ' UNION ALL
SELECT 'naked ' UNION ALL
SELECT 'gun ' UNION ALL
SELECT 'bomb ' UNION ALL
SELECT 'kill ' UNION ALL
SELECT 'murder '

select * from dbo.#badwords
select * from dbo.#URLTbl
Jack Corbett
Okay, it took me a few minutes, but here's code that works with your sample:

SELECT
CHARINDEX(RTRIM(B.Words), U.url),
*
FROM
#urltbl U
CROSS JOIN #Badwords AS B
WHERE
CHARINDEX(RTRIM(B.Words), U.url) <> 0
ORDER BY
username,
url



Key points: I'm using a CROSS JOIN so that each URL is checked against every bad word, which means a URL containing more than one bad word is returned once for each bad word it contains. Also, your example uses nchar(25) for the bad words, so each entry is padded out to 25 characters (50 bytes), AND you padded the bad words with spaces in the sample data. I had to use RTRIM to remove that padding in order to find matches, since the bad words in the URLs are not likely to be followed by spaces.
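A variation that is not from the thread: if you would rather get each URL row back only once, however many bad words it contains, an EXISTS subquery can stand in for the CROSS JOIN. This is a sketch under stated assumptions: @URLTbl and @Badwords are hypothetical table variables standing in for #URLTbl and #Badwords, loaded with a few rows copied from the sample data, and the multi-row VALUES syntax needs SQL Server 2008 or later.

```sql
-- Hypothetical stand-ins for #URLTbl and #Badwords, loaded with rows
-- copied from the sample data posted above.
DECLARE @URLTbl TABLE (Username varchar(1024), URL varchar(1024));
DECLARE @Badwords TABLE (Words nchar(25));

INSERT INTO @URLTbl (Username, URL) VALUES
    ('webuser\patr',   'http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1'),
    ('webuser\mamtth', 'http://assets3b.msnbc.msn.com/fight/gun/rendering/img/bing.png'),
    ('webuser\mhoamg', 'http://cdn.unicast.msn.com/assets/A1/N6/M8379/C18naked1757/large.jpg?0.5824313560398167'),
    ('webuser\matth',  'http://assets3b.msnbc.msn.com/rendering/img/bing.png');

INSERT INTO @Badwords (Words) VALUES
    ('porn'), ('nude'), ('naked'), ('gun'), ('bomb'), ('kill'), ('murder');

-- EXISTS only asks whether at least one bad word matches, so each URL row
-- is returned at most once instead of once per matching word.
SELECT U.Username, U.URL
FROM @URLTbl AS U
WHERE EXISTS (
    SELECT 1
    FROM @Badwords AS B
    -- RTRIM strips the nchar(25) padding before searching
    WHERE CHARINDEX(RTRIM(B.Words), U.URL) > 0
)
ORDER BY U.Username, U.URL;
```

Against those four rows it returns the three URLs containing porn, gun and naked; the plain bing.png URL has no bad word and is left out. The trade-off is that you lose the per-word CHARINDEX column that the CROSS JOIN version shows.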



Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming
At best you can say that one job may be more secure than another, but total job security is an illusion. -- Rod at work

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
SQL33
Wow... so simple when you know how!

Thanks, and happy new year!
SQL33
If I want to add a date-range restriction to this, given that the date/time fields are all varchar, how would I do that? For example, I want all records in the URL table whose MsgDate falls between 12/10/2010 and 1/1/2011.

In reply to Jack Corbett (12/29/2010).

Jack Corbett
Just add that to the WHERE clause.

SELECT
CHARINDEX(RTRIM(B.Words), U.url),
*
FROM
#urltbl U
CROSS JOIN #Badwords AS B
WHERE
MSGDATE >= '12/10/2010' AND
/* using the next day because if you want all of 1/1 and there are any times
included you need to do less than the next day */
MSGDATE < '1/2/2011' AND
CHARINDEX(RTRIM(B.Words), U.url) <> 0
ORDER BY
username,
url
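One caveat with the sample data: MsgDate in #URLTbl is varchar, so the comparisons above are made between strings, character by character, rather than between dates ('2010-11-10' against '12/10/2010' is decided by the first characters, '2' and '1'). A minimal sketch of one workaround, assuming every MsgDate string is in the yyyy-mm-dd form used in the sample: cast the column to date and compare it with unambiguous yyyymmdd values. @URLTbl, @Badwords, @FromDate and @ToDate are hypothetical names, and the date type needs SQL Server 2008 or later.

```sql
-- Hypothetical stand-ins for #URLTbl and #Badwords, loaded with rows copied
-- from the sample data. MsgDate is varchar, exactly as in the original table.
DECLARE @URLTbl TABLE (MsgDate varchar(1024), Username varchar(1024), URL varchar(1024));
DECLARE @Badwords TABLE (Words nchar(25));

INSERT INTO @URLTbl (MsgDate, Username, URL) VALUES
    ('2010-11-07', 'webuser\mhoamg', 'http://cdn.unicast.msn.com/porn/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167'),
    ('2010-11-25', 'webuser\mattmh', 'http://assets3bomb.msnbc.msn.com/rendering/img/bing.png'),
    ('2010-11-29', 'webuser\patr',   'http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1');

INSERT INTO @Badwords (Words) VALUES
    ('porn'), ('nude'), ('naked'), ('gun'), ('bomb'), ('kill'), ('murder');

-- The requested range. 'yyyymmdd' literals are read the same way
-- regardless of SET DATEFORMAT or the session language.
DECLARE @FromDate date = '20101210';  -- first day wanted
DECLARE @ToDate   date = '20110101';  -- last day wanted (inclusive)

SELECT CHARINDEX(RTRIM(B.Words), U.URL) AS MatchPosition, U.MsgDate, U.Username, U.URL
FROM @URLTbl AS U
CROSS JOIN @Badwords AS B
-- Half-open range: on or after @FromDate and before the day after @ToDate
WHERE CAST(U.MsgDate AS date) >= @FromDate
  AND CAST(U.MsgDate AS date) < DATEADD(day, 1, @ToDate)
  AND CHARINDEX(RTRIM(B.Words), U.URL) <> 0
ORDER BY U.Username, U.URL;
```

With these sample rows (all from November 2010) the requested range returns nothing; setting @FromDate = '20101120' and @ToDate = '20101130' returns the two rows dated 2010-11-25 and 2010-11-29. Because the CAST is applied to the column, every row has to be converted, so an index on MsgDate could not be used for a seek; storing a real date/time type avoids that.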





SQL33
Yes, I had tried that, but the field I was searching holds date/time data while its data type was varchar(30), which caused the search to bomb.

This raises a data type question. I'm searching on the "URL" field and the "time" field and want to create a (clustered) index for faster searching, but I think I need to minimize the size of the URL field so that I can create more than one index. What data type and size would be best for URL data while still allowing efficient searching, and leaving room to create an index on the date field "TIME"? I suspect varchar(1024) is bigger than it needs to be, and I also wonder whether TEXT would allow any decent searching.

My goal is to be able to search on the time field (datatype "datetime") and the URL field (current datatype Varchar 1024).


In reply to Jack Corbett (1/3/2011).


Jack Corbett
I didn't spend any time looking at your data types, but now that I have, I do have some suggestions.

1. If the column contains dates then it should be one of the date data types. If you keep it a character data type you are only asking for trouble. If you are forced to use varchar() for it then you should be doing date validation before you insert/update that column AND I suggest a computed column that IS a date data type or a view that converts that column to a date data type. Ideally you'll put the correct data type on the column.
2. You definitely do NOT want to use the TEXT data type: it is a large-object (LOB) type rather than a regular character type, CHARINDEX can't be used on it, and Microsoft has deprecated it in favor of varchar(max).
3. An index on msgdate while it is a character column won't give you the results you expect, because it will sort in alphabetical order, NOT date order. So '1/1/01' sorts AFTER '01/2/02' even though it is the earlier date.
4. An index on the url column will not help this query. Because you have to apply a function (CHARINDEX) to the column, SQL Server MUST scan every row. You might be better off looking into full-text indexing if you need to speed up this query. I am NOT a full-text expert, so I can't really help or tell you whether it is the right technology for this problem.

I hope this has helped, a little.
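Suggestion 1 above (keep the raw varchar, but add a computed column that IS a date type) might look something like the following. It is only a sketch: #URLLog, MsgDateValue and IX_URLLog_MsgDateValue are hypothetical names, the column sizes are assumptions, and it assumes the incoming strings use the yyyy-mm-dd form from the sample. CONVERT with style 120 is chosen because a computed column must use a deterministic expression before it can be PERSISTED and indexed.

```sql
-- Sketch only: hypothetical table and index names; column sizes are assumptions.
IF OBJECT_ID('tempdb..#URLLog', 'U') IS NOT NULL DROP TABLE #URLLog;

CREATE TABLE #URLLog (
    MsgDate  varchar(30)   NOT NULL,  -- raw text as received, e.g. '2010-11-25'
    Username varchar(256)  NOT NULL,
    URL      varchar(1024) NOT NULL,
    -- Style 120 (yyyy-mm-dd hh:mi:ss) is a deterministic conversion, so the
    -- computed column can be PERSISTED and used as an index key.
    MsgDateValue AS CONVERT(datetime, MsgDate, 120) PERSISTED
);

-- Clustering on the real datetime value keeps rows in date order,
-- which suits date-range searches.
CREATE CLUSTERED INDEX IX_URLLog_MsgDateValue ON #URLLog (MsgDateValue);

-- Rows copied from the sample data posted above.
INSERT INTO #URLLog (MsgDate, Username, URL) VALUES
    ('2010-11-07', 'webuser\mhoamg', 'http://cdn.unicast.msn.com/porn/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167'),
    ('2010-11-25', 'webuser\mattmh', 'http://assets3bomb.msnbc.msn.com/rendering/img/bing.png'),
    ('2010-11-29', 'webuser\patr',   'http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1');

-- The range test now compares real datetime values and can use the index.
SELECT L.MsgDate, L.Username, L.URL
FROM #URLLog AS L
WHERE L.MsgDateValue >= '20101120'
  AND L.MsgDateValue <  '20101201'
ORDER BY L.MsgDateValue;
```

Because the value is computed when the row is written, a string that cannot be converted makes the INSERT fail, which doubles as a basic form of the validation mentioned in suggestion 1. Creating an index on a computed column also requires the standard SET options (for example QUOTED_IDENTIFIER and ANSI_NULLS ON), which are the SSMS defaults.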


