May 30, 2006 at 6:50 pm
I have a database with two fields: ID and SVCS.
SVCS contains a comma delimited list.
So records on the dabase might look like this:
ID - SVCS
1 - a,c,e,g
2 - b,d,f,h
3 - a,b,c
4 - f,g,h
5 - a,b,c,e
To search this DB a user checks a series of checkboxes on a form, for example:
SVCS: a[x] b[x] c[x] d[ ] e[x] f[ ] g[ ] h[ ]
I want to pull any records where any of the items selected by the user is found in the SVCS field.
In this example it would return:
1 - a,c,e,g (3 matches: a,c,and e)
2 - b,d,f,h (1 match: b)
3 - a,b,c (2 matches: a,c)
5 - a,b,c,e (4 matches: a,b,c,and e)
Furthrermore:
How do I order the results by the number of matches?
May 30, 2006 at 10:53 pm
Really, really bad idea to have the table in the format you do... highly denormalized and for more reasons than I can count, you will have more problems with it than you can count. Folks like Joe Celko would have a field day with this and I wouldn't blame them a bit.
As an additional note, you might want to name the ID column as something a bit more meaningful, as well. Yep, I know... this is just an example and it might already be named something else... just wanted to remind you.
Here's how to fix the normalization of the table into a new table... I use temp tables for this example... you'd need to do a little work to make it fit your production tables...
--===== If the temporary demonstration tables exist, drop them
IF OBJECT_ID('TempDB..#UsersTable') IS NOT NULL
DROP TABLE #UsersTable
IF OBJECT_ID('TempDB..#NewTable') IS NOT NULL
DROP TABLE #NewTable
GO
--===== Simulate the users table
CREATE TABLE #UsersTable
(
ID INT,
Svcs VARCHAR(20)
)
--===== Populate the sample users table with the data given
INSERT INTO #UsersTable
(ID,Svcs)
SELECT 1,'a,c,e,g' UNION ALL
SELECT 2,'b,d,f,h' UNION ALL
SELECT 3,'a,b,c' UNION ALL
SELECT 4,'f,g,h' UNION ALL
SELECT 5,'a,b,c,e'
--===== Create a more normalized table
CREATE TABLE #NewTable
(
ID INT NOT NULL,
Svc CHAR(1) NOT NULL
)
--===== Demonstrate how to convert the highly denormalized users table
-- into the normalized table.
INSERT INTO #NewTable
(ID,Svc)
SELECT y.ID,
LTRIM(SUBSTRING(y.Svcs, t.Number+1, CHARINDEX(',', y.Svcs, t.Number + 1) - t.Number - 1)) AS Svc
FROM Master.dbo.spt_Values t
INNER JOIN
(SELECT ID,','+Svcs+',' AS Svcs FROM dbo.#UsersTable) y
ON SUBSTRING(y.Svcs, t.Number, 1) = ','
AND t.Number < LEN(y.Svcs)
WHERE t.Name IS NULL
--===== Add a composite primary key to the new table
-- to ensure normalicy in the future.
ALTER TABLE #NewTable
ADD CONSTRAINT PK_NewTable_ID_Svc PRIMARY KEY CLUSTERED (ID,Svc)
--===== Show the new table
SELECT *
FROM #NewTable
Then, all you need to do is use this stored procedure to get your data... it won't be in the form you asked for but it's more correct...
CREATE PROCEDURE FindMatchesN
--===== Declare the input parameters
@a CHAR(1) = NULL, --Nulls allow the parameters to be optional
@b-2 CHAR(1) = NULL,
@c CHAR(1) = NULL,
@d CHAR(1) = NULL,
@e CHAR(1) = NULL,
@f CHAR(1) = NULL,
@g CHAR(1) = NULL,
@h CHAR(1) = NULL
AS
--===== Suppress the auto-display of rowcounts to ensure that
-- only one record set is created.
SET NOCOUNT ON
--===== Return all the matches and counts
SELECT nt.ID,nt.Svc,d.Matches
FROM #NewTable nt,
(--Derived table "d" finds the ID's we want
SELECT ID, COUNT(*) AS Matches
FROM #NewTable
WHERE Svc IN (@a,@b,@c,@d,@e,@f,@g,@h)
GROUP BY ID
) d --End derived table "d"
WHERE nt.ID = d.ID
ORDER BY d.Matches DESC ,nt.ID ASC, nt.Svc ASC
GO
You can run the proc from your app using something like this...
EXEC FindMatchesN @a='a',@b='b',@c='c'
In case you can't or won't change the table to something more normalized as I suggested, do everything I said so far except rewrite the proc thusly...
CREATE PROCEDURE FindMatchesD
--===== Declare the input parameters
@a CHAR(1) = NULL, --Nulls allow the parameters to be optional
@b-2 CHAR(1) = NULL,
@c CHAR(1) = NULL,
@d CHAR(1) = NULL,
@e CHAR(1) = NULL,
@f CHAR(1) = NULL,
@g CHAR(1) = NULL,
@h CHAR(1) = NULL
AS
--===== Suppress the auto-display of rowcounts to ensure that
-- only one record set is created.
SET NOCOUNT ON
--===== Return all the matches and counts
SELECT u.ID,u.Svcs,d.Matches
FROM #UsersTable u,
(--Derived table "d" finds the ID's we want
SELECT ID, COUNT(*) AS Matches
FROM #NewTable
WHERE Svc IN (@a,@b,@c,@d,@e,@f,@g,@h)
GROUP BY ID
) d --End derived table "d"
WHERE u.ID = d.ID
ORDER BY d.Matches DESC ,u.ID ASC
GO
...and execute it nearly the same way (only the name changed)...
EXEC FindMatchesD @a='a',@b='b',@c='c'
Do notice that I used the undocumented spt_Values table to get my numbers to do the split of the CSV Svcs column... you really need to make your own "Tally" or "Numbers" table and, if you search for it, there's plenty of discussion on how to do that on this forum. If you really don't understand what I mean on this, post back.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 31, 2006 at 9:53 am
Wow! What a comprehensive reply. Nice work, Jeff
You must either have far too much time on you hands or you are a really, really nice guy. I'm sure it's the latter.
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 31, 2006 at 10:04 am
Jeff,
Thank you for all of your help, it is truly an amazing amount of information! I am a SQL newbie but I was able to adapt your thourougly commented scripts and normalize my real table, and get the stored procedure to work against it.
Going back to the sample data/tables when I run the stored procedure:
EXEC FindMatchesN @a='e',@b='b',@c='c'
I get:
ID SVC Matches
3 a 2
3 b 2
3 c 2
5 a 2
5 b 2
5 c 2
5 e 2
1 a 1
1 c 1
1 e 1
1 g 1
Close, but it is probably missing a group by clause or maybe that is what you refer to when you say "you really need to make your own "Tally" or "Numbers" table".
After all of the work you put into explaing this all to me I hate to ask for more but, could you explain this Tally" or "Numbers" table issue, or point me in the right direction?
I've searched the forums for "Tally tables" and found nothing. Again thanks so much for all your help.
June 2, 2006 at 10:00 pm
DogBot,
Thanks for the feedback and sorry for my absence... been doing the OT thing alot, lately.
When I run...
EXEC FindMatchesN @a='a',@b='b',@c='c'
...against the data in my posting (which I was pretty sure matched your post) I get...
ID Svc Matches
----------- ---- -----------
3 a 3
3 b 3
3 c 3
5 a 3
5 b 3
5 c 3
5 e 3
1 a 2
1 c 2
1 e 2
1 g 2
2 b 1
2 d 1
2 f 1
2 h 1
...which looks correct to me (or may be due to a lack of sleep)... can you explain what you mean by "Close, but is probably missing a Group By"? Perhaps, post the code you are using? Thanks.
Ryan,
Thanks for the very nice compliment... I neither have time on my hands nor am I much of a nice guy
(just kidding)... I just remember when I got started and also remember not having forums like this or people like yourself to be able to turn to... so, when I see someone in really deep Kimchie, especially a "newbie", I try to go the extra mile... kinda like you and a lot of other good folks on this forum.
Besides, it's a heck of a lot of fun watching the lights come on.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 2, 2006 at 10:30 pm
Dang... didn't read down far enough...
DogBot... here's how to create a Tally table without using a cursor or loop...
/****************************************************************************************
THIS IS A TABLE CREATION SCRIPT
Purpose:
This script creates a "Tally" or "Numbers" table consisting of a single column with the
numbers 1 through 9999 in it. The table is used to create "set-based loops" for
splitting CSV's as well as other high-speed functions and to provide other functionality
such as creating ranges of dates (about 27 years worth).
***************************************************************************************/
--===== Create and populate the Tally table on the fly
PRINT 'Creating Tally table...'
SELECT TOP 9999
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysComments sc1,
Master.dbo.SysComments sc2
--===== Add a Primary Key to maximize performance
-- Fill factor = 100 on static tables because no changes will occur.
-- Full fill adds even more performance.
PRINT 'Adding primary key...'
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Ok, let other folks use it
PRINT 'Applying "permissions"...'
GRANT SELECT ON dbo.Tally TO PUBLIC
PRINT 'Done.'
GO
Here's some other searches to do on this forum and on Google...
Aux Numbers
Auxiliary Numbers
Auxiliary Numbers
Auxiliary Table
Numbers Table
Date Table
Calendar Table
There are dozens of uses for a Tally (or Numbers) table and a Calendar table and they all help you avoid cursors and loops (although some loops are faster than a join to these tables but most are not).
--Jeff Moden
Change is inevitable... Change for the better is not.
June 5, 2006 at 3:38 am
DogBot - You can read about normalisation here...
http://www.datamodel.org/NormalizationRules.html
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 5, 2006 at 7:19 am
I think the confusion arises here:
In case you can't change your schema in the short term, here's some code which demonstrates how to implement a temporary solution to the matching problem. An indexing strategy is suggested, but of course may need to be modified according your particular circumstances.
Just to preempt any responses from Outraged of Somewhere-off-the-M4, I'm not disputing any of the points made in the other posts, and this is only intended as a short-term fix!
@sql = 'select ''' + replace(replace(@chrs, char(39), char(39)+char(39)),',',''' union all select ''') + ''''
@chr(chr) exec( @sql)
@svcs(rank, svcsid, svcs) select 1, id, chr from @chr
@chrcount = @@rowcount, @i = 2
@i <= @chrcount
insert @svcs(rank,svcsid, svcs)
tbl_denorm(svcs)
svcs from @svcs cross join (select 1 i union select 2 union select 3) i
cast(count(*) as varchar) + ' records: combinations including ' + min(svcs) + ' and '+ max(svcs)
tbl_denorm
@sarg table(sarg varchar(50))
@sarg(sarg) select '%,a,%' union select '%,d,%' union select '%,g,%' union select '%,m,%'
getdate() dt
d.ID, d.svcs, count(*) rank
tbl_denorm d join @sarg s
by d.ID, d.svcs
by rank desc, d.svcs
getdate() dt
table tbl_denorm
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 5, 2006 at 9:00 am
And here's another way to do what you shouldn't...
--data
declare @t table (ID int, SVCS varchar(20))
insert @t
select 1, 'a,c,e,g'
union all select 2, 'b,d,f,h'
union all select 3, 'a,b,c'
union all select 4, 'f,g,h'
union all select 5, 'a,b,c,e'
--input
declare @s-2 varchar(20)
set @s-2 = 'a,b,c,e'
--calculation
declare @numbers table (i int identity(1, 1), x bit)
insert @numbers select top 20 null from master.dbo.syscolumns
select ID, SVCS, count(*) as count
from @numbers n inner join @t t on
',' + SVCS + ',' like '%,' + left(substring(@s, i, 20), charindex(',', substring(@s, i, 20) + ',')-1) + ',%'
where substring(',' + @s-2, i, 1) = ','
group by ID, SVCS
order by count desc
/*results
ID SVCS count
----------- -------------------- -----------
5 a,b,c,e 4
1 a,c,e,g 3
3 a,b,c 3
2 b,d,f,h 1
*/
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 5, 2006 at 5:41 pm
OK, As you all know I am a SQL Newbie so my head is just spinning with all of the information, there is so much here that is COMPLETELY NEW AND FOREING to me...it's dizzing. I did embark on a serious normalization of this table (Me thinks I know what normalization means now)I had to create 12 new tables as there where many fields were data was stored as comma delimited strings. The SQL has grown tremendously complicated specially since most of what I am doing is cutting and pasting and adapting from your wonderful examples without really understanding what it is that I am doing exactly. By trail and error I am getting the right results though, but if anyone asks me ever to explain what I have done I could not.
Of course the kicker is that Ryan comes in and suggests I could have cheated a little not have had to do all of the normalization! 😉
Anyhow. Thanks to all of you, for your tremendous patience and selfless efforts in helping me(specially to Jeff)
DOG
June 5, 2006 at 8:02 pm
Thanks dee-oh-gee... little hint... document as you go and you'll never get behind.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply