How to order records

Question

How to order records

DogBot

SSC Journeyman

Points: 89
More actions
May 30, 2006 at 6:50 pm

#90066

I have a database with two fields: ID and SVCS.
SVCS contains a comma delimited list.
So records on the dabase might look like this:
ID - SVCS
1 - a,c,e,g
2 - b,d,f,h
3 - a,b,c
4 - f,g,h
5 - a,b,c,e
To search this DB a user checks a series of checkboxes on a form, for example:
SVCS: a[x] b[x] c[x] d[ ] e[x] f[ ] g[ ] h[ ]
I want to pull any records where any of the items selected by the user is found in the SVCS field.
In this example it would return:
1 - a,c,e,g (3 matches: a,c,and e)
2 - b,d,f,h (1 match: b)
3 - a,b,c (2 matches: a,c)
5 - a,b,c,e (4 matches: a,b,c,and e)
Furthrermore:
How do I order the results by the number of matches?

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 1

Really, really bad idea to have the table in the format you do... highly denormalized and for more reasons than I can count, you will have more problems with it than you can count. Folks like Joe Celko would have a field day with this and I wouldn't blame them a bit.

As an additional note, you might want to name the ID column as something a bit more meaningful, as well. Yep, I know... this is just an example and it might already be named something else... just wanted to remind you.

Here's how to fix the normalization of the table into a new table... I use temp tables for this example... you'd need to do a little work to make it fit your production tables...

--===== If the temporary demonstration tables exist, drop them

IF OBJECT_ID('TempDB..#UsersTable') IS NOT NULL

DROP TABLE #UsersTable

IF OBJECT_ID('TempDB..#NewTable') IS NOT NULL

DROP TABLE #NewTable

GO

--===== Simulate the users table

CREATE TABLE #UsersTable

(

ID INT,

Svcs VARCHAR(20)

)

--===== Populate the sample users table with the data given

INSERT INTO #UsersTable

(ID,Svcs)

SELECT 1,'a,c,e,g' UNION ALL

SELECT 2,'b,d,f,h' UNION ALL

SELECT 3,'a,b,c' UNION ALL

SELECT 4,'f,g,h' UNION ALL

SELECT 5,'a,b,c,e'

--===== Create a more normalized table

CREATE TABLE #NewTable

(

ID INT NOT NULL,

Svc CHAR(1) NOT NULL

)

--===== Demonstrate how to convert the highly denormalized users table

-- into the normalized table.

INSERT INTO #NewTable

(ID,Svc)

SELECT y.ID,

LTRIM(SUBSTRING(y.Svcs, t.Number+1, CHARINDEX(',', y.Svcs, t.Number + 1) - t.Number - 1)) AS Svc

FROM Master.dbo.spt_Values t

INNER JOIN

(SELECT ID,','+Svcs+',' AS Svcs FROM dbo.#UsersTable) y

ON SUBSTRING(y.Svcs, t.Number, 1) = ','

AND t.Number < LEN(y.Svcs)

WHERE t.Name IS NULL

--===== Add a composite primary key to the new table

-- to ensure normalicy in the future.

ALTER TABLE #NewTable

ADD CONSTRAINT PK_NewTable_ID_Svc PRIMARY KEY CLUSTERED (ID,Svc)

--===== Show the new table

SELECT *

FROM #NewTable

Then, all you need to do is use this stored procedure to get your data... it won't be in the form you asked for but it's more correct...

CREATE PROCEDURE FindMatchesN

--===== Declare the input parameters

@a CHAR(1) = NULL, --Nulls allow the parameters to be optional

@b-2 CHAR(1) = NULL,

@c CHAR(1) = NULL,

@d CHAR(1) = NULL,

@e CHAR(1) = NULL,

@f CHAR(1) = NULL,

@g CHAR(1) = NULL,

@h CHAR(1) = NULL

AS

--===== Suppress the auto-display of rowcounts to ensure that

-- only one record set is created.

SET NOCOUNT ON

--===== Return all the matches and counts

SELECT nt.ID,nt.Svc,d.Matches

FROM #NewTable nt,

(--Derived table "d" finds the ID's we want

SELECT ID, COUNT(*) AS Matches

FROM #NewTable

WHERE Svc IN (@a,@b,@c,@d,@e,@f,@g,@h)

GROUP BY ID

) d --End derived table "d"

WHERE nt.ID = d.ID

ORDER BY d.Matches DESC ,nt.ID ASC, nt.Svc ASC

GO

You can run the proc from your app using something like this...

EXEC FindMatchesN @a='a',@b='b',@c='c'

In case you can't or won't change the table to something more normalized as I suggested, do everything I said so far except rewrite the proc thusly...

CREATE PROCEDURE FindMatchesD

--===== Declare the input parameters

@a CHAR(1) = NULL, --Nulls allow the parameters to be optional

@b-2 CHAR(1) = NULL,

@c CHAR(1) = NULL,

@d CHAR(1) = NULL,

@e CHAR(1) = NULL,

@f CHAR(1) = NULL,

@g CHAR(1) = NULL,

@h CHAR(1) = NULL

AS

--===== Suppress the auto-display of rowcounts to ensure that

-- only one record set is created.

SET NOCOUNT ON

--===== Return all the matches and counts

SELECT u.ID,u.Svcs,d.Matches

FROM #UsersTable u,

(--Derived table "d" finds the ID's we want

SELECT ID, COUNT(*) AS Matches

FROM #NewTable

WHERE Svc IN (@a,@b,@c,@d,@e,@f,@g,@h)

GROUP BY ID

) d --End derived table "d"

WHERE u.ID = d.ID

ORDER BY d.Matches DESC ,u.ID ASC

GO

...and execute it nearly the same way (only the name changed)...

EXEC FindMatchesD @a='a',@b='b',@c='c'

Do notice that I used the undocumented spt_Values table to get my numbers to do the split of the CSV Svcs column... you really need to make your own "Tally" or "Numbers" table and, if you search for it, there's plenty of discussion on how to do that on this forum. If you really don't understand what I mean on this, post back.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RyanRandall SSChampion Points: 13623 More actions · Answer 2

Wow! What a comprehensive reply. Nice work, Jeff

You must either have far too much time on you hands or you are a really, really nice guy. I'm sure it's the latter.

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

DogBot SSC Journeyman Points: 89 More actions · Answer 3

Jeff,

Thank you for all of your help, it is truly an amazing amount of information! I am a SQL newbie but I was able to adapt your thourougly commented scripts and normalize my real table, and get the stored procedure to work against it.

Going back to the sample data/tables when I run the stored procedure:

EXEC FindMatchesN @a='e',@b='b',@c='c'

I get:

ID SVC Matches

3 a 2

3 b 2

3 c 2

5 a 2

5 b 2

5 c 2

5 e 2

1 a 1

1 c 1

1 e 1

1 g 1

Close, but it is probably missing a group by clause or maybe that is what you refer to when you say "you really need to make your own "Tally" or "Numbers" table".

After all of the work you put into explaing this all to me I hate to ask for more but, could you explain this Tally" or "Numbers" table issue, or point me in the right direction?

I've searched the forums for "Tally tables" and found nothing. Again thanks so much for all your help.

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 4

DogBot,

Thanks for the feedback and sorry for my absence... been doing the OT thing alot, lately.

When I run...

EXEC FindMatchesN @a='a',@b='b',@c='c'

...against the data in my posting (which I was pretty sure matched your post) I get...

ID Svc Matches

----------- ---- -----------

3 a 3

3 b 3

3 c 3

5 a 3

5 b 3

5 c 3

5 e 3

1 a 2

1 c 2

1 e 2

1 g 2

2 b 1

2 d 1

2 f 1

2 h 1

...which looks correct to me (or may be due to a lack of sleep)... can you explain what you mean by "Close, but is probably missing a Group By"? Perhaps, post the code you are using? Thanks.

Ryan,

Thanks for the very nice compliment... I neither have time on my hands nor am I much of a nice guy (just kidding)... I just remember when I got started and also remember not having forums like this or people like yourself to be able to turn to... so, when I see someone in really deep Kimchie, especially a "newbie", I try to go the extra mile... kinda like you and a lot of other good folks on this forum.

Besides, it's a heck of a lot of fun watching the lights come on.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 5

Dang... didn't read down far enough...

DogBot... here's how to create a Tally table without using a cursor or loop...

/****************************************************************************************

THIS IS A TABLE CREATION SCRIPT

Purpose:

This script creates a "Tally" or "Numbers" table consisting of a single column with the

numbers 1 through 9999 in it. The table is used to create "set-based loops" for

splitting CSV's as well as other high-speed functions and to provide other functionality

such as creating ranges of dates (about 27 years worth).

***************************************************************************************/

--===== Create and populate the Tally table on the fly

PRINT 'Creating Tally table...'

SELECT TOP 9999

IDENTITY(INT,1,1) AS N

INTO dbo.Tally

FROM Master.dbo.SysComments sc1,

Master.dbo.SysComments sc2

--===== Add a Primary Key to maximize performance

-- Fill factor = 100 on static tables because no changes will occur.

-- Full fill adds even more performance.

PRINT 'Adding primary key...'

ALTER TABLE dbo.Tally

ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

--===== Ok, let other folks use it

PRINT 'Applying "permissions"...'

GRANT SELECT ON dbo.Tally TO PUBLIC

PRINT 'Done.'

GO

Here's some other searches to do on this forum and on Google...

Aux Numbers

Auxiliary Numbers

Auxiliary Table

Numbers Table

Date Table

Calendar Table

There are dozens of uses for a Tally (or Numbers) table and a Calendar table and they all help you avoid cursors and loops (although some loops are faster than a join to these tables but most are not).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RyanRandall SSChampion Points: 13623 More actions · Answer 6

DogBot - You can read about normalisation here...

http://www.datamodel.org/NormalizationRules.html

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

stax68 SSChampion Points: 11761 More actions · Answer 7

I think the confusion arises here:

>Then, all you need to do is use this stored procedure to get your data... it won't be in the form you asked for but it's more correct...

The proc FindMatchesN uses the normalised schema that Jeff recommends. Therefore it selects the individual search term, producing a recordset wherein each ID has as many records as it has matches. That can be altered to build the list for each ID, but not straightforwardly. This is the converse of the string-splitting task, so similar methods would apply.

In case you can't change your schema in the short term, here's some code which demonstrates how to implement a temporary solution to the matching problem. An indexing strategy is suggested, but of course may need to be modified according your particular circumstances.

Just to preempt any responses from Outraged of Somewhere-off-the-M4, I'm not disputing any of the points made in the other posts, and this is only intended as a short-term fix!

set
go
create
create
declare
declare
declare
select @chrs = 'f,m,a,g,b,k,c,h,d,e,l,i,j'
,
insert
insert
select
while
begin

select @i, c.id, s.svcs + ',' + c.chr
from @svcs s join @chr c
on charindex(c.chr,s.svcs) = 0
and c.id > s.svcsid
where s.rank = @i-1
select @i = @i + 1
end
--populate target table
insert
select
select
from
--string arguments wrapped and placed in a table.
declare
insert
--query
select
--note the covering index on the calc column is used even though that col is not explicitly referenced.
select
from
on ',' + d.svcs + ',' like s.sarg
group
order
--
select
drop

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant

RyanRandall SSChampion Points: 13623 More actions · Answer 8

And here's another way to do what you shouldn't...

--data

declare @t table (ID int, SVCS varchar(20))

insert @t

select 1, 'a,c,e,g'

union all select 2, 'b,d,f,h'

union all select 3, 'a,b,c'

union all select 4, 'f,g,h'

union all select 5, 'a,b,c,e'

--input

declare @s-2 varchar(20)

set @s-2 = 'a,b,c,e'

--calculation

declare @numbers table (i int identity(1, 1), x bit)

insert @numbers select top 20 null from master.dbo.syscolumns

select ID, SVCS, count(*) as count

from @numbers n inner join @t t on

',' + SVCS + ',' like '%,' + left(substring(@s, i, 20), charindex(',', substring(@s, i, 20) + ',')-1) + ',%'

where substring(',' + @s-2, i, 1) = ','

group by ID, SVCS

order by count desc

/*results

ID SVCS count

----------- -------------------- -----------

5 a,b,c,e 4

1 a,c,e,g 3

3 a,b,c 3

2 b,d,f,h 1

*/

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

DogBot SSC Journeyman Points: 89 More actions · Answer 9

OK, As you all know I am a SQL Newbie so my head is just spinning with all of the information, there is so much here that is COMPLETELY NEW AND FOREING to me...it's dizzing. I did embark on a serious normalization of this table (Me thinks I know what normalization means now)I had to create 12 new tables as there where many fields were data was stored as comma delimited strings. The SQL has grown tremendously complicated specially since most of what I am doing is cutting and pasting and adapting from your wonderful examples without really understanding what it is that I am doing exactly. By trail and error I am getting the right results though, but if anyone asks me ever to explain what I have done I could not.

Of course the kicker is that Ryan comes in and suggests I could have cheated a little not have had to do all of the normalization! 😉

Anyhow. Thanks to all of you, for your tremendous patience and selfless efforts in helping me(specially to Jeff)

DOG

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 10

Thanks dee-oh-gee... little hint... document as you go and you'll never get behind.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)