How to order records

  • I have a database with two fields: ID and SVCS.

    SVCS contains a comma delimited list.

    So records on the dabase might look like this:

    ID - SVCS

    1 - a,c,e,g

    2 - b,d,f,h

    3 - a,b,c

    4 - f,g,h

    5 - a,b,c,e

    To search this DB a user checks a series of checkboxes on a form, for example:

    SVCS: a[x] b[x] c[x] d[ ] e[x] f[ ] g[ ] h[ ]

    I want to pull any records where any of the items selected by the user is found in the SVCS field.

    In this example it would return:

    1 - a,c,e,g (3 matches: a,c,and e)

    2 - b,d,f,h (1 match: b)

    3 - a,b,c (2 matches: a,c)

    5 - a,b,c,e (4 matches: a,b,c,and e)

    Furthrermore:

    How do I order the results by the number of matches?

  • Really, really bad idea to have the table in the format you do... highly denormalized and for more reasons than I can count, you will have more problems with it than you can count.  Folks like Joe Celko would have a field day with this and I wouldn't blame them a bit.

    As an additional note, you might want to name the ID column as something a bit more meaningful, as well.  Yep, I know... this is just an example and it might already be named something else... just wanted to remind you.

    Here's how to fix the normalization of the table into a new table... I use temp tables for this example... you'd need to do a little work to make it fit your production tables...

    --===== If the temporary demonstration tables exist, drop them

         IF OBJECT_ID('TempDB..#UsersTable') IS NOT NULL

            DROP TABLE #UsersTable

         IF OBJECT_ID('TempDB..#NewTable') IS NOT NULL

            DROP TABLE #NewTable

    GO

    --===== Simulate the users table

     CREATE TABLE #UsersTable

            (

            ID INT,

            Svcs VARCHAR(20)

            )

    --===== Populate the sample users table with the data given

     INSERT INTO #UsersTable

            (ID,Svcs)

     SELECT 1,'a,c,e,g' UNION ALL

     SELECT 2,'b,d,f,h' UNION ALL

     SELECT 3,'a,b,c'   UNION ALL

     SELECT 4,'f,g,h'   UNION ALL

     SELECT 5,'a,b,c,e'

    --===== Create a more normalized table

     CREATE TABLE #NewTable

            (

            ID  INT     NOT NULL,

            Svc CHAR(1) NOT NULL

            )

    --===== Demonstrate how to convert the highly denormalized users table

         -- into the normalized table.

     INSERT INTO #NewTable

            (ID,Svc)

     SELECT y.ID,

            LTRIM(SUBSTRING(y.Svcs, t.Number+1, CHARINDEX(',', y.Svcs, t.Number + 1) - t.Number - 1)) AS Svc

       FROM Master.dbo.spt_Values t

      INNER JOIN

            (SELECT ID,','+Svcs+',' AS Svcs FROM dbo.#UsersTable) y

         ON SUBSTRING(y.Svcs, t.Number, 1) = ','

        AND t.Number < LEN(y.Svcs)

    WHERE t.Name IS NULL

    --===== Add a composite primary key to the new table

         -- to ensure normalicy in the future.

      ALTER TABLE #NewTable

        ADD CONSTRAINT PK_NewTable_ID_Svc PRIMARY KEY CLUSTERED (ID,Svc)

    --===== Show the new table

     SELECT *

       FROM #NewTable

    Then, all you need to do is use this stored procedure to get your data... it won't be in the form you asked for but it's more correct...

     CREATE PROCEDURE FindMatchesN

    --===== Declare the input parameters

            @a CHAR(1) = NULL, --Nulls allow the parameters to be optional

            @b-2 CHAR(1) = NULL,

            @c CHAR(1) = NULL,

            @d CHAR(1) = NULL,

            @e CHAR(1) = NULL,

            @f CHAR(1) = NULL,

            @g CHAR(1) = NULL,

            @h CHAR(1) = NULL

         AS

    --===== Suppress the auto-display of rowcounts to ensure that

         -- only one record set is created.

        SET NOCOUNT ON

    --===== Return all the matches and counts

     SELECT nt.ID,nt.Svc,d.Matches

       FROM #NewTable nt,

            (--Derived table "d" finds the ID's we want

             SELECT ID, COUNT(*) AS Matches

               FROM #NewTable

              WHERE Svc IN (@a,@b,@c,@d,@e,@f,@g,@h)

              GROUP BY ID

            ) d --End derived table "d"

      WHERE nt.ID = d.ID

      ORDER BY d.Matches DESC ,nt.ID ASC, nt.Svc ASC

    GO

    You can run the proc from your app using something like this...

    EXEC FindMatchesN @a='a',@b='b',@c='c'

    In case you can't or won't change the table to something more normalized as I suggested, do everything I said so far except rewrite the proc thusly...

     CREATE PROCEDURE FindMatchesD

    --===== Declare the input parameters

            @a CHAR(1) = NULL, --Nulls allow the parameters to be optional

            @b-2 CHAR(1) = NULL,

            @c CHAR(1) = NULL,

            @d CHAR(1) = NULL,

            @e CHAR(1) = NULL,

            @f CHAR(1) = NULL,

            @g CHAR(1) = NULL,

            @h CHAR(1) = NULL

         AS

    --===== Suppress the auto-display of rowcounts to ensure that

         -- only one record set is created.

        SET NOCOUNT ON

    --===== Return all the matches and counts

     SELECT u.ID,u.Svcs,d.Matches

       FROM #UsersTable u,

            (--Derived table "d" finds the ID's we want

             SELECT ID, COUNT(*) AS Matches

               FROM #NewTable

              WHERE Svc IN (@a,@b,@c,@d,@e,@f,@g,@h)

              GROUP BY ID

            ) d --End derived table "d"

      WHERE u.ID = d.ID

      ORDER BY d.Matches DESC ,u.ID ASC

    GO

    ...and execute it nearly the same way (only the name changed)...

    EXEC FindMatchesD @a='a',@b='b',@c='c'

    Do notice that I used the undocumented spt_Values table to get my numbers to do the split of the CSV Svcs column... you really need to make your own "Tally" or "Numbers" table and, if you search for it, there's plenty of discussion on how to do that on this forum.  If you really don't understand what I mean on this, post back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wow! What a comprehensive reply. Nice work, Jeff

    You must either have far too much time on you hands or you are a really, really nice guy. I'm sure it's the latter. 

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Jeff,

    Thank you for all of your help, it is truly an amazing amount of information! I am a SQL newbie but I was able to adapt your thourougly commented scripts and normalize my real table, and get the stored procedure to work against it.

    Going back to the sample data/tables when I run the stored procedure:

    EXEC FindMatchesN @a='e',@b='b',@c='c'

    I get:

    ID SVC Matches

    3 a 2

    3 b 2

    3 c 2

    5 a 2

    5 b 2

    5 c 2

    5 e 2

    1 a 1

    1 c 1

    1 e 1

    1 g 1

    Close, but it is probably missing a group by clause or maybe that is what you refer to when you say "you really need to make your own "Tally" or "Numbers" table".

    After all of the work you put into explaing this all to me I hate to ask for more but, could you explain this Tally" or "Numbers" table issue, or point me in the right direction?

    I've searched the forums for "Tally tables" and found nothing. Again thanks so much for all your help.

  • DogBot,

    Thanks for the feedback and sorry for my absence... been doing the OT thing alot, lately.

    When I run...

    EXEC FindMatchesN @a='a',@b='b',@c='c'

    ...against the data in my posting (which I was pretty sure matched your post) I get...

    ID          Svc  Matches    

    ----------- ---- -----------

    3           a    3

    3           b    3

    3           c    3

    5           a    3

    5           b    3

    5           c    3

    5           e    3

    1           a    2

    1           c    2

    1           e    2

    1           g    2

    2           b    1

    2           d    1

    2           f    1

    2           h    1

    ...which looks correct to me (or may be due to a lack of sleep)... can you explain what you mean by "Close, but is probably missing a Group By"?  Perhaps, post the code you are using?  Thanks.

    Ryan,

    Thanks for the very nice compliment... I neither have time on my hands nor am I much of a nice guy (just kidding)... I just remember when I got started and also remember not having forums like this or people like yourself to be able to turn to... so, when I see someone in really deep Kimchie, especially a "newbie", I try to go the extra mile... kinda like you and a lot of other good folks on this forum.

    Besides, it's a heck of a lot of fun watching the lights come on.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dang... didn't read down far enough...

    DogBot... here's how to create a Tally table without using a cursor or loop...

    /****************************************************************************************

     THIS IS A TABLE CREATION SCRIPT

     Purpose:

     This script creates a "Tally" or "Numbers" table consisting of a single column with the

     numbers 1 through 9999 in it.  The table is used to create "set-based loops" for

     splitting CSV's as well as other high-speed functions and to provide other functionality

     such as creating ranges of dates (about 27 years worth).

    ***************************************************************************************/

    --===== Create and populate the Tally table on the fly

      PRINT 'Creating Tally table...'

     SELECT TOP 9999

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysComments sc1,

            Master.dbo.SysComments sc2

    --===== Add a Primary Key to maximize performance

         -- Fill factor = 100 on static tables because no changes will occur.

         -- Full fill adds even more performance.

      PRINT 'Adding primary key...'

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

    --===== Ok, let other folks use it

      PRINT 'Applying "permissions"...'

      GRANT SELECT ON dbo.Tally TO PUBLIC

      PRINT 'Done.'

    GO

     

    Here's some other searches to do on this forum and on Google...

    Aux Numbers

    Auxiliary Numbers

    Auxiliary Numbers

    Auxiliary Table

    Numbers Table

    Date Table

    Calendar Table

    There are dozens of uses for a Tally (or Numbers) table and a Calendar table and they all help you avoid cursors and loops (although some loops are faster than a join to these tables but most are not).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • DogBot - You can read about normalisation here...

    http://www.datamodel.org/NormalizationRules.html

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • I think the confusion arises here:

    >Then, all you need to do is use this stored procedure to get your data... it won't be in the form you asked for but it's more correct...
     
    The proc FindMatchesN uses the normalised schema that Jeff recommends. Therefore it selects the individual search term, producing a recordset wherein each ID has as many records as it has matches. That can be altered to build the list for each ID, but not straightforwardly. This is the converse of the string-splitting task, so similar methods would apply.

    In case you can't change your schema in the short term, here's some code which demonstrates how to implement a temporary solution to the matching problem. An indexing strategy is suggested, but of course may need to be modified according your particular circumstances.

    Just to preempt any responses from Outraged of Somewhere-off-the-M4, I'm not disputing any of the points made in the other posts, and this is only intended as a short-term fix!

    set

    numeric_roundabort off

    go

    --generate data for testing

    create

    table tbl_denorm (ID int identity, svcs varchar(500), svcshash as ',' + svcs + ',' ) --, unique

    create

    clustered index ix_denorm on tbl_denorm(svcshash, ID)

    declare

    @chr table(id int identity, chr varchar(8000))

    declare

    @svcs table(rank int, svcsid int, svcs varchar(8000),length as len(svcs))

    declare

    @chrs varchar(8000), @chrcount int, @sql varchar(8000), @i int

    select @chrs = 'f,m,a,g,b,k,c,h,d,e,l,i,j'

    ,

    @sql = 'select ''' + replace(replace(@chrs, char(39), char(39)+char(39)),',',''' union all select ''') + ''''

    insert

    @chr(chr) exec( @sql)

    insert

    @svcs(rank, svcsid, svcs) select 1, id, chr from @chr

    select

    @chrcount = @@rowcount, @i = 2

    while

    @i <= @chrcount

    begin

    insert @svcs(rank,svcsid, svcs)

    select @i, c.id, s.svcs + ',' + c.chr
    from @svcs s join @chr c
    on charindex(c.chr,s.svcs) = 0
    and c.id > s.svcsid
    where s.rank = @i-1
    select @i = @i + 1

    end

    --populate target table

    insert

    tbl_denorm(svcs)

    select

    svcs from @svcs cross join (select 1 i union select 2 union select 3) i

    select

    cast(count(*) as varchar) + ' records: combinations including ' + min(svcs) + ' and '+ max(svcs)

    from

    tbl_denorm

    --string arguments wrapped and placed in a table.

    declare

    @sarg table(sarg varchar(50))

    insert

    @sarg(sarg) select '%,a,%' union select '%,d,%' union select '%,g,%' union select '%,m,%'

    --query

    select

    getdate() dt

    --note the covering index on the calc column is used even though that col is not explicitly referenced.

    select

    d.ID, d.svcs, count(*) rank

    from

    tbl_denorm d join @sarg s

    on ',' + d.svcs + ',' like s.sarg

    group

    by d.ID, d.svcs

    order

    by rank desc, d.svcs

    --

    select

    getdate() dt

    drop

    table tbl_denorm

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • And here's another way to do what you shouldn't...

    --data

    declare @t table (ID int, SVCS varchar(20))

    insert @t

              select 1, 'a,c,e,g'

    union all select 2, 'b,d,f,h'

    union all select 3, 'a,b,c'

    union all select 4, 'f,g,h'

    union all select 5, 'a,b,c,e'

    --input

    declare @s-2 varchar(20)

    set @s-2 = 'a,b,c,e'

    --calculation

    declare @numbers table (i int identity(1, 1), x bit)

    insert @numbers select top 20 null from master.dbo.syscolumns

    select ID, SVCS, count(*) as count

    from @numbers n inner join @t t on

        ',' + SVCS + ',' like '%,' + left(substring(@s, i, 20), charindex(',', substring(@s, i, 20) + ',')-1) + ',%'

    where substring(',' + @s-2, i, 1) = ','

    group by ID, SVCS

    order by count desc

    /*results

    ID          SVCS                 count      

    ----------- -------------------- -----------

    5           a,b,c,e              4

    1           a,c,e,g              3

    3           a,b,c                3

    2           b,d,f,h              1

    */

     

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • OK, As you all know I am a SQL Newbie so my head is just spinning with all of the information, there is so much here that is COMPLETELY NEW AND FOREING to me...it's dizzing. I did embark on a serious normalization of this table (Me thinks I know what normalization means now)I had to create 12 new tables as there where many fields were data was stored as comma delimited strings. The SQL has grown tremendously complicated specially since most of what I am doing is cutting and pasting and adapting from your wonderful examples without really understanding what it is that I am doing exactly. By trail and error I am getting the right results though, but if anyone asks me ever to explain what I have done I could not.

    Of course the kicker is that Ryan comes in and suggests I could have cheated a little not have had to do all of the normalization! 😉

    Anyhow. Thanks to all of you, for your tremendous patience and selfless efforts in helping me(specially to Jeff)

    DOG

  • Thanks dee-oh-gee... little hint... document as you go and you'll never get behind.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply