Searching for n-categories

  • Hi there

    I have a challenge where I need some help. May someone can help me?

    I have a table where a project can have n-different categories. That means that a project (referenced by ID) can have zero or multiple categories.

    I am looking for a sql statement to find every project id which contains a specific combination of category. For example I'm looking for every project which contains the category 3 AND 6. As a result I should get the following result (based on the example below):

    Project

    2

    3

    The statement should by dynamic in the sens of looking for one or n-categories.

    Below you see an example of the table:

    -- Create table

    USE [test]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[Project_Category](

    [Project] [int] NOT NULL,

    [Category] [int] NOT NULL,

    CONSTRAINT [PK_Project_Category] PRIMARY KEY CLUSTERED

    (

    [Project] ASC,

    [Category] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    --Insert values

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (1, 2)

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (1, 3)

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (1, 7)

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (2, 3)

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (2, 6)

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (2, 7)

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (3, 3)

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (3, 6)

    GO

    INSERT [dbo].[Project_Category] ([Project], [Category]) VALUES (4, 1)

    GO

    Thanks a lot for you help!

    Regards, Thomas

  • WITH CategoriesWanted([Category]) AS (

    SELECT [Category]

    FROM ( VALUES (3),(6)) x([Category])

    )

    SELECT [Project]

    FROM [dbo].[Project_Category]

    WHERE [Category] IN (SELECT [Category] FROM CategoriesWanted)

    GROUP BY [Project]

    HAVING COUNT(DISTINCT [Category])=(SELECT COUNT(*) FROM CategoriesWanted);

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Great, works perfect. That was really what I'm looking for!

    Thanks for your support.

    Regards, Thomas

  • Mark-101232 (9/30/2013)


    WITH CategoriesWanted([Category]) AS (

    SELECT [Category]

    FROM ( VALUES (3),(6)) x([Category])

    )

    SELECT [Project]

    FROM [dbo].[Project_Category]

    WHERE [Category] IN (SELECT [Category] FROM CategoriesWanted)

    GROUP BY [Project]

    HAVING COUNT(DISTINCT [Category])=(SELECT COUNT(*) FROM CategoriesWanted);

    Mark,

    I don't think you need to COUNT DISTINCT Category within Project as Project, Category is the primary key (IGNORE_DUP_KEY = OFF).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (9/30/2013)


    Mark-101232 (9/30/2013)


    WITH CategoriesWanted([Category]) AS (

    SELECT [Category]

    FROM ( VALUES (3),(6)) x([Category])

    )

    SELECT [Project]

    FROM [dbo].[Project_Category]

    WHERE [Category] IN (SELECT [Category] FROM CategoriesWanted)

    GROUP BY [Project]

    HAVING COUNT(DISTINCT [Category])=(SELECT COUNT(*) FROM CategoriesWanted);

    Mark,

    I don't think you need to COUNT DISTINCT Category within Project as Project, Category is the primary key (IGNORE_DUP_KEY = OFF).

    Yep, you're right. Nice one.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (10/1/2013)


    dwain.c (9/30/2013)


    Mark-101232 (9/30/2013)


    WITH CategoriesWanted([Category]) AS (

    SELECT [Category]

    FROM ( VALUES (3),(6)) x([Category])

    )

    SELECT [Project]

    FROM [dbo].[Project_Category]

    WHERE [Category] IN (SELECT [Category] FROM CategoriesWanted)

    GROUP BY [Project]

    HAVING COUNT(DISTINCT [Category])=(SELECT COUNT(*) FROM CategoriesWanted);

    Mark,

    I don't think you need to COUNT DISTINCT Category within Project as Project, Category is the primary key (IGNORE_DUP_KEY = OFF).

    Yep, you're right. Nice one.

    And sorry for not saying initially that otherwise I thought the query you suggested was an excellent solution for the problem the OP posted.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Similar -

    WITH CTE AS (

    SELECT Project

    FROM Project_Category

    WHERE Category = '3')

    SELECT c.Project

    FROM CTE c JOIN Project_Category p ON p.Project = c.Project AND p.Category = '6'

  • This problem is known as Relational Divsion[/url].

    There is the exact division (only 3 and 6) and the one with a remainder (at least 3 and 6). Yours seems to be the second one.

    DECLARE @C TABLE (

    [Category] int NOT NULL PRIMARY KEY

    );

    INSERT INTO @C ([Category]) VALUES (3), (6);

    SELECT

    PC.Project

    FROM

    [dbo].[Project_Category] AS PC

    INNER JOIN

    @C AS C

    ON PC.Category = C.Category

    GROUP BY

    PC.Project

    HAVING

    COUNT(DISTINCT PC.Category) = (SELECT COUNT(*) FROM @C);

    GO

    Notice that I am constraining the categories I am looking for to be unique (primary key in this case).

    This is not the only way of solving relational division neither the one with best performance.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply