Query logic help please

  • Jeff Moden (10/21/2014)


    Eirikur Eiriksson (10/20/2014)


    Some code refactoring on the previous set based solution to fulfil all the requirements, several orders of magnitude faster than string manipulation although it has not been tested on very large set.

    😎

    Careful now. If you have another CategoryID that does meet the requirements, it won't show.

    Good point! To cater for identical Category definitions, the subquery output will not work. Here is an example of two identical categories, 3 and 13.

    😎

    USE tempdb;

    GO

    DECLARE @t TABLE (Id INT)

    INSERT INTO @t

    SELECT 101 UNION ALL

    SELECT 102 UNION ALL

    SELECT 103 UNION ALL

    SELECT 108;

    DECLARE @t2 TABLE (CategoryId INT, Id INT)

    INSERT INTO @t2

    SELECT 2, 50 UNION ALL

    SELECT 2, 51 UNION ALL

    SELECT 2, 52 UNION ALL

    SELECT 2, 59 UNION ALL

    SELECT 2, 60 UNION ALL

    SELECT 3, 101 UNION ALL

    SELECT 3, 102 UNION ALL

    SELECT 3, 103 UNION ALL

    SELECT 3, 108 UNION ALL

    SELECT 13, 101 UNION ALL

    SELECT 13, 102 UNION ALL

    SELECT 13, 103 UNION ALL

    SELECT 13, 108 UNION ALL

    SELECT 9, 108 UNION ALL

    SELECT 4, 108 UNION ALL

    SELECT 4, 109 UNION ALL

    SELECT 4, 120 UNION ALL

    SELECT 9, 125 UNION ALL

    SELECT 9, 166 UNION ALL

    SELECT 9, 169;

    ;WITH CATEGORY_CANDIDATES AS

    (

    SELECT DISTINCT

    T2.CategoryId

    FROM @t2 T2

    WHERE T2.Id IN (SELECT Id FROM @T)

    )

    ,BASE_SET AS

    (

    SELECT

    T2.CategoryId

    ,T2.Id

    ,COUNT(T2.Id) OVER

    (

    PARTITION BY T2.CategoryId

    ) AS T2_COUNT

    FROM @t2 T2

    INNER JOIN CATEGORY_CANDIDATES CC

    ON T2.CategoryId = CC.CategoryId

    )

    ,INCOMING_SET AS

    (

    SELECT

    T.Id

    ,COUNT(T.Id) OVER

    (

    PARTITION BY (SELECT NULL)

    ) AS T_COUNT

    FROM @t T

    )

    ,MATCHING_SET AS

    (

    SELECT

    BS.CategoryId

    ,COUNT(*) OVER

    (

    PARTITION BY BS.CategoryId

    ) AS T2M_COUNT

    ,T_COUNT

    ,T2_COUNT

    FROM INCOMING_SET INS

    INNER JOIN BASE_SET BS

    ON INS.Id = BS.Id

    )

    SELECT DISTINCT

    MS.CategoryId

    FROM MATCHING_SET MS

    WHERE MS.T_COUNT = MS.T2_COUNT

    AND MS.T_COUNT = MS.T2M_COUNT;

    Results

    CategoryId

    3

    13

  • CELKO (10/21/2014)


    You need to learn basic RDBMS. All of SQL is based on scalar values and 1NF. We never split strings in good SQL! That is done in the presentation layers. Then you do not know the ANSI/ISO Standard syntax!

    There is no such crap as a “category_id” in RDBMS. You can have a “<something>_category” or a “<something>_id”, but not a list of attribute properties! Tables have to have keys by definition.

    CREATE TABLE Foobars (foo_id INTEGER NOT NULL PRIMARY KEY);

    INSERT INTO Foobars

    VALUES (101), (102), (103), (108);

    CREATE TABLE Floobs

    (stupidity_level INTEGER NOT NULL,

    foo_id INTEGER NOT NULL PRIMARY KEY

    REFERENCES Foobars(foo_id));

    INSERT INTO Floobs

    VALUES

    (2, 50),

    (2, 51),

    (2, 52),

    (2, 59),

    (2, 60),

    (3, 101),

    (3, 102),

    (3, 103),

    (3, 108),

    (4, 109),

    (4, 120),

    (9, 125),

    (9, 166),

    (9, 169);

    >> I need to return the stupidity_level from Floobs only if each foo_id in the comma separated string has a matching value in Floobs (and if there is one and only one stupidity_level for the foo_ids passed), so in this case, 3 should be returned.<<

    Relational division is one of the eight basic operations in Codd's relational algebra. The idea is that a divisor table is used to partition a dividend table and produce a quotient or results table. The quotient table is made up of those values of one column for which a second column had all of the values in the divisor. There is a really good presentation on four ways to do this at: http://www.cs.arizona.edu/people/mccann/research/divpresentation.pdf

    This is easier to explain with an example. We have a table of pilots and the planes they can fly (dividend); we have a table of planes in the hangar (divisor); we want the names of the pilots who can fly every plane (quotient) in the hangar. To get this result, we divide the PilotSkills table by the planes in the hangar.

    CREATE TABLE PilotSkills

    (pilot CHAR(15) NOT NULL,

    plane CHAR(15) NOT NULL,

    PRIMARY KEY (pilot, plane));

    PilotSkills

    pilot plane

    =========================

    'Celko' 'Piper Cub'

    'Higgins' 'B-52 Bomber'

    'Higgins' 'F-14 Fighter'

    'Higgins' 'Piper Cub'

    'Jones' 'B-52 Bomber'

    'Jones' 'F-14 Fighter'

    'Smith' 'B-1 Bomber'

    'Smith' 'B-52 Bomber'

    'Smith' 'F-14 Fighter'

    'Wilson' 'B-1 Bomber'

    'Wilson' 'B-52 Bomber'

    'Wilson' 'F-14 Fighter'

    'Wilson' 'F-17 Fighter'

    CREATE TABLE Hangar

    (plane CHAR(15) NOT NULL PRIMARY KEY);

    Hangar

    plane

    =============

    'B-1 Bomber'

    'B-52 Bomber'

    'F-14 Fighter'

    PilotSkills DIVIDED BY Hangar

    pilot

    =============================

    'Smith'

    'Wilson'

    In this example, Smith and Wilson are the two pilots who can fly everything in the hangar. Notice that Higgins and Celko know how to fly a Piper Cub, but we don't have one right now. In Codd's original definition of relational division, having more rows than are called for is not a problem.

    The important characteristic of a relational division is that the CROSS JOIN (Cartesian product) of the divisor and the quotient produces a valid subset of rows from the dividend. This is where the name comes from, since the CROSS JOIN acts like a multiplication operator.

    Division with a Remainder

    There are two kinds of relational division. Division with a remainder allows the dividend table to have more values than the divisor, which was Codd's original definition. For example, if a pilot can fly more planes than just those we have in the hangar, this is fine with us. The query can be written in SQL-89 as

    SELECT DISTINCT pilot

    FROM PilotSkills AS PS1

    WHERE NOT EXISTS

    (SELECT *

    FROM Hangar

    WHERE NOT EXISTS

    (SELECT *

    FROM PilotSkills AS PS2

    WHERE (PS1.pilot = PS2.pilot)

    AND (PS2.plane = Hangar.plane)));

    The quickest way to explain what is happening in this query is to imagine an old World War II movie where a cocky pilot has just walked into the hangar, looked over the fleet, and announced, "There ain't no plane in this hangar that I can't fly!" We are finding the pilots for whom there does not exist a plane in the hangar for which they have no skills. The use of the NOT EXISTS() predicates is for speed. Most SQL systems will look up a value in an index rather than scan the whole table. The SELECT * clause lets the query optimizer choose the column to use when looking for the index.

    This query for relational division was made popular by Chris Date in his textbooks, but it is not the only method nor always the fastest. Another version of the division can be written so as to avoid three levels of nesting. While it is not original with me, I have made it popular in my books.

    SELECT PS1.pilot

    FROM PilotSkills AS PS1, Hangar AS H1

    WHERE PS1.plane = H1.plane

    GROUP BY PS1.pilot

    HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar);

    There is a serious difference in the two methods. Burn down the hangar, so that the divisor is empty. Because of the NOT EXISTS() predicates in Date's query, all pilots are returned from a division by an empty set. Because of the COUNT() functions in my query, no pilots are returned from a division by an empty set.

    In the sixth edition of his book, INTRODUCTION TO DATABASE SYSTEMS (Addison-Wesley; 1995 ;ISBN 0-201-82458-2), Chris Date defined another operator (DIVIDEBY ... PER) which produces the same results as my query, but with more complexity.

    Exact Division

    The second kind of relational division is exact relational division. The dividend table must match exactly to the values of the divisor without any extra values.

    SELECT PS1.pilot

    FROM PilotSkills AS PS1

    LEFT OUTER JOIN

    Hangar AS H1

    ON PS1.plane = H1.plane

    GROUP BY PS1.pilot

    HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)

    AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);

    This says that a pilot must have the same number of certificates as there planes in the hangar and these certificates all match to a plane in the hangar, not something else. The "something else" is shown by a created NULL from the LEFT OUTER JOIN.

    Please do not make the mistake of trying to reduce the HAVING clause with a little algebra to:

    HAVING COUNT(PS1.plane) = COUNT(H1.plane)

    because it does not work; it will tell you that the hangar has (n) planes in it and the pilot is certified for (n) planes, but not that those two sets of planes are equal to each other.

    Do you actually have any code that would solve the problem using your methods that the OP was given? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi, there was a mistake.

    HAVING COUNT(T.Id) = @NoOfIds => HAVING COUNT(T2.Id) = @NoOfIds

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply