SQL Query - Query for various attributes problem ...

  • I would like opinions from those that are more experienced than I in dealing with the following problem:

    The system has a number of categories and each category has a varying number of attributes, for example:

    Category -> Attributes:

    C1 -> A11, A12, A13, A14, A15, A16, A17, A18, A19

    C2 -> A21, A22, A23, A24, A25

    C3 -> A31, A32, A33, A34, A35, A36, A37

    So far in the design, the fewest number of attributes associated with a category is 6, the largest is 14 and there are only 9 categories defined (so far).

    The system also requires the following tables:

    A table (call it T1) where each widget may have none, any or all of the attributes above (from any category) associated with it, for example:

    Widget -> Attributes:

    W1 -> A11, A12, A21, A22, A31, A32

    W2 -> A11, A12, A13, A14, A15, A16, A17, A18, A19, A21, A22, A23, A24, A25, A33

    W3 -> NULL

    A table (call it T2) in which users may save search criteria like the following examples:

    1.Find all widgets with the attributes A11 or A12 present.

    2.Find all widgets in category C1 with the attributes A11 and A19 present AND in category C2 with attributes A22 or A23 present AND in category C3 with attributes A33 or A34.

    Users should be able to “store” their search criteria so the system will find any widgets that have any combination of attributes in any category, ie: A SQL query that can use the information in T2 to extract the appropriate result set from T1. This is done because the system sends out notifications at specific intervals when widgets are found that match a users saved search criteria.

    The table T2 would need to store the following elements for each users search criteria:

    - The category id

    - The match type for attributes in the category (match any or match all)

    - The attributes to be matched

    Taking example 2 above, a user might store the following search criteria:

    - Category: C1, MatchType: All, Attributes: A11, A19

    - Category: C2, MatchType: Any, Attributes A22, A23

    - Category: C3, MatchType: Any, Attributes A33, A34

    The above search criteria would return widget W2 as it contains ( all of the attributes A11, A19 ) AND (any of the attributes A22, A23) AND ( any of the attributes A33, a34 ).

    I am interested in knowing how the SQL experts would tackle such a problem, specifically:

    1.The structure of the various tables, ie: categories, category_attributes, T1 and T2.

    2.The SQL queries used on T1 and T2 that could produce the result sets shown in the examples above.

    I’ve worked out a solution using bitmasks to encode the various category attributes, as bitmasks work extremely well in handling problems such as finding all or any attributes that might be present for various categories, however – I know several experts (Joe Celko – which by the way, your book SQL For Smarties I use all the time), would cringe at the very notion of using bitmasks in a SQL design – hence, my plea for better ideas for this problem.

    Any and all ideas are welcome - thanks!

  • EXCELLENT POST and VERY interesting subject! Can you post the table and bitmask-definition for your proposed bitmask solution for the examples you have in your original posts?

    Bitmasks are very tempting because they're excellent in handling the 3 key queries for such things, those being 1) find any, 2) find all, and 3) find only (the 3rd of which you didn't include in your original post). Bitmasks also violate what most people would consider to be proper "normalization". Celko would, indeed, have a conniption over this bitmask method. Still, traditional normalized methods have a more difficult time with the "find all" and "find only" when the number of attributes is indeterminate in design and isn't known ahead of time in queries unless some form of dynamic SQL is used or lengthy conditional WHERE clauses come into play. One might even suggest that delimited storage of the attributes in a single column would be appropriate. Of course, most folks will have an absolute tirade over that.

    It'll be interesting to see what other people's responses are to your post. I imagine there will be quite some zeal in some of the responses. I think I'll make a bowl of popcorn and wait for the fireworks on this one. This should be really interesting. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Yes -- as a programmer, it was natural to think of a solution using bitmasks; and it actually works fairly well on SQL server -- however, I am open to any "pure" SQL solutions that would not need to use bitmasks.

    Also, I did leave out the "find only" option as it was not required in this particular problem.

    Cheers!

  • bdaviduck (3/6/2011)


    Yes -- as a programmer, it was natural to think of a solution using bitmasks; and it actually works fairly well on SQL server -- however, I am open to any "pure" SQL solutions that would not need to use bitmasks.

    All I can say to that is "me too". 😉

    I'm really a bit surprised that no one else has jumped into this particular thread, yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello,

    This post looks a bit stale, but I'm intrigued. Here's what I observed, but maybe this is just an artifact of your sample:

    The categories each have unique attributes associated with them. No attribute is found in more than one category. If this is generally true, then the problem greatly simplifies to a matching between a widget and various attributes. The categories appear to be only there as "labels" that group the attributes, and are not necessary for the search and filtering process.

    Is this true? If it is, then the searching/filtering can be done on a two-column table of Widgets and Attributes. Dynamic sql could accept the user's csv input of selected attributes (tiny or small ints, for instance) and put those into "IN" and "UNION" clauses for the "must have" and "need one or the other of these" values. Load a temp table using by filtering results from the WidgetAttributes table; delete from the temp table if there are "must not have" values. This yields the widgets that meet the input criteria.

    Cheers,

    Elliott

  • I've used bitmasks in a similar method before, but it was to shortcircuit bad full-text indexing in 2k and prior. It's very useful, and allows you a reasonable volume of columns as well as a built-in organizer for categories.

    I adore bitmasks in general. It also drives my fellow sql'ers nuts sometimes because a few of them 'just don't get it'.

    That said, a few things to consider.

    1) As mentioned, some people "Just don't get it". Be aware if these folks are on your team you're going to have more 3AM phone calls then usual.

    2) You have a maximum of 64 attributes per category mask (BigInt).

    3) You must keep a library of your attribute/category combinations. Because you have this library, people are going to assume changing something in the library is actually useful, so you'll probably need to tie it into your code.

    4) Multiple states for your category attributes are illegal unless you're planning to try to use multi-bit definitions. That level of complexity can be painful.

    So, what's a dev to do?

    One option is to create the OTLT (One True Lookup Table) from heck. To do this, you would generate a table (basics here) that looks something like this:

    CREATE TABLE AttributeAssignment (ItemID INT, AttributeID INT, AttValue TINYINT CONSTRAINT cnst_AttValue DEFAULT (0))

    CREATE CLUSTERED INDEX idx_aa ON AttributeAssignment ( AttributeID)

    Make sure AttValue only contains 0s and 1s by your favorite method. I usually use a constraint.

    Use IDs here because you need this to act like a fact table, thin and long, as skinny as possible.

    For the first search, you'd have to know the attribute they want, for example, A12... which let's randomly say attaches to AttributeID 23 (You'd have your categoryID and whatnot in a secondary lookup table. This table gets long, FAST).

    SELECT

    ItemID, AttValue

    FROM

    AttributeAssignment

    WHERE

    AttributeID = 23

    AND AttValue = 1

    Well, that's simple enough... but more involved queries? They involve a dynamic build of a unioning case of doom. Check this out:

    SELECT

    ItemID

    INTO

    #FoundIDs

    FROM

    -- AND structure

    -- Remove the JOIN to drvOR component if no ANDs are found.

    -- Dynamic1 begin

    (SELECTItemID,AttValue

    FROMAttributeAssignment

    WHEREAttributeID = 23

    UNION ALL

    SELECTItemID,AttValue

    FROMAttributeAssignment

    WHEREAttributeID = 81

    UNION ALL

    SELECTItemID,AttValue

    FROMAttributeAssignment

    WHEREAttributeID = 68

    )

    -- Dynamic1 End

    AS drvAND

    -- OR STRUCTURE --If it passes the or, then check the ANDs.

    JOIN

    (SELECT

    ItemID

    FROM

    --Dynamic2 Begin

    (SELECTItemID,AttValue

    FROMAttributeAssignment

    WHEREAttributeID = 302

    UNION ALL

    SELECTItemID,AttValue

    FROMAttributeAssignment

    WHEREAttributeID = 617

    UNION ALL

    SELECTItemID,AttValue

    FROMAttributeAssignment

    WHEREAttributeID = 419

    --Dynamic2 End

    ) AS drvMid

    GROUP BY

    ItemID

    HAVING

    SUM( AttValue) > 0

    ) AS drvOR

    ONdrvAND.ItemID = drvOR.ItemID

    GROUP BY

    drvAND.ItemID

    HAVING

    SUM( drvAND.AttValue) = 3 --# of ANDs located

    Then just join up #FoundIDs to your real data and display what you like.

    The problem with this is it has to be built dynamically to keep from shooting yourself in the foot, especially with a thousand options available to the end user. You also have to take apart their statement of ands and ors (which you have to do with either method, really) to determine which goes where.

    If you've got and/or combinations (ie: (A or B) AND (C or D) ) you chain in another drvOR JOIN for each grouping.

    Notice... this is a lot of work to get around a lack of training and a 64 value restriction. I'd still lean towards bitmasks, but sometimes they just can't handle it and you've got to go after something at this level.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply