Need some help with some SQL or table design

  • Hi,

    I've been trying to work out the best approach to retrieve some data from some tables. Its basically based on some simple rules, but the rules are dynamic (saved in a table, but subject to change at the users whim) which is causing me some trouble.

    I've simplified the design for my example, it has the same concept though.

    So, my scenario would be that a user would create a number of tier's - then they'd enter the criteria for each tier into the rule table, then they would enroll some participants and depending on the participants information, it should automatically assign them to the most appropriate tier. To do that, it uses the tierrule table which stores table name, field name, operand and value. I need to evaluate these rules that the user has created against the enrolled participant. If there are no matches, thats a valid result as well - it just means I have to make them manually choose a tier.

    So lets say we have 3 tables that can be evaluated by the rules: person, personval, demoval - they are all linked by personid (and it can be assumed that ALL tables that are evaluated will have personid, which will be provided when looking for the most appropriate tier)

    If a person is enrolled with person.age > 5 and < 10 and personval.val1 > 10 and demoval.demo_val1 = 1 and demoval.demo_val2 = 1, then they should automatically be assigned to tier 1

    If they have person.age > 10 and < 20 and personval.pv_val = 5, they should be assigned to tier 2

    if they have person.age > 20 and demoval.demo_val1 = 5 and personval.pv_val1 = 5, then they should be assigned to tier 3

    Obviously this is a very simplistic example, but I could have a whole load of tiers with 20+ rules on each one (hitting multiple tables)

    I know I can do this via application code - Just hit the tierrule table, create a SELECT from there and run multiple SELECTS (one for each rule). In fact I did just that via SQL as well, but I can't help but think there is a better solution.

    Heres what I have been using

    SELECT 1 AS seq, 'SELECT tierid FROM (' AS my_sql UNION

    SELECT 2 AS seq, 'SELECT ' + CAST(tierid AS VARCHAR(10)) + ' AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM ' + tblname + ' WHERE ' + fieldname + ' ' + operand + ' ''' + CAST(ruleval AS VARCHAR(255)) + ''' AND personid = 1 UNION ALL' FROM tierrule UNION

    SELECT 3 AS seq, 'SELECT -1, ''''' UNION

    SELECT 4 AS seq, ') a WHERE tierid <> -1 GROUP BY tierid HAVING SUM(match) = 0' ORDER BY seq

    Which gives me this...

    SELECT

    tierid

    FROM

    (

    SELECT 1 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM person WHERE age > 5 AND personid = '1' UNION ALL

    SELECT 1 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM person WHERE age < 10 AND personid = '1' UNION ALL

    SELECT 1 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM personval WHERE pv_val1 > 10 AND personid = '1' UNION ALL

    SELECT 1 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM demoval WHERE demo_val1 = 1 AND personid = '1' UNION ALL

    SELECT 1 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM demoval WHERE demo_val2 = 1 AND personid = '1' UNION ALL

    SELECT 2 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM person WHERE age > 10 AND personid = '1' UNION ALL

    SELECT 2 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM person WHERE age < 10 AND personid = '1' UNION ALL

    SELECT 2 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM personval WHERE pv_val1 = 1 AND personid = '1' UNION ALL

    SELECT 3 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM person WHERE age > 20 AND personid = '1' UNION ALL

    SELECT 3 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM demoval WHERE demo_val1 = 5 AND personid = '1' UNION ALL

    SELECT 3 AS tierid, CASE WHEN COUNT(*) = 1 THEN 0 ELSE -1 END AS match FROM personval WHERE pv_val1 = 5 AND personid = '1'

    ) a

    GROUP BY tierid

    HAVING SUM(match) = 0

    And when its run, it returns tierid = 1 which is correct.

    But as I say - its clunky and surely theres a better approach? It should be noted that I have some latitude to change the table structure, but I'd rather not have a rule table for each and every table I might evaluate.

    Thanks in advance - full table info below

    CREATE TABLE [dbo].[tier](

    [tierid] [int] IDENTITY(1,1) NOT NULL,[tiername] [varchar](50) NULL,

    CONSTRAINT [PK_tier] PRIMARY KEY CLUSTERED

    (

    [tierid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[tierrule](

    [tierruleid] [int] IDENTITY(1,1) NOT NULL, [tierid] [int] NOT NULL,[tblname] [varchar](50) NULL,[fieldname] [varchar](50) NULL,[operand] [varchar](5) NULL,[ruleval] [varchar](50) NULL,

    CONSTRAINT [PK_tierrule] PRIMARY KEY CLUSTERED

    (

    [tierruleid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[person](

    [personid] [int] IDENTITY(1,1) NOT NULL,[age] [int] NULL,[val1] [int] NULL,[val2] [varchar](50) NULL,[val3] [varchar](50) NULL,

    CONSTRAINT [PK_person] PRIMARY KEY CLUSTERED

    (

    [personid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[personval](

    [personvalid] [int] IDENTITY(1,1) NOT NULL,[personid] [int] NOT NULL,[pv_val1] [int] NULL,[pv_val2] [int] NULL,[pv_val3] [varchar](50) NULL,

    CONSTRAINT [PK_personval] PRIMARY KEY CLUSTERED

    (

    [personvalid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[demoval](

    [persondemoid] [int] IDENTITY(1,1) NOT NULL, [personid] [int] NOT NULL,[demo_val1] [int] NULL,[demo_val2] [int] NULL,[demo_val3] [varchar](50) NULL,[demo_val4] [varchar](50) NULL,

    CONSTRAINT [PK_demoval] PRIMARY KEY CLUSTERED

    (

    [persondemoid] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    INSERT INTO person (age, val1, val2, val3) VALUES (7, 20, '1', '1')

    INSERT INTO personval (personid, pv_val1, pv_val2, pv_val3) VALUES (1, 15, 1, '1')

    INSERT INTO demoval (personid, demo_val1, demo_val2, demo_val3, demo_val4) VALUES (1, 1, 1, '1', '1')

    INSERT INTO tier (tiername) SELECT 'tier 1' UNION SELECT 'tier 2' UNION SELECT 'tier 3'

    INSERT INTO tierrule (tierid, tblname, fieldname, operand, ruleval)

    SELECT 1, 'person', 'age', '>', '5' UNION SELECT 1, 'person', 'age', '<', '10' UNION

    SELECT 1, 'personval', 'pv_val1', '>', '10' UNION SELECT 1, 'demoval', 'demo_val1', '=', '1' UNION

    SELECT 1, 'demoval', 'demo_val2', '=', '1' UNION SELECT 2, 'person', 'age', '>', '10' UNION

    SELECT 2, 'person', 'age', '<', '20' UNION SELECT 2, 'personval', 'pv_val1', '=', '5' UNION

    SELECT 3, 'person', 'age', '>', '20' UNION SELECT 3, 'personval', 'pv_val1', '=', '5' UNION

    SELECT 3, 'demoval', 'demo_val1', '=', '5'

    GO

    Edit: Cleaned up the SQL a little

  • If it makes the concept easier - as personid is always present - we can remove the table and assume the data all comes from one central view - still stuck on how to do this without running a select for each and every rule though :

  • Are you trying to figure out which tier a person should belong to? In other words this sounds like you are trying to write a GetPersonTier function or something like that?

    How do you handle conflicting rules? It looks like from the current data that each tier is mutually exclusive but how do you prevent that type of data?

    I am not really sure I understand exactly what you are trying to do here but this looks like a nightmare to maintain as this gets more and more complicated.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Basically - I need to match the person to the tier based on some user defined rules. These rules would be along the lines of tier 4 is only available to children aged 4-8 who have brown hair. Now lets say that someone creates tier 5 which only has the rule 'has brown hair' - if we got a 6yr old with brown hair enrolled, 2 tiers should be returned to the user, and they'd need to actually choose which is the correct one.

    However, if only 1 match was to come up, then the tier would be automatically assigned. If no matches are returned, then I'll display a message to the user because either they messed up their rules, or the participant shouldn't be enrolled in the first place, then I'd just allow them to choose from any of the tiers and send an email to the coordinator to look into it.

    The point of this is to reduce the amount of misfiled tiers to the participants - for example - a user could assign tier 4 to a participant who is 12 yrs old despite the fact that tier 4 is only for children aged 4-8 yrs. This leads to all kinds of data issues further down the line from a business standpoint - nobody may notice this for 6 months, and by then its too late to have the participant redo all the tasks assigned with their correct tier, and the data we do have is worthless because it doesn't match tier 4 criteria.

    I'm thinking I may need to rework the table design, but I'm not sure the best way to do that and still have it open ended enough that if I wanted to include some new table to be used in the rules, I don't want to have to recode everything.

  • kevmck (7/30/2012)


    Basically - I need to match the person to the tier based on some user defined rules. These rules would be along the lines of tier 4 is only available to children aged 4-8 who have brown hair. Now lets say that someone creates tier 5 which only has the rule 'has brown hair' - if we got a 6yr old with brown hair enrolled, 2 tiers should be returned to the user, and they'd need to actually choose which is the correct one.

    However, if only 1 match was to come up, then the tier would be automatically assigned. If no matches are returned, then I'll display a message to the user because either they messed up their rules, or the participant shouldn't be enrolled in the first place, then I'd just allow them to choose from any of the tiers and send an email to the coordinator to look into it.

    The point of this is to reduce the amount of misfiled tiers to the participants - for example - a user could assign tier 4 to a participant who is 12 yrs old despite the fact that tier 4 is only for children aged 4-8 yrs. This leads to all kinds of data issues further down the line from a business standpoint - nobody may notice this for 6 months, and by then its too late to have the participant redo all the tasks assigned with their correct tier, and the data we do have is worthless because it doesn't match tier 4 criteria.

    I'm thinking I may need to rework the table design, but I'm not sure the best way to do that and still have it open ended enough that if I wanted to include some new table to be used in the rules, I don't want to have to recode everything.

    So you need an iTVF or something along those lines? What are the parameters and such. Keep in mind I am intimate with this project like you are. I am still trying to figure out what you are trying to do. I am afraid that at some point all you will want to do is slit your wrists. Dynamic sql and a lot of it is going to become your best friend. In effect you have a modified EAV type of design with these rules.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I suspect you're correct about how I'm going to end up with approaching this in this method.

    Perhaps it'd make more sense if I explained the business process

    The system is broken out into groups of customers - each customer can have an admin. Each admin can set up the system for just their particular group of participants, products, services etc. In my example I skipped the customer part of it as I can add that back in easily enough and wanted to keep it simple.

    Each participant the admin enrolls needs to be assigned a particular tier, that tier dictates what services are applied to the participant - that participant then has data collected based on those services.

    Tier 1 gets service 1 and service 3

    Tier 2 gets service 1 and service 4

    Tier 3 gets service 2 and service 5

    Tier 4 gets service 4 only

    Lets assume service 1 and 2 are very similar, as are 4 and 5. If we assigned Tier 2 to a Tier 3 participant by accident, it could be missed due to the similarity to the data required. (it could be that service 2 has 1 additional question thats vital for that particular tier, but not for the other - if we miss that question, the data is useless) So to eliminate that, I wanted to create a rules table to ensure that only the appropriate tiers are available to the admin who is enrolling the participant, depending on the participants information.

    Now obviously, I wouldn't know what the business rules are for the customer's particular tiers - and I don't really want to know.. trying to maintain their business rules would be a nightmare.

    So I wanted a simple way for the user to be able to implement their own rules.. so when they create the tier during setup, they'll have a list of fields that they can use in the rules, and they can supply values for them.

    Field | Operand | Value

    ie.

    Age | > | 5

    From that, the hope would be that I can take the participants information, read the rules and match my data to the rules to come up with the tier that is best suited to the participant.

    What I had done previously is something like

    Tier | MinAge | MaxAge | Hair Color | Eye Color etc.

    But as I make more fields available for the rules, I end up with added overhead to altering the rule table (which had a ton of columns) and modifying the lookup SQL.

    So the hope was to make it a little more hands off, and thats where I came up with the rules table I have now

    field | operand | value

    With the hope that they can just select the field name, select the operand (=, <, > etc.) and the value they want to resolve to.

    But - nightmare!

    As I stated early on.. I *can* get the correct tier with my model - a 2 query process, the first gets the rules and generates the SQL, and then that SQL to get the tier.. but its just clunky and can't possibly be as efficient as I'd like it to be - not a problem now, but in 2-3 yrs, that could be an issue.

  • By the way you're describing this - I'd recommend NOT making this a pure SQL approach. I certainly see storing the data there, and having the consequences of tier assignment set up there, but I'd recommend NOT putting the dynamic rules (as you call them) into TSQL. TSQL isn't particularly well suited for what you plan on doing.

    Your best bet might be to look at an open-source rules engine, allowing you to use/design a spredsheet-style rule editor which automatically updates the rule assignment without needing ugly and complicated dynamic SQL stored procedures to do the same.

    There are lots of open source ones out there. We've landed on DROOLS for now, but we've had limited success with InRULES and NxBRE previously.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply