design question for "master" Item data structure

  • My team is working on a data structure to store information on our various IT assets, such as routers, firewalls, but also applications, and human resources.  We want to be able to create correlational queries which take advantage of higher-level commonalities of these potentially different types of objects; for example, getting a list of all objects with identified security vulnerabilities, which could mix hardware and software.

    Here's the design we've come up with so far:

    ItemTypes (contains self-referential parent-child relationship)

    ItemTypeID, ItemTypeName, ParentTypeID

     

    Items (contains the master identity key field to be assigned to the disparate tables)

    ItemID, ItemTypeID

     

    Applications

    ItemID (from Items), Application Number (from external source), Application Name, SoftwareVersion, etc

     

    NetworkHW

    ItemID (from Items), Device Type, Device IP, etc

     

    People

    ItemID (from Items), NetworkID, FirstName, etc

     

    ...and so on.  What this design gets us is the ability to have a common structure to relate them together (by ItemType) and also the ability to store object type-specific attributes as well.  The trick is, once you have this correlated structure, how do you write scalable queries that have access to the specific attributes?  What I'm finding is that I end up with queries like this:

    <snip>

    select

      case itemTypeName

    when 'Applications' then (select Name from Applications where ItemID = i.itemId)

    when 'Router' then (select Device_Name from NetworkHW where ItemId = i.itemId)

    when 'Firewall' then (select Device_Name from NetworkHW where ItemId = i.itemId)

    when 'People' then (select firstName + ' ' + lastName from People where ItemId = i.itemId)

    end as ObjectName

    ,  case itemTypeName

    when 'Applications' then (select AppID from Applications where ItemID = i.itemId)

    when 'Router' then (select id from NetworkHW where ItemId = i.itemId)

    when 'Firewall' then (select id from NetworkHW where ItemId = i.itemId)

    when 'People' then (select userId from People where ItemId = i.itemId)

    end as ObjectID

    from

    Items i

    inner

    join ItemTypes it on i.ItemTypeID = it.ItemTypeID

    </snip>

     

    ...not very scalable.  The only other alternative which I can think of is creating some sort of master table with all items contained inside, and 50+ user-defined fields which would serve different purposes for each row type.  (I find this idea horrifying.)

    Does anybody have any ideas? 

    Thanks...

    Jeff

  • Bad design. From the beginning.

    "Do not multiply entities needlessly." Heard about this?

    Now you need either rethink your ideas aboun design or fix the error in UNION query.

    select Name as ObjectName, AppID as ObjectID

    FROM from Items i

    inner join ItemTypes it on i.ItemTypeID = it.ItemTypeID

    inner join Applications A on A.ItemID = i.itemId

    WHERE itemTypeName = 'Applications'

    UNION

    -- select from next table, and so on.

    Anyway your scalability is killed by the list of tables for different types. Adding new type means adding a new table, that means changing queries, etc.

    Search in Google for handling profiles in database and redesign your database while it's not too late.

    _____________
    Code for TallyGenerator

  • Thanks for the reply - perhaps I can clarify something - the list of entity tables (Apps, People, HW) already exist as part of a data warehouse application.  Are you thinking that every item that we track - and keep in mind the scope of items, and the differences in attributes - should really be in one table?

    The UNION query isn't a bad idea - I'll play with that.

    Jeff

  • It is in one table in my application.

    And it works fine and it's very fast.

    _____________
    Code for TallyGenerator

  • select a.Name ObjectName, a.AppID ObjectID, a.IsSecurityRisk

    from

    Applications a

    union

    all

    select

    nhw.Device_Name, nhw.id, nhw.IsSecurityRisk

    from

    NetworkHW nhw

    union all

    select

    p.firstName + ' ' + p.lastName, p.userId, p.IsSecurityRisk

    from

    People p

    You are implemeting a very general supertype, so general that in the example you give, it serves only to ensure uniqueness across the different Pk ranges, and to 'forward' a foreign key to itemtype. To make good use of the Items supertype table, you should keep common attributes in that table. Then you dont need to add every subtype table into a query that is only concerned with common (item-level) attributes. If there are further subclasses or superclasses whose members share attributes, then add other tables into the chain. For example, router and firewall might be subtypes of networkHW.

    I wouldn't use a hierarchy table - the hierarchy should be apparent from the table structure. The itemtype table should at most be a lookup and probably not be used at all. The type of entity is identified in the schema by the table name or other metadata. It doesn't need to be added as a data value. And the structure of the relationships between entities are given by the structure of the referential integrity constraints. There is no need to record your own denormalised metadata on the structure. To define the relationships between objects, use DDL. To get info on those relationships, use system metadata objects.

     
    In the example you give, a foreign key is defined from each of the subtype tables to Item, the foreign key also being the primary key of the subtype table. Item could hold an issecurityrisk column, or a securitystatusid or ...

    select coalesce(a.Name,nhw.Device_Name,p.firstName+' '+p.lastName) ObjectName

    , coalesce(a.AppID,nhw.id,p.userId) ObjectID
    , i.isSecurityrisk

    from

    Item i

    left

    join Applications a on a.ItemID = i.ItemID

    left

    join NetworkHW nhw on nhw.ItemId = i.itemId

    left

    join People p on p.ItemId = i.itemId

    This doesn't guarantee that an item might not instance more than one subtype, though - which it shouldn't, since they are clearly distinct entities. Nothing is an Application and NetworkHW. To reflect this you could have a number of alternative foreign keys in the Item table, with a check constraint to ensure only one is populated. Each of the foreign keys would point to a subtype table. This would mean you could tell the subtype of an item without joining to all the possible subtype tables. Notice that you already have candidate keys in the subtype tables even without ItemID.

    select

    coalesce(a.Name,nhw.Device_Name,p.firstName+' '+p.lastName) ObjectName

    ,

    coalesce(a.AppID,nhw.id,p.userId) ObjectID

    , i.isSecurityrisk

    from

    Item i

    left

    join Applications a on a.AppID= i.ApplicationID

    left

    join NetworkHW nhw on nhw.Id = i.NetworkHWId

    left

    join People p on p.userId = i.PeopleId

    This would also allow you to add a securityrisk table with a set of alternative foreign keys only to the appropriate tables - so if we imagine that the securityrisk question does not arise for persons, there is no FK to persons in the securityrisk table:

    select

    coalesce(a.Name,nhw.Device_Name,p.firstName+' '+p.lastName) ObjectName

    ,

    coalesce(a.AppID,nhw.id,p.userId) ObjectID

    , ss.SecurityRiskStatus

    from

    SecurityStatus ss

    left

    join Applications a on a.ApplicationID = ss.ApplicationID

    left

    join NetworkHW nhw on nhw.NetworkHWId = ss.NetworkHWId

    Notice also that the output you are after is achievable in one example by a UNION. Values that can go in the same column in your query results can perhaps go in the same column in your tables? For example, the columns combined and aliased as ObjectName could (apart from the People entry) be placed in a column at Item level. And if there are enough examples like this, you might find that there is less subtype-specific info than you fear...

     

    [edit: Sergiy, I see you got there first...are you saying that you would store 'person' and 'application' in the same (dimension) table? With lots of NULLs? Surely not different uses for a single column?]

    [The question is partly - are they different attributes, or do they just have a slightly different  significance between subtypes. E.g., all three might have a monthly cost attribute for cost analysis purposes, and in this context, it doesn't matter that the costs are incurred by each resource in very different ways (e.g. short licence fees, capital depreciation, payroll). In an accounting database, storing persons and capital assets in one table would make no sense. So it's partly a matter of context: which similarities and differences you are interested in.]

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Tim, ready to bet on it?

    What do you mean: "I see you got there first..."?

    I tell you, it's not a matter of guessing if I can do it. I have it done. And it works. And works really well.

    All profiles for persons, their companies, invoices and products are held in 4 tables with up to 4 not nullable columns in each.

    You won't believe how short is my code and how fast it's being executed.

    _____________
    Code for TallyGenerator

  • Bet on what? I see you got there first: I didn't see your entry until I posted mine.

    But 4 tables, not 1?

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Who says 1?

    4 tables including types and subtypes.

    _____________
    Code for TallyGenerator

  • I appreciate the input, guys...

    ...Sergiy, I think we're talking about two very different applications.  I've done one before similar to yours and had great results with that kind of design.  I'm just not sure it satisfies the requirements I have.  For instance, the NetworkHW table has 20+ columns of relevant data, while Applications has 4.  This is data that we're integrating from existing sources, and we don't want to lose any data in the interest of a slim design.   And we're also talking about 20+ completely different data sources to integrate, which is why we're trying to implement some kind of framework between them.  I've given the example of 3 different entity tables for prototyping, but in reality there are many, many more.  In that sense, reducing the number of entitities makes a lot of sense, but I'm not sure it's worth losing the subtype-specific attributes, or generalizing them down to user-defined fields.

    But I also could be just not quite understanding your concept, either...

  • All those LEFT JOINs will kill your performance.

    But, would you really ever need to process the Item master table?  Wouldn't you be more likely to be processing the Applications, etc., tables?  That's a single INNER join.

    I don't see a huge problem w/ the design.  I personally would never try to put everything into one big "object" table.

     

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • The correlated queries would be more exception than the rule - you're correct in that most situations it would be on the entity tables, or between them.  The idea with the master tables is to correlate the entity tables with business logic.  For instance, all items of type "Firewall" are considered to be security controls to minimize risk.  All items of type "Compliance Gap" are considered to be risks needing to be minimized.  The concept is to create executive views on the state of our operation.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply