films database (yeah, yeah, hear me out!)

  • I've done a fair bit of Googling and asking dba friends, but I wanted to see what some of you guys thought about this.

    I have a database of films. The issue I want to address is one of managing the people associated with each film.

    However, I want to be able to have multiple alternate names (AKAs) for each person, and to keep track of which of those names were used for each film. I think IMDB does this, so you've probably seen it before. I also want to be able to store information about each person, such as birth date, country of origin, biography, etc.

    So, as so often seems to be the case, I quickly find myself bumping into the issue of how to store hierarchical data in SQL Server. I can't seem to come up with a solution that I feel good about.

    I'd be happy to share my 2-3 working ideas if you want. But you can probably guess what they are, more or less. I'm kind of just wondering what you guys would do design-wise in this situation since you always seem to surprise me.

    Assuming I have at least a FILMS table, a PEOPLE table, a ROLES table, and a FILMS_PEOPLE_ROLES (many to many) table, what would you do to handle the AKAs? Store them as records in the PEOPLE table? In their own AKAs table?

    Concern #1: If I store the AKAs in a separate table, as "children" with each record referencing its "parent" PEOPLE record, what's to stop someone entering the "parent" name as an AKA for itself? Conversely, what's to stop someone entering a new PEOPLE record with a name that already exists in the AKAs table? Perhaps the solution is a trigger to automatically add a new duplicate AKA record whenever a new PEOPLE record is created?

    Concern #2: If I store the AKAs in the PEOPLE table, there will be many columns that are unused/irrelevant to those records (such as birth date, bio, etc.).

    Anyway, just curious what your thoughts are. Maybe you have some great ideas that I haven't even considered yet. πŸ™‚


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • I think your concern #2 is something you shouldn't let happen.

    Concern #1 can perhaps be answered by saying don't put the name in the people table. Instead of people and aka tables, have a people table and a names table. The people table can have date of birth, date of death, and - if you really want it - a reference to a record in the names table (to indicate the "real" name). The names table has to have a name and a reference to the people table. Presumably you will use auto-generated surrogate keys in both these tables. Each table will probably have additional fields I haven't mentioned. Fo exarmple you will perhaps want to indicate whether a person was an actor, a director, a producer, a cameraman,....or some combination of these functions and perhaps hold that information in the people table (which introduces a problem: which names did he use in which functions) or distributed over the names table or not hol;d it anywhere (better design - you can pick the info up by looking at references from a function table to the names table if primary key of the function table combine film identity and function identity).

    But trying a few different schemas and trying to normalise them to EKNF (that's more normalised than 3NF but less normalised than BCNF) and also to elimnate nullable columns where reasonable will probably get you more progress than a boring grey-haired old fart like me telling you how to do it, so I suggest you play around with some ideas and see what comes out when you normalise (provided you can see what the normal forms mean in a context where some columns are nullable - if you don't you had better make sure you eliminate all nullable columns before deciding how to normalise, and prepare to write some pretty wide joins). If you are lucky what that will give you is actually a schema which represents all functiponal dependencies (because it's EKNF) but is also in BCNF (since in almost all schemas all the keys are elementary keys), which would mean that it was safe to go on to 4NF if there are any multifunctional dependencies you could eliminate. That, I think, is where you should want to be.

    Tom

  • Hmm, okay, thanks for the reply. I think I'm following you, and it sounds like you're basically suggesting that I proceed with my "separate AKA's table" idea, except that it will also contain the "primary" names in addition to the AKAs. And will be called NAMES instead of AKAs. Makes sense.

    And then the PEOPLE table doesn't need to actually store any names...which makes sense, although as a visual person, kind of freaks me out. πŸ™‚

    Thanks for the feedback! Anyone else? Curious if you have thoughts.


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • I think what L' Eomot InversΓ© is right on. You don't really need a hierarchical data structure since there really is only one level of data. A simple One to Many relationship.

    Tables: (Making assumption that you create surrogate primary keys on all tables)

    Films

    Roles

    Names

    People

    Films_Roles_Names

    I would say the People table should have the person's bio including their real name or else you'd have to create a way to indicate the person real name in the Names table.

    The Names table should just have the NameID, PeopleID, the AKA, and possibly some details behind the AKA name. You should also make a record for the parent name in case that name is used in a role for a film.

    In the Films_Roles_Names, you'd have FilmID, RoleID, and NameID. With this structure, you can easily see what AKA a person used for a specific role for each film.

  • Thanks for weighing in. My concern is just that if I create a "master" record in "people" and use their "real name" in that table, what's to prevent that name being duplicated in the "names" table? etc. The suggestion to eliminate names completely from the "people" table solves that issue, but yeah, that would make it more difficult to manage that table and see who's who on a day to day basis--at least for the likes of me! πŸ™‚


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (4/18/2013)


    Thanks for weighing in. My concern is just that if I create a "master" record in "people" and use their "real name" in that table, what's to prevent that name being duplicated in the "names" table? etc. The suggestion to eliminate names completely from the "people" table solves that issue, but yeah, that would make it more difficult to manage that table and see who's who on a day to day basis--at least for the likes of me! :)[/quote

    Well, one option is to have a reference from the people table to the name table to indicate the real name; then each entry in the people table has a reference to the corresponding real name (if it's an AKA entry) or a null reference (if it is the real name entry). Then the thing you look at on a day to day basis to manage real people is

    select people.*, names.name from people inner join names on names.nameID = people.realnameID

    Another option is to have the names table have a reference to the people table and a bit to indicate whether it's a real name, and then the thing you look at day to day is

    select people.*, names.name from people inner join names on names.nameIDidRealName=1 and people.personID = names.personID

    there are a few other options too. Of course whatever option you chose it would make sense to create a view, perhaps called PeopleWithRealName (for the two options above, the two select statements are the selects that define that view) and you might want either to give that view INSTEAD OF triggers for update and delete and insert or to ensure that changes to the view (and any changes to the two underlying tables that change the view) are encapsulated in stored procedures, so that either changes can't be achieved except through those stored procedures or changes invoke the triggers.

    Tom

  • Thanks, I love your replies. πŸ™‚ Few questions, if you have time...

    1) What do you mean by suggesting triggers on the view? I didn't know triggers could have anything to do with views...?

    2) If I end up with a bit field in the names table to indicate a "real" name, what is the best way to enforce a rule that each person has not less than nor greater than 1 "real" name flagged in the "names" table?

    3) Considering #2, how to change a "real" name from one alias to another if the database doesn't allow a transitional state where any given person has either 0 or 2 "real" simultaneous names?


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (4/18/2013)


    Thanks, I love your replies. πŸ™‚ Few questions, if you have time...

    1) What do you mean by suggesting triggers on the view? I didn't know triggers could have anything to do with views...?

    Yes, triggers on views are allowed. For many views (but not all) they are the only way of allowing an update statement for the view. Look up "instead of" triggers in Books Online.

    Personally I'm biased towards using Stored Procedures rather than views to do encapsulation, even though I know views work very well in the simpler cases.

    2) If I end up with a bit field in the names table to indicate a "real" name, what is the best way to enforce a rule that each person has not less than nor greater than 1 "real" name flagged in the "names" table?

    The only way I know of doing it is with triggers. And it's a bit of a nightmare. So I don't think that is a good option - having the people table referencing the real name is much tidier.

    3) Considering #2, how to change a "real" name from one alias to another if the database doesn't allow a transitional state where any given person has either 0 or 2 "real" simultaneous names?

    You are asking the right questions! That particular question is why doing it with triggers is a thorough nightmare. Perhaps there are people who can tell you how to do it cleanly with triggers, but I'm not one of them. What you have to do is devise a sequence of statements each of which attacks two rows so that when you take the flag off one of them the same statement puts it on another. This is of course rather easier now that we have the MERGE statement in T-SQL, but it's still certainly non-trivial.

    Of course there are other questions you need to worry about in this area. Does it ever happen that two different people have the same real name? What happens if two different people have used the same alternative name? What happens when someone's real name changes? With luck you may be able to say you will not cope with some of the more irritating possibilities, but you need to look at these thigs early on to determine whether you need to include them in the design as not doing so you risk ending up with a nice design that you suddenly have to tear up because something you never thought about has now happened.

    Tom

  • Awesome, as usual. Thanks. Definitely makes more sense to have the "real" name referenced from the PEOPLE table rather than from the NAMES table.

    Plenty more things to consider here, and I'll be stewing on them. Thanks!


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (4/18/2013)


    2) If I end up with a bit field in the names table to indicate a "real" name, what is the best way to enforce a rule that each person has not less than nor greater than 1 "real" name flagged in the "names" table?

    Just becuase I happened to ask (and then answer) the same question to myself, have you seen Filtered Indexes in SQL2008+?

    Given:

    CREATE TABLE people (

    person_id INTEGER IDENTITY(1, 1) PRIMARY KEY

    );

    CREATE TABLE names (

    name_id INTEGER IDENTITY(1, 1) PRIMARY KEY,

    person_id INTEGER FOREIGN KEY REFERENCES people(person_id),

    name NVARCHAR(MAX) NOT NULL,

    is_primary BIT NOT NULL

    );

    ...the index:

    CREATE UNIQUE NONCLUSTERED INDEX uqfi_names

    ON names (person_id)

    WHERE is_primary = 1

    ...should technically only allow one row for each person_id to be the 'primary'.

    J.

  • Thank you, that is nifty! I don't think I have heard of that before. πŸ™‚


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply