Relation table

  • Hi,
    I have a [JOB] table that may reference multiple objects (by integer ID), each stored in a different table.
    There are several ways to map this scenario:
    1. a relation table for each object table
       REL1: [job_id, obj_id1], 
       REL2: [job_id, obj_id2],
       ...

    2. just one relation table with multiple (sparse) obj ids
       RELS [job_id, obj_id1, obj_id2, ...]

    3. one relation table with obj type and id 
       RELS [job_id, obj_type, obj_id]

    Over the years I've used all of these solutions, trying to avoid no. 2 because of the wasted space, since sparse columns are a relatively recent addition.
    I also have to say that case 3 is a real pain to query, so I would avoid it.
    Now the question....
    Assuming the use of sparse columns, and assuming there are 4 or 5 different obj ids, which of cases 1 and 2 would you choose?
    (Note... I've also read that internally sparse columns are implemented with the XML engine, and I've had bad performance experiences with it...)
    TIA

  • This is difficult to envisage based on your limited sample. The 3rd option appears to be the normalised option, however. The 1st one will be awful to work with (you'll likely end up with dynamic SQL solutions just to get your data out), and the 2nd isn't normalised. I can't see why you think the normalised version is a pain to query; I'd suggest that the 2nd isn't great, and the 1st (well, you have my opinion). Why do you want to avoid normalised data?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom,
    thx for your reply.

    The third is a pain because in the join condition you always need two clauses, and if by mistake you forget the obj type you get unpredictable results.

    However, just to argue a little more: suppose I have two tables, files and file segments, and I want to tag them.
    A structure may be:
    tag[tag-id, name, color, ...]
    1

    file[ file-id]  files-tag[ file-id, tag-id]
    segment[ file-id, segment-id]  segment-tag[ segment-id, tag-id]

    2

    file[ file-id]  
    segment[ file-id, segment-id] 
    tagging[ tag-id, file-id, segment-id]

    3

    files[ file-id]  

    segment[ file-id, segment-id] 

    tagging[ tag-id,  table-id, foreign-key]

    table[ table-id, name]

    Querying in case 1 or 2 is practically the same: in the 1st case you change the join table, in the 2nd you change the columns.
    In case 3 you always have to join on the foreign key and then add an additional clause to narrow the results; for me this is very error prone, because if you forget the clause you get unpredictable results...
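    Just to show the difference, here is the tag lookup in case 1 versus case 3 (a sketch only: names adapted to valid SQL, and the table-id value for "file" is hypothetical):

    -- case 1: the junction table itself identifies the target entity
    select t.[name]
    from [file] f
    join [files_tag] ft on ft.[file_id] = f.[file_id]
    join [tag] t on t.[tag_id] = ft.[tag_id]

    -- case 3: one generic junction table; the extra table-id clause is mandatory,
    -- and forgetting it silently matches rows that belong to other entities
    select t.[name]
    from [file] f
    join [tagging] tg on tg.[foreign_key] = f.[file_id]
                     and tg.[table_id] = 1 -- hypothetical id of the "file" entry in [table]
    join [tag] t on t.[tag_id] = tg.[tag_id]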
    Hope this is clearer.

  • carlo.pagliei 63874 - Monday, November 26, 2018 2:20 AM

    Thom,
    thx for your reply.

    The third is a pain because in the join condition you always need two clauses, and if by mistake you forget the obj type you get unpredictable results.

    This doesn't seem like a pain; this seems more like user error. Normalised data is definitely the way to go. "I might forget to include both clauses" is a really bad excuse for bad data design.

    Without knowing the broader definitions of your data I can't show samples, but if you go with option 2 you are (very likely) going to be forever normalising your data, or using an expression like the following to get the value of the object:
    CHOOSE(@intVar, Obj_id1, Obj_id2, Obj_id3, ... Obj_id999)
    That's not very scalable, and as soon as you add object id 1000 you need to update ALL of your references.

    For the other (option 1), well, you can't use syntax like:
    FROM OtherTable OT
         JOIN OT.ObjID O ON OT.JobID = O.JobID

    Object names must be literals, so you'd have to go down the dynamic SQL route for EVERY query. Yuck.

    The easy option is to normalise your data, like I said. Then your clause is just:
    FROM OtherTable OT
         JOIN ObjectTable O ON OT.JobID = O.JobID AND {O.ObjID = 2 / O.ObjID = OT.ObjID}

    That's it. It scales, and it's not dynamic.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • >> I have a [JOB] table that may reference multiple objects (by integer ID) each stored in a different table <<

    A singular table name? This tells us you have only one job. In SQL, we don't reference objects; we reference other tables. And since identifiers are on a nominal scale by definition, we don't use integers for them. What's the square root of your credit card number? Integers are for ordinals, quantities and magnitudes in the relational model; because identifiers are on a nominal scale, they are not an option. Finally, where is the DDL for this table?

    From your narrative, it looks like you've got what's called an EAV design, and you need to get rid of it.

    >> there are several ways to map this scenario <<
    1. a relation table for each object table
    REL1: [job_id, obj_id1],
    REL2: [job_id, obj_id2],
    ...

    >> 2. just one relation table with multiple (sparse) obj ids
    RELS [job_id, obj_id1, obj_id2, …] <<

    So we've got a magic relation table that could hold automobiles, squids or Lady Gaga. This violates one of the fundamental principles of logic, the Law of Identity.

    >> 3. one relation table with obj type and id
    RELS [job_id, obj_type, obj_id] <<

    In SQL, a relationship is shown as a table with the name of the relationship and REFERENCES clauses to the keys of the entity tables involved. A <something>_type is what we call an "attribute property" in RDBMS; it is not generic. What you're doing is putting metadata in the base tables!
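    In DDL, such a relationship table might look like this (a sketch with illustrative names):

    -- one relationship, one table; the REFERENCES clauses carry the semantics
    CREATE TABLE Job_File_Assignments
    (job_id INTEGER NOT NULL REFERENCES Jobs (job_id),
     file_id INTEGER NOT NULL REFERENCES Files (file_id),
     PRIMARY KEY (job_id, file_id));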

    >> over the years I've used all solutions, ...<<

    Over the years I’ve been correcting these horrible design errors. What you’re asking is like “what’s the best kind of rocks to pound screws and defined furniture?” When the right answer is “don’t do that!” instead. Google it.

    https://www.red-gate.com/simple-talk/sql/t-sql.../avoiding-the-eav-of-destruction/
    https://www.postgresql.org/message-id/4BE45F66.5070402@magwerks.com
    karwin.blogspot.com/2009/05/eav-fail.html

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Thanks for your thoughts, Joe.
    Even if I have an EAV design in part of the DB (here the post: EAValue is evil?), in this case it is not an EAV design; it is just a job queue whose job definitions may refer to several targets. I don't need to run complex queries over it, since an automated process fetches the queue and executes the jobs.
    Anyway, here is the SQL with the table definitions.
    As you can see there are several entities (files, segments, projects) with a list of jobs to do on those entities.
    A job may or may not apply to several entities; for example, a job "Render To New File" may be applied to any of those objects...
    So I was asking myself which is the cleanest and least issue-prone design for that scenario... sparse columns could be a solution... just asking for your thoughts.

    Hope this is clearer.

    -- represents a classic file
    create table [File] (
        [file_id] int primary key identity(1, 1),
        [path] nvarchar(255),
        [size] bigint, -- the column name was missing in the original post; [size] is assumed
        [duration] int
    )

    -- represents a segment within a file
    create table [Segment] (
        [segment_id] int primary key identity(1, 1),
        [file_id] int,
        [mark_in] int,
        [duration] int
    )

    -- represents a collection of segments
    create table [Project] (
        [project_id] int primary key identity(1, 1),
        [segment_id] int,
        [ordinal] int
    )

    -- job definition: defines a series of automated or manual operations
    -- to be applied to a file, segment or project
    -- typical examples may be:
    -- copy file, move file, extract segment to new file, render project files,
    -- bundle project into a zip and ship by email, and many others...
    -- a single job def may apply to different kinds of targets
    create table [JobDefinition] (
        [jobDefinition_id] int,
        [name] nvarchar(255),
        [description] nvarchar(max),
        [config] xml
    )

    -- now... each operation must be applied to a different target

    -- version 1
    create table [ActiveJob_version1_onFile] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [file_id] int
    )
    create table [ActiveJob_version1_onSegment] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [segment_id] int
    )
    create table [ActiveJob_version1_onProject] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [project_id] int
    )

    -- version 2
    create table [ActiveJob_version2] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [file_id] int,
        [segment_id] int,
        [project_id] int
    )

    -- version 3
    create table [ActiveJob_version3] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [target_id] int,
        [target_obj] int
    )
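
    If I went with version 2, I guess a CHECK constraint could at least guarantee that exactly one of the three target columns is ever set; a sketch against the table above:

    -- force exactly one non-null target per job in version 2
    alter table [ActiveJob_version2] add constraint [CK_ActiveJob_oneTarget]
    check (
        (case when [file_id] is not null then 1 else 0 end) +
        (case when [segment_id] is not null then 1 else 0 end) +
        (case when [project_id] is not null then 1 else 0 end) = 1
    )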

  • Thom A - Monday, November 26, 2018 2:38 AM

    You're right... it's probably the best of the three! thx

  • Seems to me that this is a classic case of normalization.
    You have a table containing Jobs and some information about the jobs.
    You have another table containing Objects and some characteristics of the objects.
    A Job can reference one (zero?) or more Objects.  So you'd set up a relationship table of "Job to Object" that contains the JobID and the associated ObjectIDs - and maybe some other information relating to the combo.
    I'm not sure where the Object Type comes in.  Do all the objects have some standardized list of types?  And can an Object have more than one Object Type?  If there's only one type per object, you'd have an ObjectType table (like a lookup table) and put the ID from that table in a column in the object table.  If the object can have more than one type, you'd have another relationship table, "Object to ObjectType", that would contain the IDs of the Object and the ObjectType.

    Querying is simply a matter of joining the tables.

    I've inherited tables that have multiple columns of data (like your option 2) and they are truly a pain to work with, particularly if you need to find if a row has a particular item in it.  You need a construct like
           WHERE obj1 = @x OR obj2 = @x OR ... objN = @x
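
    With the "Job to Object" relationship table, that search collapses to a single predicate; a sketch, assuming a hypothetical JobToObject table as described above:

    -- one predicate, no matter how many objects a job references
    SELECT jo.JobID
    FROM JobToObject jo
    WHERE jo.ObjectID = @x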

  • gvoshol 73146 - Wednesday, November 28, 2018 6:07 AM

    I'm not sure where the Object Type comes in.  

    The "object type" is the core of the problem: a job has a FK that may reference different entities (tables)... just to make an example: if the job says "compress to zip", then the target may be a file, a folder, a collection of unrelated documents and so on... so I need to store both the id and the entity referenced by.
    I was trying to understand which of the three proposed approaches is the best...probably using F-Key and F-Entity in the job table is the better one.

  • You need to switch your perspective on how you approach the data.  You seem to be looking at it from the "many" side of a relationship.  Conceptually, there are no collections in an RDBMS; the relationships for an individual record are defined instead.  You mention that a project is a collection of segments.  That is incorrect: a segment belongs to a project.  If I wanted to find all the segments for a project, I would search for segments belonging to the project, since the project definition should not include which segments belong to it.
    SELECT * FROM SEGMENT WHERE PROJECT_ID = 1

    Another issue you'll need to address is 3NF normalization.  You describe a relationship table as RELS [job_id, obj_type, obj_id].
    I assume this is to address an M:M relationship between job & object.  However, what is the purpose of obj_type in this table?  If it describes the object, shouldn't it be in the object table?  If it describes the relationship between job and obj, it should be named differently.

    When working with a many to many relationship, you'll need to split the relationship with an intermediate entity using 1:1 or 1:M relationships.
    Using the typical User & Group scenario as an example and keeping in mind that a relationship is always described from the perspective of a single record in the starting object:
    User M:M Group
    In this notation, the individual record is implied, but is casually stated when describing it.
    A user can be in many groups.
    A group can contain many users.
    To break this apart you create an intermediate relationship table, named UserGroup
    A user can be in many usergroups, but a usergroup can only reference one user.  User 1:M UserGroup
    A group can contain many usergroups, but a usergroup can only reference one group.  Group 1:M UserGroup
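
    As a sketch (illustrative names; [User] and [Group] are bracketed because they are reserved words):

    -- the intermediate table splits the M:M into two 1:M relationships
    CREATE TABLE UserGroup (
        User_ID int NOT NULL REFERENCES [User] (ID),
        Group_ID int NOT NULL REFERENCES [Group] (ID),
        PRIMARY KEY (User_ID, Group_ID)
    )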

    Now for your specific model:
    You want a job to be able to perform many actions on 0:M projects or 0:M segments or 0:M files.
    I don't know the details, but based on what you've provided, I would create the model as:
    File 1:M Segment;   Segment.File_ID = File.ID
    Segment 1:M Project; Project.Segment_ID = Segment.ID
    Project Type 1:M Project;  Project.ProjectType_ID = ProjectType.ID
    Job 1:M JobProject; JobProject.Job_ID = Job.ID
    Project 1:M JobProject; JobProject.Project_ID = Project.ID

    I hope this helps.

    Wes
    (A solid design is always preferable to a creative workaround)
