Relation table

  • Hi,
    I have a [JOB] table that may reference multiple objects (by integer ID), each stored in a different table.
    There are several ways to map this scenario:
    1. a relation table for each object table
       REL1: [job_id, obj_id1], 
       REL2: [job_id, obj_id2],
       ...

    2. just one relation table with multiple (sparse) obj ids
       RELS [job_id, obj_id1, obj_id2, ...]

    3. one relation table with obj type and id 
       RELS [job_id, obj_type, obj_id]

    Over the years I've used all of these solutions, trying to avoid no. 2 because of the wasted space, since sparse columns are a relatively recent addition.
    I also have to say that case 3 is a real pain to query, so I would avoid it.
    Now the question....
    Assuming the use of sparse columns, and assuming there are 4 or 5 different obj ids, which of cases 1 and 2 would you choose?
    (Note... I've also read that internally sparse columns are implemented with the XML engine, and I've had bad performance experiences with it...)
    TIA

  • This is difficult to envisage based on your limited sample. The 3rd option appears to be the normalised option, however. The 1st one will be awful to work with (you'll likely end up with dynamic SQL solutions just to get your data out), and the 2nd isn't normalised. I can't see why you think the normalised version is a pain to query; I'd suggest that the 2nd isn't great, and the 1st (well, you have my opinion). Why do you want to avoid normalised data?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom,
    thx for your reply.

    The third is a pain because in the join condition you always need two clauses, and if by mistake you forget the obj type you get unpredictable results.

    However, just to argue a little more: suppose I have two tables, files and file segments, and I want to tag them.
    A structure may be:
    tag[tag-id, name, color, ...]
    1

    file[ file-id]  files-tag[ file-id, tag-id]
    segment[ file-id, segment-id]  segment-tag[ segment-id, tag-id]

    2

    file[ file-id]  
    segment[ file-id, segment-id] 
    tagging[ tag-id, file-id, segment-id]

    3

    files[ file-id]  

    segment[ file-id, segment-id] 

    tagging[ tag-id,  table-id, foreign-key]

    table[ table-id, name]

    Querying in case 1 or 2 is practically the same: in the 1st case you change the join table, in the 2nd you change the columns.
    In case 3 you always have to join on the foreign key and then add an additional clause to narrow the results; for me this is very error prone, because if you forget the clause you get unpredictable results...
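    Just to show the difference, here is the tag lookup in case 1 versus case 3 (a sketch only: names adapted to valid SQL, and the table-id value for "file" is hypothetical):

    -- case 1: the junction table itself identifies the target entity
    select t.[name]
    from [file] f
    join [files_tag] ft on ft.[file_id] = f.[file_id]
    join [tag] t on t.[tag_id] = ft.[tag_id]

    -- case 3: one generic junction table; the extra table-id clause is mandatory,
    -- and forgetting it silently matches rows that belong to other entities
    select t.[name]
    from [file] f
    join [tagging] tg on tg.[foreign_key] = f.[file_id]
                     and tg.[table_id] = 1 -- hypothetical id of the "file" entry in [table]
    join [tag] t on t.[tag_id] = tg.[tag_id]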
    Hope this is clearer.

  • carlo.pagliei 63874 - Monday, November 26, 2018 2:20 AM

    Thom,
    thx for your reply.

    The third is a pain because in the join condition you always need two clauses, and if by mistake you forget the obj type you get unpredictable results.

    This doesn't seem like a pain; this seems more like user error. Normalised data is definitely the way to go. "I might forget to include both clauses" is a really bad excuse for bad data design.

    Without knowing the broader definitions of your data I can't show samples, but if you go with option 2 you are (very likely) going to be forever normalising your data, or using an expression like the following to get the value of the object:
    CHOOSE(@intVar, Obj_id1, Obj_id2, Obj_id3, ... Obj_id999)
    That's not very scalable, and as soon as you add object id 1000 you need to update ALL of your references.

    For the other (option 1), well, you can't use syntax like:
    FROM OtherTable OT
         JOIN OT.ObjID O ON OT.JobID = O.JobID

    Object names must be literals, so you'd have to go down the dynamic SQL route for EVERY query. Yuck.

    The easy option is to normalise your data, like I said. Then your clause is just:
    FROM OtherTable OT
         JOIN ObjectTable O ON OT.JobID = O.JobID AND {O.ObjID = 2 / O.ObjID = OT.ObjID}

    That's it. It scales, and it's not dynamic.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • >> I have a [JOB] table that may reference multiple objects (by integer ID) each stored in a different table <<

    A singular table name? This tells us you have only one job. In SQL, we don't reference objects; we reference other tables. And since identifiers are on a nominal scale by definition, we don't use integers for them. What's the square root of your credit card number? Integers are for ordinals, quantities and magnitudes in the relational model; because identifiers are on a nominal scale, they are not an option. Finally, where is the DDL for this table?

    From your narrative, it looks like you've got what's called an EAV design, and you need to get rid of it.

    >> there are several ways to map this scenario <<
    1. a relation table for each object table
    REL1: [job_id, obj_id1],
    REL2: [job_id, obj_id2],
    ...

    >> 2. just one relation table with multiple (sparse) obj ids
    RELS [job_id, obj_id1, obj_id2, …] <<

    So we've got a magic relation table that could hold automobiles, squids or Lady Gaga. This violates one of the fundamental principles of logic, the Law of Identity.

    >> 3. one relation table with obj type and id
    RELS [job_id, obj_type, obj_id] <<

    In SQL, a relationship is shown as a table with the name of the relationship and REFERENCES clauses to the keys of the entity tables involved. A <something>_type is what we call an "attribute property" in RDBMS; it is not generic. What you're doing is putting metadata in the base tables!
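    In DDL, such a relationship table might look like this (a sketch with illustrative names):

    -- one relationship, one table; the REFERENCES clauses carry the semantics
    CREATE TABLE Job_File_Assignments
    (job_id INTEGER NOT NULL REFERENCES Jobs (job_id),
     file_id INTEGER NOT NULL REFERENCES Files (file_id),
     PRIMARY KEY (job_id, file_id));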

    >> over the years I've used all solutions, ...<<

    Over the years I’ve been correcting these horrible design errors. What you’re asking is like “what’s the best kind of rocks to pound screws and defined furniture?” When the right answer is “don’t do that!” instead. Google it.

    https://www.red-gate.com/simple-talk/sql/t-sql.../avoiding-the-eav-of-destruction/
    https://www.postgresql.org/message-id/4BE45F66.5070402@magwerks.com
    karwin.blogspot.com/2009/05/eav-fail.html

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Thanks for your thoughts, Joe.
    Even if I have an EAV design in part of the DB (here the post: EAValue is evil?), in this case it is not an EAV design; it is just a job queue whose job definitions may refer to several targets. I don't need to run complex queries over it, since an automated process fetches the queue and executes the jobs.
    Anyway, here is the SQL with the table definitions.
    As you can see there are several entities (files, segments, projects) with a list of jobs to do on those entities.
    A job may or may not apply to several entities; for example, a job "Render To New File" may be applied to any of those objects...
    So I was asking myself which is the cleanest and least issue-prone design for that scenario... sparse columns could be a solution... just asking for your thoughts.

    Hope this is clearer.

    -- represents a classic file
    create table [File] (
        [file_id] int primary key identity(1, 1),
        [path] nvarchar(255),
        [size] bigint, -- the column name was missing in the original post; [size] is assumed
        [duration] int
    )

    -- represents a segment within a file
    create table [Segment] (
        [segment_id] int primary key identity(1, 1),
        [file_id] int,
        [mark_in] int,
        [duration] int
    )

    -- represents a collection of segments
    create table [Project] (
        [project_id] int primary key identity(1, 1),
        [segment_id] int,
        [ordinal] int
    )

    -- job definition: defines a series of automated or manual operations
    -- to be applied to a file, segment or project
    -- typical examples may be:
    -- copy file, move file, extract segment to new file, render project files,
    -- bundle project into a zip and ship by email, and many others...
    -- a single job def may apply to different kinds of targets
    create table [JobDefinition] (
        [jobDefinition_id] int,
        [name] nvarchar(255),
        [description] nvarchar(max),
        [config] xml
    )

    -- now... each operation must be applied to a different target

    -- version 1
    create table [ActiveJob_version1_onFile] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [file_id] int
    )
    create table [ActiveJob_version1_onSegment] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [segment_id] int
    )
    create table [ActiveJob_version1_onProject] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [project_id] int
    )

    -- version 2
    create table [ActiveJob_version2] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [file_id] int,
        [segment_id] int,
        [project_id] int
    )

    -- version 3
    create table [ActiveJob_version3] (
        [job_id] int primary key identity(1, 1),
        [jobDefinition_id] int,
        [runtimeStatus] xml,
        [target_id] int,
        [target_obj] int
    )
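
    If I went with version 2, I guess a CHECK constraint could at least guarantee that exactly one of the three target columns is ever set; a sketch against the table above:

    -- force exactly one non-null target per job in version 2
    alter table [ActiveJob_version2] add constraint [CK_ActiveJob_oneTarget]
    check (
        (case when [file_id] is not null then 1 else 0 end) +
        (case when [segment_id] is not null then 1 else 0 end) +
        (case when [project_id] is not null then 1 else 0 end) = 1
    )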

  • Thom A - Monday, November 26, 2018 2:38 AM

    You're right... it's probably the best of the three! thx

  • Seems to me that this is a classic case of normalization.
    You have a table containing Jobs and some information about the jobs.
    You have another table containing Objects and some characteristics of the objects.
    A Job can reference one (zero?) or more Objects.  So you'd set up a relationship table of "Job to Object" that contains the JobID and the associated ObjectIDs - and maybe some other information relating to the combo.
    I'm not sure where the Object Type comes in.  Do all the objects have some standardized list of types?  And can an Object have more than one Object Type?  If there's only one type per object, you'd have an ObjectType table (like a lookup table) and put the ID from that table in a column in the object table.  If the object can have more than one type, you'd have another relationship table, "Object to ObjectType", that would contain the IDs of the Object and the ObjectType.

    Querying is simply a matter of joining the tables.

    I've inherited tables that have multiple columns of data (like your option 2) and they are truly a pain to work with, particularly if you need to find if a row has a particular item in it.  You need a construct like
           WHERE obj1 = @x OR obj2 = @x OR ... objN = @x
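
    With the "Job to Object" relationship table, that search collapses to a single predicate; a sketch, assuming a hypothetical JobToObject table as described above:

    -- one predicate, no matter how many objects a job references
    SELECT jo.JobID
    FROM JobToObject jo
    WHERE jo.ObjectID = @x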

  • gvoshol 73146 - Wednesday, November 28, 2018 6:07 AM

    I'm not sure where the Object Type comes in.  

    The "object type" is the core of the problem: a job has a FK that may reference different entities (tables)... just to make an example: if the job says "compress to zip", then the target may be a file, a folder, a collection of unrelated documents and so on... so I need to store both the id and the entity referenced by.
    I was trying to understand which of the three proposed approaches is the best...probably using F-Key and F-Entity in the job table is the better one.

  • You need to switch your perspective on how you approach the data.  You seem to be looking at it from the "many" side of a relationship.  Conceptually, there are no collections in an RDBMS; the relationships for an individual record are defined instead.  You mention that a project is a collection of segments.  That is incorrect: a segment belongs to a project.  If I wanted to find all the segments for a project, I would search for segments belonging to the project, since the project definition should not include which segments belong to it.
    SELECT * FROM SEGMENT WHERE PROJECT_ID = 1

    Another issue you'll need to address is 3NF normalization.  You describe a relationship table as RELS [job_id, obj_type, obj_id].
    I assume this is to address an M:M relationship between job & object.  However, what is the purpose of obj_type in this table?  If it describes the object, shouldn't it be in the object table?  If it describes the relationship between job and obj, it should be named differently.

    When working with a many to many relationship, you'll need to split the relationship with an intermediate entity using 1:1 or 1:M relationships.
    Using the typical User & Group scenario as an example and keeping in mind that a relationship is always described from the perspective of a single record in the starting object:
    User M:M Group
    In this notation, the individual record is implied, but is casually stated when describing it.
    A user can be in many groups.
    A group can contain many users.
    To break this apart you create an intermediate relationship table, named UserGroup
    A user can be in many usergroups, but a usergroup can only reference one user.  User 1:M UserGroup
    A group can contain many usergroups, but a usergroup can only reference one group.  Group 1:M UserGroup
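
    As a sketch (illustrative names; [User] and [Group] are bracketed because they are reserved words):

    -- the intermediate table splits the M:M into two 1:M relationships
    CREATE TABLE UserGroup (
        User_ID int NOT NULL REFERENCES [User] (ID),
        Group_ID int NOT NULL REFERENCES [Group] (ID),
        PRIMARY KEY (User_ID, Group_ID)
    )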

    Now for your specific model:
    You want a job to be able to perform many actions on 0:M projects or 0:M segments or 0:M files.
    I don't know the details, but based on what you've provided, I would create the model as:
    File 1:M Segment;   Segment.File_ID = File.ID
    Segment 1:M Project; Project.Segment_ID = Segment.ID
    Project Type 1:M Project;  Project.ProjectType_ID = ProjectType.ID
    Job 1:M JobProject; JobProject.Job_ID = Job.ID
    Project 1:M JobProject; JobProject.Project_ID = Project.ID

    I hope this helps.

    Wes
    (A solid design is always preferable to a creative workaround)
