Identity field for batch migrating data

  • We have a new .net application that using entity framework.

    In the database, there are going to be many tables like hearings, conference, workshop etc that needs a column of note.

    So we plan to create below tables :

    Entity : EntityID -- only column and it is identity column -- like a global identifier

    Note: NoteID, Notetype,.. --- noteId is also identity column

    EntityNote: NoteID, EntityID -- to associate Note ID with each record in talbes.

    Hearing: hearingID, ...EntityID

    Conference: conferenceid, ... entityID.

    workshop: workshopid,... EntityID

    So basically generate an entityID first in a table that only holds the ID, when insert record into other tables, also insert this entityID.

    Then when creating note, it insert into note table first , get noteID (identity column) then insert Into table EntityNote.

    It seems when these records created, since it is from UI, it each time to create one record, entity framework seems handling well to insert default value of identity and get identity, and to create another record.

    But my questions is for migrating existing data, it will not get one record by one, but in batch, how can we each time to generate an identity, then use it in other table by batch.

    I tried to use output clause with merge that can use fields from multiple tables, but it has limitations , not allowing table that has PK or FK.

    I also think we may have to use cursor for each table, again it will have to create one cursor for each of the tables that needs the entityID, and create all columns a variables, many codes needs to be written,

    Any other good options?

  • Hi

    use MERGE with an OUTPUT clause into a temp table. MERGE will give you access to both the INSERTED and ORIGINAL ID's. You can force the condition to false so it will always perform an INSERT.

  • sqlfriends (2/8/2016)


    We have a new .net application that using entity framework.

    In the database, there are going to be many tables like hearings, conference, workshop etc that needs a column of note.

    So we plan to create below tables :

    Why?

    Unless I miss some of your functional requirements, the best solution is to simply add a Note column to each table. If you are worried about confusing column names, prefix them with the table name: HearingNote, ConferenceNote, etc.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Because a record in the table may have one or more notes, that added in different dates. It may also have different noteType for the notes.

    Thanks

  • Okay. In your first post you write that the tables need "a column of note". Apparently that was incorrect, each of them needs the ability to add mutliple notes.

    In that case, the obvious design would be to have a separate table for every table that needs notes - so you get a HearingNotes table, a ConferenceNotes table, etc.

    You also mention different note types, so those XxxNotes tables would each get a NoteType column.

    The primary key of those tables would be composite, consisting of the foreign key to the hearing, conference, etc, plus a second column to identify a single note within the set. If there can be only one note of each type, the NoteType column would be the second column. Otherwise it could be a timestamp (if you record one), or just a simple NoteNumber.

    Please do not think that having multiple tables with a very simiiar layout is a design flaw. As long as different things are represented in the tables, they should be different tables. However, it does all depend on how the tables will be used - in other words, on how different the things in the tables are. If you expect to have a lot of queries over all types of notes, then splitting them over multiple tables may not be optimal. And if you can have a single note that should link to multiple entities (e.g. a note that relates to both a hearing and a conference), then you should definitely use a single table.

    The single-table design is slightly more complicated. Here the notes should have a primary key of their own, because you are not guaranteed to always have a reference to a hearing, a conference, etc. The table should then have multiple independent foreign keys to the various possible "parent" tables. So you have a column for a reference to a hearing, another column for a reference to a conference, yet another column referencing a workshop, etc. Each of those columns is nullable. Add a CHECK constraint to ensure that at least one must always be filled with a non-null value (otherwise you'd mhave an orphaned note). Depending on your business rules, add other check constraints to prevent illegal combinations - e.g. if a single note cannot be attached to both a conference and a workshop, then add a CHECK constraint that prevents both these columns being non-null.

    (And if a single note can even be attached to multiple instances of the same entity type, e.g. a single note for two different workshops, the design becomes even more complex. But let's not get there, because I think you don't need to make this so complex).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • sqlfriends (2/8/2016)


    So we plan to create below tables :

    Entity : EntityID -- only column and it is identity column -- like a global identifier

    That would be a great source of contention, locking and overall performance problems.

    Every insert into every table will be fighting for exclusive possession of that single data page at the inserting end of Entity table.

    _____________
    Code for TallyGenerator

  • Thanks all for the good tips.

    Let me take a close look and may get back to you.

    We originally did go the route that creates a note table, with NoteID and notetype, and add several FK, like hearingID, conferenceID, workshopID, etc. to the table and can be nullable.

    But feel like this kind of design is not normalized.

    So we goes to use the entityID table route.

    It looks now the best option could be using to create a noteTable for each business table that needs note.

    Let me think about more, and may come back with more questions.

    Thanks much!

  • A separate notes table for each table that can have zero to many notes for each row in the parent table is the correct design. It's pretty-much exactly what Hugo said. It's a good and proven design. The only column that'll be duplicated in the notes table is the foreign key that points to the parent table's primary key. You'll only have note rows where you have notes present. Do yourself a favor and create a nonclustered index on that foreign key if you ever plan to do any deletes from the parent table to prevent a table scan of the parent.

    If it were me, I wouldn't use a GUID for a clustered index (EntityID). Like Sergiy pointed out, you're going to run into problems. You'll also suffer from page splits as rows get added with non-sequential values. If you're going to be using an ORM, you'll probably have enough problems already without adding more.

  • Thank you Hugo, Ed Wagner and Sergiy for your posts about using the entityid.

    I kind of suspect at the very beginning the benefits of using entityID , esp, table lock and concurrency. Now I am more clear about it.

    Also at beginning, I thought our .net developers who comes out this idea may think that is easier from entity framework point of view, but I just feel not right, but cannot say out the reason clearly.

  • sqlfriends (2/9/2016)


    It looks now the best option could be using to create a noteTable for each business table that needs note.

    Let me think about more, and may come back with more questions.

    You can still have a single Note table, but a separate "brigde" table for every entity with a note:

    HearingID

    NoteTypeID

    NoteID

    This way you may have many notes of different types allocated to the same entity, and you may associate the same note with many different entities without a necessity to duplicate the text of the note itself.

    _____________
    Code for TallyGenerator

  • so is it going to be something like this?

    Note table: Noteid, NoteDetail, NoteDate, CreateDate, createdby

    NoteBridgeTable:

    columns: NoteID, NoteTypeID, Objectid (cannot find a good name for it)

    Data:

    Noteid, notetypid, hearingid

    Noteid, notetypid, conferenceid,

    Noteid, notetypeid, workshopid

    Noteid, notetypeid, Interventionid

    Should NoteType in Note table or in NoteBridge table?

    Thanks,

  • Also if I create a Notetable for each Object, what name should be called?

    For example:

    I have conference table,

    Then I created a ConferenceNote table. It will be NoteID, notetype, ConferenceID,

    In the Workshop table, they will be Noteid, notetype, workshopid,

    Or should I give a generic common name for both workshopid, and confenceid column, something like objectID, or entityID?

    Thanks

  • What you call your tables is ultimately up to you. You're the one who's going to be using and supporting it.

    There are several approaches to design something like this. If you go with one notes table for each parent, I would create it something like this, knowing what I know about your requirements.

    CREATE TABLE dbo.ConferenceNotes (

    ID Integer not null identity(1, 1),

    constraint ConferenceNotes_PK primary key (ID),

    ConferenceID integer not null,

    constraint ConferenceNotes_Conference_FK

    foreign key (ConferenceID)

    references dbo.Conference(ID),

    NoteTypeID integer not null,

    constraint ConferenceNotes_NoteTypes_FK

    foreign key (NoteTypeID)

    references dbo.NoteTypes(ID),

    Notes varchar(8000) not null, --or whatever data type is appropriate

    EntryDate datetime not null default getdate(),

    UpdateDate datetime);

    --if you can delete from the conference table, the foreign key will cause a table scan, so create a nonclustered index to cover it

    CREATE NONCLUSTERED INDEX ConferenceNotes_Conference_FK on dbo.ConferenceNotes(ConferenceID);

    If, on the other hand, you want to go with a single notes table, you would create a notes table and then a table with multiple NULLable foreign keys that references the parent table (1 for each table) and then a single column to refer to the notes table.

    Based on what you've told us, I would probably go with a notes table for each parent table. I know at least one ERP that uses a single table for all text and a single foreign key that can point to one of many different tables and the table referred to in a separate column. I would not recommend that approach to anyone. I hated working with the ERP and, while this was far from the only reason, it's still bad. I like declarative referential integrity and I don't think the application should enforce them.

  • sqlfriends (2/9/2016)


    I have conference table,

    Then I created a ConferenceNote table. It will be NoteID, notetype, ConferenceID,

    In the Workshop table, they will be Noteid, notetype, workshopid,

    Yep. That's it.

    Simple and flexible solution.

    No NULLable columns, no empty spaces, FK's enforce strong data integrity.

    And you include notes into queries only when you need to display them.

    Just make sure you choose correct clustered index for each of those tables.

    It depends of the way you'll be querying the notes.

    Or should I give a generic common name for both workshopid, and confenceid column, something like objectID, or entityID?

    You can do that.

    But remember - pretty soon other developers will catch you in a dark corner with a strong intention to beat the c..p out of you. :w00t:

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply