Prevent duplicates in a table

  • I am entering 'salaries' information into a sql server table called 'salaries'.
    Fields are :
    id (pk,identity),c_id (int),month,year,salary.

    I use a stored procedure to insert this like :

    insert into salaries (c_id,month,year,salary)
    select c_id,month,year,salary from TEMP

    Now I am trying to avoid duplicates from being inserted.
    There can be no duplicates for c_id,month,year.
    All 3 conditions at the same time.

    c_id can be inserted as many times as you like but not with the same month and year data.
    Month and year data are also not unique as I am inserting data for all company for that month.

    How can I do this ? I used to know it but I forgot. I dont remember if I used IF EXISTS or if I
    used some index. please help.

  • Just add a unique constraint on c_id, month and year.  If you've got identity columns throughout your database then you'll want to look at putting a unique constraint on the natural key in all tables.

    John

  • If those are the only four columns, why aren't the three the primary key?  On the face, that would seem to meet all the normalization rules.

  • Senchi - Monday, August 20, 2018 8:53 AM

    I am entering 'salaries' information into a sql server table called 'salaries'.
    Fields are :
    id (pk,identity),c_id (int),month,year,salary.

    I use a stored procedure to insert this like :

    insert into salaries (c_id,month,year,salary)
    select c_id,month,year,salary from TEMP

    Now I am trying to avoid duplicates from being inserted.
    There can be no duplicates for c_id,month,year.
    All 3 conditions at the same time.

    c_id can be inserted as many times as you like but not with the same month and year data.
    Month and year data are also not unique as I am inserting data for all company for that month.

    How can I do this ? I used to know it but I forgot. I dont remember if I used IF EXISTS or if I
    used some index. please help.

    You'll actually learn to hate the fact that you have separate month and year columns someday in the not-so-distant future for a whole lot of reasons.

    As for a PK, the month and year thing will make that difficult as well because there is no guarantee that people won't be paid more than once per month.  And what the heck does "c_ID" mean if you take it out of context?  If you must have an ID column, use the form of TableName_ID or TableNameID.

    I won't get into what I think of pluralized tables names. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, August 20, 2018 9:30 AM

    Senchi - Monday, August 20, 2018 8:53 AM

    I am entering 'salaries' information into a sql server table called 'salaries'.
    Fields are :
    id (pk,identity),c_id (int),month,year,salary.

    I use a stored procedure to insert this like :

    insert into salaries (c_id,month,year,salary)
    select c_id,month,year,salary from TEMP

    Now I am trying to avoid duplicates from being inserted.
    There can be no duplicates for c_id,month,year.
    All 3 conditions at the same time.

    c_id can be inserted as many times as you like but not with the same month and year data.
    Month and year data are also not unique as I am inserting data for all company for that month.

    How can I do this ? I used to know it but I forgot. I dont remember if I used IF EXISTS or if I
    used some index. please help.

    You'll actually learn to hate the fact that you have separate month and year columns someday in the not-so-distant future for a whole lot of reasons.

    As for a PK, the month and year thing will make that difficult as well because there is no guarantee that people won't be paid more than once per month.  And what the heck does "c_ID" mean if you take it out of context?  If you must have an ID column, use the form of TableName_ID or TableNameID.

    I won't get into what I think of pluralized tables names. 😉

    He said there can be no duplicates for id, year, month.  This may not have anything to do with actual pay but what they should be paid in a month.  While I'm not keen on the separation of dates, I wouldn't as a knee jerk reaction see it as wrong.  Otherwise multiple dates could be entered for a particular month, which again is apparently undesirable.  The separation is probably best and the dates are easily constructed when necessary with the DATEFROMPARTS function.

    While I agree with you on the table plurals, that's an internal preference.

  • but if I add unique index to a  field am I not going to stop insertion of other employees  ?
    Its 3 conditions that have to be applied in order to  abort operatin.

  • RonKyle - Monday, August 20, 2018 9:37 AM

    Jeff Moden - Monday, August 20, 2018 9:30 AM

    Senchi - Monday, August 20, 2018 8:53 AM

    I am entering 'salaries' information into a sql server table called 'salaries'.
    Fields are :
    id (pk,identity),c_id (int),month,year,salary.

    I use a stored procedure to insert this like :

    insert into salaries (c_id,month,year,salary)
    select c_id,month,year,salary from TEMP

    Now I am trying to avoid duplicates from being inserted.
    There can be no duplicates for c_id,month,year.
    All 3 conditions at the same time.

    c_id can be inserted as many times as you like but not with the same month and year data.
    Month and year data are also not unique as I am inserting data for all company for that month.

    How can I do this ? I used to know it but I forgot. I dont remember if I used IF EXISTS or if I
    used some index. please help.

    You'll actually learn to hate the fact that you have separate month and year columns someday in the not-so-distant future for a whole lot of reasons.

    As for a PK, the month and year thing will make that difficult as well because there is no guarantee that people won't be paid more than once per month.  And what the heck does "c_ID" mean if you take it out of context?  If you must have an ID column, use the form of TableName_ID or TableNameID.

    I won't get into what I think of pluralized tables names. 😉

    He said there can be no duplicates for id, year, month.  This may not have anything to do with actual pay but what they should be paid in a month.  While I'm not keen on the separation of dates, I wouldn't as a knee jerk reaction see it as wrong.  Otherwise multiple dates could be entered for a particular month, which again is apparently undesirable.  The separation is probably best and the dates are easily constructed when necessary with the DATEFROMPARTS function.

    While I agree with you on the table plurals, that's an internal preference.

    The separation still sucks. 😉  If someone wants uniqueness by month,  create a persisted computed column that resolves to the number of months since Jan 1900 and use that as a part of the key.  And there's no knee jerking about this.  I've seen people get into trouble even with reporting tables by separating month and year and other date parts.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The separation still sucks. Wink If someone wants uniqueness by month, create a persisted computed column that resolves to the number of months since Jan 1900 and use that as a part of the key. And there's no knee jerking about this. I've seen people get into trouble even with reporting tables by separating month and year and other date parts

    I'm not a fan of artificially generated keys if there's a business key that will work and not be "too many" columns.  Don't ask me for an exact definition of that, but I usually draw the line at four if it can be a parent.  Year 2018 Month 3 is more human readable that 148 or whatever month that would translate as in an OLTP environment.  I do it that way in an OLAP environment, but that is a different matter.

  • Jeff Moden - Monday, August 20, 2018 2:13 PM

    RonKyle - Monday, August 20, 2018 9:37 AM

    Jeff Moden - Monday, August 20, 2018 9:30 AM

    Senchi - Monday, August 20, 2018 8:53 AM

    I am entering 'salaries' information into a sql server table called 'salaries'.
    Fields are :
    id (pk,identity),c_id (int),month,year,salary.

    I use a stored procedure to insert this like :

    insert into salaries (c_id,month,year,salary)
    select c_id,month,year,salary from TEMP

    Now I am trying to avoid duplicates from being inserted.
    There can be no duplicates for c_id,month,year.
    All 3 conditions at the same time.

    c_id can be inserted as many times as you like but not with the same month and year data.
    Month and year data are also not unique as I am inserting data for all company for that month.

    How can I do this ? I used to know it but I forgot. I dont remember if I used IF EXISTS or if I
    used some index. please help.

    You'll actually learn to hate the fact that you have separate month and year columns someday in the not-so-distant future for a whole lot of reasons.

    As for a PK, the month and year thing will make that difficult as well because there is no guarantee that people won't be paid more than once per month.  And what the heck does "c_ID" mean if you take it out of context?  If you must have an ID column, use the form of TableName_ID or TableNameID.

    I won't get into what I think of pluralized tables names. 😉

    He said there can be no duplicates for id, year, month.  This may not have anything to do with actual pay but what they should be paid in a month.  While I'm not keen on the separation of dates, I wouldn't as a knee jerk reaction see it as wrong.  Otherwise multiple dates could be entered for a particular month, which again is apparently undesirable.  The separation is probably best and the dates are easily constructed when necessary with the DATEFROMPARTS function.

    While I agree with you on the table plurals, that's an internal preference.

    The separation still sucks. 😉  If someone wants uniqueness by month,  create a persisted computed column that resolves to the number of months since Jan 1900 and use that as a part of the key.  And there's no knee jerking about this.  I've seen people get into trouble even with reporting tables by separating month and year and other date parts.

    I use the first (or last) day of the month.  This has the advantage that it can easily be formatted to show only the year and month.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Though I appreciate your opinions can somebody actually tell me on how to accomplish this ?

  • Senchi - Monday, August 20, 2018 3:18 PM

    Though I appreciate your opinions can somebody actually tell me on how to accomplish this ?

    They already did.

    John Mitchell-245523 - Monday, August 20, 2018 9:10 AM

    Just add a unique constraint on c_id, month and year.  If you've got identity columns throughout your database then you'll want to look at putting a unique constraint on the natural key in all tables.

    John

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • If you reread everyone's opinion you will find the answer you are looking for.

  • RonKyle - Monday, August 20, 2018 2:21 PM

    The separation still sucks. Wink If someone wants uniqueness by month, create a persisted computed column that resolves to the number of months since Jan 1900 and use that as a part of the key. And there's no knee jerking about this. I've seen people get into trouble even with reporting tables by separating month and year and other date parts

    I'm not a fan of artificially generated keys if there's a business key that will work and not be "too many" columns.  Don't ask me for an exact definition of that, but I usually draw the line at four if it can be a parent.  Year 2018 Month 3 is more human readable that 148 or whatever month that would translate as in an OLTP environment.  I do it that way in an OLAP environment, but that is a different matter.

    I'd tend to agree, but I'd still follow Jeff's basic design. Have one date column with the combined day/month/year and then persisted computed columns (with the necessary unique constraint)  to break out the year and month part.

  • ohhh I got it. Thanks to the added constraint I found out I already had duplicates in the table.
    Much obliged. Thank you.

  • ALTER TABLE salaries ADD CONSTRAINT UNsalaries UNIQUE(c_id,month,year);

Viewing 15 posts - 1 through 15 (of 46 total)

You must be logged in to reply to this topic. Login to reply