Prevent duplicates in a table

  • jcelko212 32090 - Tuesday, August 21, 2018 6:01 PM

    Lynn Pettis - Tuesday, August 21, 2018 11:06 AM

    jcelko212 32090 - Tuesday, August 21, 2018 11:00 AM

    Senchi - Monday, August 20, 2018 8:53 AM

    Mr. Celko, ENOUGH ABOUT MySQL!  If you love that product so much go find a forum for it and live there instead of here.

    That is one of my standard "cut and paste"  replies to a common problem in SQL Server (lack of the ANSI/ISO standard interval temporal datatypes). It's not particularly a plug for the MySQL product. But as full disclosure, I will tell you that I did work for them many years ago doing videos on basic data design (not product specific training videos). Frankly, I find the MySQL dialect to be too far away from standards for my taste.

    But there proposal for integrating certain temporal interval data types into ISO 8601 standards to be a neat solution, with many advantages and it seems to be catching on the standards community.

    Can I assume you have a better solution that they should be using for this problem? I've never seen you posted anywhere that I can remember.

    I really don't give a rats tails butt. I was not responding to the OP but to your nonsense reply talking about MySQL in a Microsoft SQL Server forum.  And I have probably helped a lot more people on SSC than you have.

  • jcelko212 32090 - Tuesday, August 21, 2018 6:01 PM

    Lynn Pettis - Tuesday, August 21, 2018 11:06 AM

    jcelko212 32090 - Tuesday, August 21, 2018 11:00 AM

    Senchi - Monday, August 20, 2018 8:53 AM

    Mr. Celko, ENOUGH ABOUT MySQL!  If you love that product so much go find a forum for it and live there instead of here.

    That is one of my standard "cut and paste"  replies to a common problem in SQL Server (lack of the ANSI/ISO standard interval temporal datatypes). It's not particularly a plug for the MySQL product. But as full disclosure, I will tell you that I did work for them many years ago doing videos on basic data design (not product specific training videos). Frankly, I find the MySQL dialect to be too far away from standards for my taste.

    But there proposal for integrating certain temporal interval data types into ISO 8601 standards to be a neat solution, with many advantages and it seems to be catching on the standards community.

    Can I assume you have a better solution that they should be using for this problem? I've never seen you posted anywhere that I can remember.

    And that is part of the problem.  You have a bunch of pat answers that you cut and paste into your replies that are only superficially relevant to the topic.  If the topic concerns representing dates, you cut and paste your reply that includes MySQL using YYYY-MM-00 even though that is completely irrelevant because SQL Server doesn't implement it.  If the topic concerns using BIT to represent True/False values, you cut and paste your response saying that this is deprecated by the standard (ignoring the part of the standard that says to replace it with BOOLEAN which, again, SQL Server does not support).  Your responses come off as a really bad AI that wouldn't even come close to passing the Turing Test.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Wednesday, August 22, 2018 8:49 AM

    jcelko212 32090 - Tuesday, August 21, 2018 6:01 PM

    Lynn Pettis - Tuesday, August 21, 2018 11:06 AM

    jcelko212 32090 - Tuesday, August 21, 2018 11:00 AM

    Senchi - Monday, August 20, 2018 8:53 AM

    >> ... You have a bunch of pat answers that you cut and paste into your replies .. <<

    Yes, and posters have a bunch of pat questions that the have been using for the past 30+ years 🙂

    >> .. If the topic concerns representing dates, you cut and paste your reply that includes MySQL using YYYY-MM-00 even though that is completely irrelevant because SQL Server doesn't implement it. <<

    The use of zeros in the month and day fields of dates is a strong candidate for standardization. Now I'm sure you love having your customers have no idea what's going to happen to them in the future. Job secure programming and consulting?

    I also posted how to use the

    OUTER JOIN when we knew that it was coming in the near future to replace the old *= Sybase syntax. I got a lot of extra bonus points from my customers for putting the ANSI/ISO Standard infixed syntax in a comment of some code I wrote for them. When the time came to upgrade, all they had to do was uncomment my code, test it and remove their prior code.

    One of my customers that and anohter bonus. They were also moving not just to an upgrade on SQL Server, but also to DB2. Another department the same company had to hire separate teams for the two products, but my people had one code base that moved over in about a week.

    This is called "future proofing" your code. It's a trick we learned back when the US government required us to have FIPS flaggers because so many people back in the 1950s and 1960s had your attitude ("if my current release of my favorite product doesn't have this feature, then we must never discuss it").

    >> .. Using BIT to represent True/False values, you cut and paste your response saying that this is deprecated by the standard (ignoring the part of the standard that says to replace it with BOOLEAN which, again, SQL Server does not support). <<

    The BIT data type in SQL Server changed from {0, 1} to {0, 1, NULL}; which particular implementation do you use with your customers? How do you port to other SQL platforms? Do you know why this change was made (it has to do with a fundamental design principle we put in the ANSI/ISO SQL Standards)? But I don't like to use assembly language style flags in SQL. I normalize and design my schemas so that I discover "the state of being"  in my data with predicates instead of flags. I believe that 80-90% of the work in SQL is in the DDL.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Senchi - Monday, August 20, 2018 8:53 AM

    I am entering 'salaries' information into a sql server table called 'salaries'.
    Fields are :
    id (pk,identity),c_id (int),month,year,salary.

    I use a stored procedure to insert this like :

    insert into salaries (c_id,month,year,salary)
    select c_id,month,year,salary from TEMP

    Now I am trying to avoid duplicates from being inserted.
    There can be no duplicates for c_id,month,year.
    All 3 conditions at the same time.

    c_id can be inserted as many times as you like but not with the same month and year data.
    Month and year data are also not unique as I am inserting data for all company for that month.

    How can I do this ? I used to know it but I forgot. I dont remember if I used IF EXISTS or if I
    used some index. please help.

    insert into salaries (c_id, month, year, salary)
    select c_id, month, year, salary
      from TEMP t
     where not exists(select *
                        from salaries s
                       where s.c_id=t.c_c_id
                         and s.month=t.month
                         and s.year=t.year)

    CREATE UNIQUE INDEX IX_SALARIES_1 ON salaries (c_id, month, year)

  • jcelko212 32090 - Friday, August 24, 2018 12:29 PM

    The use of zeros in the month and day fields of dates is a strong candidate for standardization. 

    I do hope not, it's such a horrible hack to avoid supporting the concept properly.

  • andycadley - Friday, August 24, 2018 11:03 PM

    jcelko212 32090 - Friday, August 24, 2018 12:29 PM

    The use of zeros in the month and day fields of dates is a strong candidate for standardization. 

    I do hope not, it's such a horrible hack to avoid supporting the concept properly.

    +1 Billion

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • andycadley - Friday, August 24, 2018 11:03 PM

    jcelko212 32090 - Friday, August 24, 2018 12:29 PM

    The use of zeros in the month and day fields of dates is a strong candidate for standardization. 

    I do hope not, it's such a horrible hack to avoid supporting the concept properly.

    In fairness, we do have the interval data type in ANSI/ISO standard SQL. It's based on the fields in the timestamp (year, month, day, hour, minute, seconds (with fractions)). Basically, you pick a beginning field and ending field and it includes all of the fields in between to make an interval. The pre-defined intervals are DATE (year, month, day) and TIME (hour, minute, seconds). While the ANSI/ISO standards don't require a particular number of decimal seconds, but the FIPS standards now require up to seven decimal places.

    Implementing this is going to be a major effort. SQL Server always been a little bit behind DB2, Oracle, Postgres and some others. However, they always tend toward the ANSI/ISO standards.

    I happen to like the zeros in the values, which has nothing to do with the creating interval data types. The advantages I see are that:
    1) they fit into the existing display format, yyyy-mm-dd
    2) these strings sort correctly
    3) this convention is well understood, thanks to the MySQL community.

    How would you "properly support" the concept of interval values?

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 - Sunday, August 26, 2018 11:43 AM

    andycadley - Friday, August 24, 2018 11:03 PM

    jcelko212 32090 - Friday, August 24, 2018 12:29 PM

    The use of zeros in the month and day fields of dates is a strong candidate for standardization. 

    I do hope not, it's such a horrible hack to avoid supporting the concept properly.

    In fairness, we do have the interval data type in ANSI/ISO standard SQL. It's based on the fields in the timestamp (year, month, day, hour, minute, seconds (with fractions)). Basically, you pick a beginning field and ending field and it includes all of the fields in between to make an interval. The pre-defined intervals are DATE (year, month, day) and TIME (hour, minute, seconds). While the ANSI/ISO standards don't require a particular number of decimal seconds, but the FIPS standards now require up to seven decimal places.

    Implementing this is going to be a major effort. SQL Server always been a little bit behind DB2, Oracle, Postgres and some others. However, they always tend toward the ANSI/ISO standards.

    I happen to like the zeros in the values, which has nothing to do with the creating interval data types. The advantages I see are that:
    1) they fit into the existing display format, yyyy-mm-dd
    2) these strings sort correctly
    3) this convention is well understood, thanks to the MySQL community.

    How would you "properly support" the concept of interval values?

    Joe's response seems logical.  What is so "horrible" about this "hack"?  Isn't this at worst just a difference of opinion?

  • jcelko212 32090 - Sunday, August 26, 2018 11:43 AM

    andycadley - Friday, August 24, 2018 11:03 PM

    jcelko212 32090 - Friday, August 24, 2018 12:29 PM

    The use of zeros in the month and day fields of dates is a strong candidate for standardization. 

    I do hope not, it's such a horrible hack to avoid supporting the concept properly.

    In fairness, we do have the interval data type in ANSI/ISO standard SQL. It's based on the fields in the timestamp (year, month, day, hour, minute, seconds (with fractions)). Basically, you pick a beginning field and ending field and it includes all of the fields in between to make an interval. The pre-defined intervals are DATE (year, month, day) and TIME (hour, minute, seconds). While the ANSI/ISO standards don't require a particular number of decimal seconds, but the FIPS standards now require up to seven decimal places.

    Implementing this is going to be a major effort. SQL Server always been a little bit behind DB2, Oracle, Postgres and some others. However, they always tend toward the ANSI/ISO standards.

    I happen to like the zeros in the values, which has nothing to do with the creating interval data types. The advantages I see are that:
    1) they fit into the existing display format, yyyy-mm-dd
    2) these strings sort correctly
    3) this convention is well understood, thanks to the MySQL community.

    How would you "properly support" the concept of interval values?

    See, they are almost exactly the reason it shouldn't use 00.
    1) Sure they do, but a month isn't the same thing as a date. It's like arguing that an integer should be displayed as 1.0 because it "fits the existing display format" for decimal.
    2) So would "2018-03", which would be far more semantically correct. And when you start mixing them up with Dates (as MySQL does) it's entirely arguably they do NOT sort correctly at all - 2018-03-00 comes "before" 2018-03-02, but it's nonsensical to suggest the entirety of March comes before March the second, these things just aren't comparable and shouldn't be treated as if they are.
    3) That's a poor argument for supporting a bad representation. And MySQL itself tried to remove support for storing values like this in Date types because of all the myriad of problems it creates - but of course once Pandora's box is open it's hard to go back and they had to hastily revert the change because that now breaks heaps of MySQL code that's out in the wild. The net result of which is that if you're working in MySQL you actually can't assume that a Date column contains a Date or even that it is directly comparable with another Date column. I can't see any sane reason to want that to spread to other SQL implementations. 

    It's not the early 80s anymore. our systems have more than enough grunt to handle these things properly without resorting to dodgy cludges that might possibly let us save a bit of code here and there. Interval types should have their own semantics and display formats that actually make sense for what they are and don't fudge values in for the sake of it.

  • andycadley - Friday, August 24, 2018 11:03 PM

    jcelko212 32090 - Friday, August 24, 2018 12:29 PM

    The use of zeros in the month and day fields of dates is a strong candidate for standardization. 

    I do hope not, it's such a horrible hack to avoid supporting the concept properly.

    I can't see anything wrong with leading zeroes in year (when less that 1000), months (when less that 10), and day (when less that 10) numbers in text strings representing dates.  On the contrary, there's a lot wrong with not having them - the resulting strings would not sort in the same order as values typed as date (or datetime or datetime2), which would make sorting dates that don't use leading zeroes where appropriate into chronological order a real pain (and real performance cost).
    However, it's difficult to see why a day or month field should ever contain two zeroes.  What is it supposed to mean?  How does it help?   In a year field four zeroes is ok though, as 0 is a real year number.

    Tom

  • TomThomson - Monday, August 27, 2018 11:13 AM

    andycadley - Friday, August 24, 2018 11:03 PM

    jcelko212 32090 - Friday, August 24, 2018 12:29 PM

    The use of zeros in the month and day fields of dates is a strong candidate for standardization. 

    I do hope not, it's such a horrible hack to avoid supporting the concept properly.

    I can't see anything wrong with leading zeroes in year (when less that 1000), months (when less that 10), and day (when less that 10) numbers in text strings representing dates.  On the contrary, there's a lot wrong with not having them - the resulting strings would not sort in the same order as values typed as date (or datetime or datetime2), which would make sorting dates that don't use leading zeroes where appropriate into chronological order a real pain (and real performance cost).
    However, it's difficult to see why a day or month field should ever contain two zeroes.  What is it supposed to mean?  How does it help?   In a year field four zeroes is ok though, as 0 is a real year number.

    Actually, there is no year 0.  The year before 1 CE is 1 BCE.  This is the reason that the current century started in 2001 rather than the common misconception that it started in 2000, and certainly Y2K didn't help that misconception.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • TomThomson - Monday, August 27, 2018 11:13 AM

    andycadley - Friday, August 24, 2018 11:03 PM

    jcelko212 32090 - Friday, August 24, 2018 12:29 PM

    However, it's difficult to see why a day or month field should ever contain two zeroes.  What is it supposed to mean?  How does it help?   In a year field four zeroes is ok though, as 0 is a real year number.

    Fixed length encoding schemes have a huge advantage in laying out printed forms and screens. The regular expression for validating them is much easier than varying length encodings. And as you have pointed out, sorting is much easier. But I think another advantage is the shape of this display format tells the reader that he's looking at and encoding of a date. He doesn't have to guess.

    I was looking at some old books in my collection on the history of mathematics. There's a wonderful quote somewhere, translated from the Latin about how "the cipher pretends to be a number, but obviously it is not" and the diatribe about how these newfangled Arab numbers are confusing, not well understood and we should never leave God's own's Roman numerals.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • If you want it as an output format, fine but 99% of non programmers won't have a clue what it means. As an input format, for the same reason, it is terrible.

    And worrying that it won't display in a fixed width column with other dates, when that makes literally no sense to-do, is bonkers. a

  • andycadley - Monday, August 27, 2018 12:32 PM

    If you want it as an output format, fine but 99% of non programmers won't have a clue what it means. As an input format, for the same reason, it is terrible.

    And worrying that it won't display in a fixed width column with other dates, when that makes literally no sense to-do, is bonkers. a

    The ISO 8601 display format appears a lot of places. It used in the military almost universally as far as I know. It replaced various schemes that were used by NATO that nobody could agree on. In fact, at one point when France was a member of NATO, we used Roman numerals for the months! Arrgh!

    People do have problems with mixed formats in both input and output. Does "01-05-2018" mean "the year 2018, first month, fifth day" or does it mean "the year 2018, fifth month, first day" ? I've been through some of these exercises, and you really just have to look at the data in figure out what was meant by context, if you can.

    But more than just representing a date as a date, it's often embedded in other standards. This is what makes it the second most used standard after the metric system. Many serial numbers for products, and financial instruments have the yyyymm and yyymmdd digits as part of their encoding.

    Fixed sized encoding tends to be the rule in industry standards; usually between 10 and 15 characters, if my scanning ISO standards for one of my books is representative (yes, I am so dull I actually sit down and read ISO standards that have nothing to do with what I'm working on. My wife is right; I need to get out more). The known length is part of the validation process. For example, I know that '08727' is a valid ZIP Code because it is five digits, no more no less. If I see '8727' I know that it is never going to be a ZIP Code. If you think that a leading zero can be dropped, then you don't understand how this encoding works.

    While you can scroll around on a video screen, it's really hard to stretch a paper form or label. It is also harder to scan variable length fields on a form. I can look down a printed or screen output, and quickly find missing or particular values. 

    If you look at the Unicode standards, will find that there is a set of digits simple Latin letters and a few punctuation marks that are required to be available in every writing system with the Unicode. The reason for this requirement is so that everybody on earth can have access to ISO standards, such as the metric system encodings.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • But 2018-03 is both a fixed width and entirely unambiguous and yet completely devoid of absolutely every single issue that 2018-03-00 suffers from. The SQL standard can completely bypass the issues that arose by supporting local date formats, because that just isn't a real world issue. And it is never, ever comparable with a single date and so has literally no reason to be represented in a way that appears it could be. The MySQL team chose it because MySQL could represent invalid dates in their date storage and that seemed like an easy sentinel value for "the whole month" - there was no solid reasoning behind it and it has caused MySQL based developers an endless stream of problems because it is a terrible representation. There is literally no reason to propagate it beyond the mess it has already caused. There are endless better ways of representing a month without forcing a clumsy definition that just so happens to look like a specific date.

Viewing 15 posts - 31 through 45 (of 46 total)

You must be logged in to reply to this topic. Login to reply