No PKs on fact tables

  • For that reason, it's not really comparing apples to apples, because by that same logic, who's to say your constraints don't ALSO get removed between environments?

    I'm certain they don't get removed between environments because, in my case, development is production. Even if it weren't, as at other places I've worked, I spend more time working with production anyway. And even if that weren't true, it wouldn't be an excuse not to use them. They are part of the fundamental rules.

  • xsevensinzx - Friday, April 14, 2017 5:51 AM

    Feel free to explain why a simple script, something like SELECT ... WHERE NOT EXISTS or even MERGE, would totally fail its checks and still insert records WHERE THEY DO EXIST in a batch load, especially when it's preceded by code that actually removes duplicates from staging before they go into the primary table (the pattern is sketched after this post). I personally have yet to see it fail, but that doesn't mean I'm opposed to setting it aside and trying something different.

    And I would disagree that people who build systems to ensure row-level uniqueness without defining a traditional PK are somehow unwilling to go through the rigor of following the rules. If anything, they are going through more rigor than someone simply creating a PK. It's just as you said: they are subject to errors because you are relying on a human to think through the system rather than just letting the machine do it for you.

    I wouldn't say that every system, or even my system, is error free, but something as simple as ensuring row-level uniqueness is not exactly rocket science for an ETL system. I view PK constraints as a way to ensure row-level uniqueness when you have more than one system touching a table - for example, multiple applications and multiple coders, with lots of hands and concurrency, in an OLTP environment.

    If "something like SELECT WHERE NOT EXIST or even MERGE" would not fail then presence of a PK would not create any issue for you.
    The  fact that you're avoid creating a PK on your target table gives a clear indication that your processes fail time to time, inserting  duplicates, and you don't want to have run-time errors because of that.

    As for "multiple hands" - is your database in a single_user mode?
    I bet it's not.
    If so - you allow "more than one system touching that table".
    Which means - no checks in ETL can guarantee row-level uniqueness in your table.

    _____________
    Code for TallyGenerator
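
    A minimal sketch of the load pattern being described, where stg.FactSales, dbo.FactSales, SalesKey, and LoadDate are all illustrative names rather than anything from the thread:

        -- Step 1: dedupe staging, keeping the most recently loaded row per key.
        WITH numbered AS
        (
            SELECT SalesKey,
                   ROW_NUMBER() OVER (PARTITION BY SalesKey
                                      ORDER BY LoadDate DESC) AS rn
            FROM stg.FactSales
        )
        DELETE FROM numbered
        WHERE rn > 1;

        -- Step 2: insert only the keys the fact table does not already have.
        INSERT INTO dbo.FactSales (SalesKey, SaleDate, Amount)
        SELECT s.SalesKey, s.SaleDate, s.Amount
        FROM stg.FactSales AS s
        WHERE NOT EXISTS
        (
            SELECT 1
            FROM dbo.FactSales AS f
            WHERE f.SalesKey = s.SalesKey
        );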

  • Sergiy - Wednesday, May 3, 2017 3:34 PM

    If you feel there is never an issue with SQL Server ignoring a PK constraint and randomly inserting records that violate that constraint, then that's the exact same feeling I have about WHERE NOT EXISTS. I don't believe SQL Server will just ignore that clause and insert records WHERE THEY DO EXIST.

    PKs are really good for protecting the table against other users, and even against yourself doing it accidentally. And you are right, it's not in single-user mode. However, only the ETL user can write to that table, apart from me, who maintains it. This is a data warehouse, and like most data warehouses, only the ETL is approved to write to the primary tables - not Joe the programmer who is writing some hot new app. Thus, as long as WHERE NOT EXISTS does not fail, the ETL is not going to fail. Really not rocket science here.

    Again, don't take this as me saying PKs are useless. They absolutely are not. I use them in 95% of cases - just not here, on certain extremely large fact tables where the ETL is in full control of uniqueness. It has not failed because, like the PK constraint, SQL Server is not going to ignore WHERE NOT EXISTS randomly. (The MERGE form of the same check is sketched below.)
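
    The MERGE variant of the same guard, again only a sketch with the same hypothetical names:

        MERGE dbo.FactSales AS f
        USING stg.FactSales AS s
            ON f.SalesKey = s.SalesKey
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (SalesKey, SaleDate, Amount)
            VALUES (s.SalesKey, s.SaleDate, s.Amount);
        -- Note: with no unique index on SalesKey, two sessions running this
        -- concurrently could each pass the NOT MATCHED test before the other
        -- commits - which is exactly Sergiy's objection above.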

  • It has not failed because, like the PK constraint, SQL Server is not going to ignore WHERE NOT EXISTS randomly.

    This is like playing a game of soccer without a goalkeeper. You might keep the ball from getting in, but it's a lot more difficult. It's your call, but preserving the integrity of the data is the DBA's first duty, and you are giving up one of the most important ways the DBA preserves that data.

  • RonKyle - Thursday, May 4, 2017 6:42 AM

    It has not failed because, like the PK constraint, SQL Server is not going to ignore WHERE NOT EXISTS randomly.

    This is like playing a game of soccer without a goalkeeper. You might keep the ball from getting in, but it's a lot more difficult. It's your call, but preserving the integrity of the data is the DBA's first duty, and you are giving up one of the most important ways the DBA preserves that data.

    That absolutely makes zero sense, RonKyle. 😛 The goalkeeper is still the code and SQL Server, as no one has write access to the table except me. It's no different from a PK constraint, where again SQL Server is the goalkeeper and the only way to add duplicates is to remove that constraint. It's just two different ways of ensuring row-level uniqueness. Unless you can show that one will fail where the other won't (i.e., let the ball into the goal), what is the point you are trying to make?

  • Best practices exist to be neglected.
    Data consistency should not be dependent on the person doing the entry job.

    Johan

    Learn to play, play to learn !

    Don't drive faster than your guardian angel can fly ...
    but keeping both feet on the ground won't get you anywhere :w00t:


  • xsevensinzx - Thursday, May 4, 2017 1:12 PM

    The goalkeeper is still the code and SQL Server, as no one has write access to the table except me.

    What happens when you go away, or, as nature sometimes has it, you're eventually not the only one with direct access, or you're given an update script to execute to make a data correction and you simply forget?

    I DO understand your point of view, but things happen, and I'd rather make it so that tables are able to protect themselves when it comes to such things.... especially before coffee. 😉 (A sketch of that backstop follows this post.)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
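
    The backstop Jeff describes is just the constraint itself; a minimal sketch, with the same hypothetical names, of how it turns a silent duplicate into a loud failure:

        ALTER TABLE dbo.FactSales
            ADD CONSTRAINT PK_FactSales PRIMARY KEY CLUSTERED (SalesKey);

        INSERT INTO dbo.FactSales (SalesKey, SaleDate, Amount)
        VALUES (42, '20170504', 100.00);  -- succeeds

        INSERT INTO dbo.FactSales (SalesKey, SaleDate, Amount)
        VALUES (42, '20170504', 100.00);  -- fails: violation of PRIMARY KEY constraint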

  • I DO understand your point of view

    That makes you smarter than me because I really don't understand why you would simply forfeit such vital tools.  I've been doing it that way since I started designing Access databases and have seen too many problems in any system that didn't use them.

  • RonKyle - Friday, May 5, 2017 6:09 AM

    I DO understand your point of view

    That makes you smarter than me because I really don't understand why you would simply forfeit such vital tools.  I've been doing it that way since I started designing Access databases and have seen too many problems in any system that didn't use them.

    The answer is because it's the easy way out rather than the right way, and I DO understand why people think such a thing (although I vehemently disagree with it). Perception is reality for a lot of people, and perceptions won't change until something goes "boom". I don't know who first said it, but it's true that "It's easier to fool some people than it is to convince them", and a lot of people fool themselves in the face of some urgency. 😉 Think of how many people justify the use of totally crap code that stands no chance when it comes to scalability by saying "Well, it will never be used for more than <insert ridiculously low number here> rows, so it's ok to use <insert non-scalable method here>".

    --Jeff Moden



  • Jeff Moden - Friday, May 5, 2017 6:02 AM

    xsevensinzx - Thursday, May 4, 2017 1:12 PM

    The goalkeeper is still the code and SQL Server, as no one has write access to the table except me.

    What happens when you go away, or, as nature sometimes has it, you're eventually not the only one with direct access, or you're given an update script to execute to make a data correction and you simply forget?

    I DO understand your point of view, but things happen, and I'd rather make it so that tables are able to protect themselves when it comes to such things.... especially before coffee. 😉

    Sure, but it's not like you don't have that row-level uniqueness on your tables. It's just maintained by code. There are also deduplication scripts you can implement that will fix things in the event of stupidity (one such script is sketched after this post).

    Got to remember, guys, proper systems are designed to maintain themselves and work with other systems. This means that, if designed correctly, as most ETL systems for data warehouses should be, updates, inserts, and even deletes are handled by the system. The only time an admin should get involved and exercise their rights to that table is when the physical model changes and an update needs to happen. That's not a before-coffee ordeal.

    In the event you have a system like that, where you leave and Joe Blow decides to circumvent it, I see that action as no different from someone removing the PK constraints and inserting duplicates. Neither a PK nor an ETL can fix people who just want to watch the world burn. :w00t:
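
    One form the self-repairing deduplication script mentioned above could take - the ROW_NUMBER trick pointed at the fact table itself, keeping the first row per key (names still illustrative):

        WITH numbered AS
        (
            SELECT SalesKey,
                   ROW_NUMBER() OVER (PARTITION BY SalesKey
                                      ORDER BY SaleDate) AS rn
            FROM dbo.FactSales
        )
        DELETE FROM numbered
        WHERE rn > 1;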

  • Is there a way to run unit tests on Redshift / Azure SQL DW to detect these sorts of problems?
    Or would you run a separate test environment on a similar technology (a regular database), do unit tests there, and only run validated queries in production?

  • daniel.bartley 78268 - Monday, May 8, 2017 6:49 PM

    Is there a way to run unit tests on Redshift / Azure SQL DW to detect these sorts of problems?
    Or would you run a separate test environment on a similar technology (a regular database), do unit tests there, and only run validated queries in production?

    If you mean to detect duplicate records, then yes, there are queries you can run on the tables to check row-level uniqueness without a separate environment (see the sketch after this post). It's just a matter of how you deal with such duplicates when you find them, since updates and deletes are not handled as well in those environments as in a traditional RDBMS.

    In those environments, you might find it easier to simply copy all your unique records to a whole new table rather than delete duplicates in place.
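
    A sketch of both steps - detecting duplicates, then rebuilding a clean copy - in plain SQL with window functions, which both Redshift and Azure SQL DW support (table and column names are illustrative):

        -- Detect keys that appear more than once.
        SELECT SalesKey, COUNT(*) AS copies
        FROM dbo.FactSales
        GROUP BY SalesKey
        HAVING COUNT(*) > 1;

        -- Rebuild: copy one row per key into a fresh table, then swap names.
        -- (Shown as Redshift-style CTAS; Azure SQL DW's CTAS additionally
        -- requires a WITH (DISTRIBUTION = ...) clause.)
        CREATE TABLE dbo.FactSales_clean AS
        SELECT SalesKey, SaleDate, Amount
        FROM (
            SELECT SalesKey, SaleDate, Amount,
                   ROW_NUMBER() OVER (PARTITION BY SalesKey
                                      ORDER BY SaleDate) AS rn
            FROM dbo.FactSales
        ) AS d
        WHERE rn = 1;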
