Entity Framework & RowVersion

  • Hi all,

    One of our developers has put a rowversion field with the datatype timestamp into all of the tables of a new database. It is still in the design phase. The purpose of this field is to support Entity Framework.

    Aside from the fact that the timestamp datatype is deprecated, I find it to be poor database design that every table must have this 8-byte field just so that the developers can use Entity Framework.
    I object in principle that the application layer is intruding into the database.
    Each row is a statement of truth. This field doesn't even add metadata information.
    The developer got his information from this Microsoft, no less: https://docs.microsoft.com/en-us/aspnet/mvc/overview/getting-started/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application
    The database is still quite small. It has 14 non-enumeration tables and 170 columns between them, making up 8% of the total or 1 column in every 12 as being overhead for Entity Framework.

    I have two questions:
     1. Am I being unreasonable or is this how things are done nowadays? From a DBA perspective, it seems awfully wasteful.
     2. Is there a workaround that does not involve the database being burdened for the sake of programming convenience? Have any of ye info that I can use to help our dev and hopefully remove this unnecessary burden from the DB.

    Maybe I need to take life less seriously.
    Thanks for reading,
    Sean Redmond.

  • Skimming through the Microsoft article, it mentions the need for optimistic concurrency and row-versioning. That sounds an awful lot like the isolation level RCSI (Read Committed Snapshot Isolation).
    Does anyone know of RCSI is a requirement for Entity Framework or if it even helps?

    What I donโ€™t really understand is why EF needs to implement its own concurrency. SQLServer does this already very well with isolation levels. It is the I in ACID.
    How is it that SPs work without having to implement their own concurrency while EF has to? The whole purpose of locking is that a row or value or a page or a table is yours once you open a transaction. If optimistic concurrency is desired, then there is the Read_Committed_Snapshot_Isolated isolation level.

  • I am pretty sure that going for an ORM (and Entity Framework is one) is a good way of ensuring that your database provides the least possible benefit to your development as well as to performance in production - that's a general property of ORMs.  Unless of course Microsoft has made a new breakthrough that provides an ORM that doesn't totally screw up the database design, which I think is extremely unlikely (but I may of course just be an old fashioned cynic).  It will be interesting to see what the many real experts who contribute here have to say on this.

    Tom

  • I'm late to this conversation, but I really wanted to answer some of the questions here. I'm a developer, and before switching to Entity Framework, I coded my own data access methods, which usually included some forms of concurrency checks.

    First of all, the 'timestamp' syntax is deprecated because it has been replaced by 'rowversion'. But the underlying mechanism is NOT deprecated.

    Also, any application-level concurrency checks are very separate from database/transaction locking.

    Optimistic Concurrency helps prevent overlapping updates. Example: Bill and Ted both load the same record, make their changes, then hit Save. Bill hits Save first, updating some columns in the record. Then Ted hits Save. This could be 20 seconds later, and it could be 20 minutes later. Without concurrency checks, Ted's updates will undo and/or overwrite Bill's changes without any warning. In most situations, Ted should get a warning saying "Hey, this record has changed by someone else since you last loaded it. Are you sure you want your changes to overwrite their changes?" In most cases, the user will want to see what has already been changed.

    Without any concurrency, the SQL update statement will blindly update records.
    E.g.: UPDATE table_name SET last_name='Smith' WHERE id=5

    With Optimistic Concurrency, the SQL update statement will also check if the record has changed in the database since the user last loaded it. There are two common ways to do this, both involving additions to the 'where' clause:

    1) The 'where' clause includes all values in the record as they were last loaded by the user. If any column in the db record has changed since the user loaded it, the number of rows updated will be 0, and the user can be warned of a possible concurrency issue.
    E.g. UPDATE table_name SET last_name='Smith' WHERE id=5 AND first_name='Janice' AND last_name='McPhee' AND EmployeeType=2.
    This can be cumbersome to work with (at least manually), especially when there are lots of columns.

    2) The 'where' clause includes a single 'timestamp' ('rowversion') column. The 'rowversion' column is always automatically incremented any time a change is made to the record, so it's more reliable than checking a last_updated_time. It's also much easier to check a single column than having to check every other column in the record. Like the example above, if any column in the db record has changed since the user loaded it, the number of rows updated will be 0, and the user can be warned of a possible concurrency issue.
    E.g. UPDATE table_name SET last_name='Smith' WHERE id=5 AND rowversion=1230

    When the database design can be modified, it is simple to go with option 2 and add a 'rowversion' field to the record. Otherwise, option 1 is usually used.

    This has nothing to do with database locking. You can't use a database lock to prevent the Bill & Ted situation I described above. And when you have dozens or hundreds of users, Optimistic Concurrency is the only reasonable way to prevent users from unknowingly clobbering each other's changes without having to implement some sort of check-out/check-in locking.

  • ben 82447 - Wednesday, January 16, 2019 9:21 AM

    I'm late to this conversation, but I really wanted to answer some of the questions here. I'm a developer, and before switching to Entity Framework, I coded my own data access methods, which usually included some forms of concurrency checks.

    First of all, the 'timestamp' syntax is deprecated because it has been replaced by 'rowversion'. But the underlying mechanism is NOT deprecated.

    Also, any application-level concurrency checks are very separate from database/transaction locking.

    Optimistic Concurrency helps prevent overlapping updates. Example: Bill and Ted both load the same record, make their changes, then hit Save. Bill hits Save first, updating some columns in the record. Then Ted hits Save. This could be 20 seconds later, and it could be 20 minutes later. Without concurrency checks, Ted's updates will undo and/or overwrite Bill's changes without any warning. In most situations, Ted should get a warning saying "Hey, this record has changed by someone else since you last loaded it. Are you sure you want your changes to overwrite their changes?" In most cases, the user will want to see what has already been changed.

    Without any concurrency, the SQL update statement will blindly update records.
    E.g.: UPDATE table_name SET last_name='Smith' WHERE id=5

    With Optimistic Concurrency, the SQL update statement will also check if the record has changed in the database since the user last loaded it. There are two common ways to do this, both involving additions to the 'where' clause:

    1) The 'where' clause includes all values in the record as they were last loaded by the user. If any column in the db record has changed since the user loaded it, the number of rows updated will be 0, and the user can be warned of a possible concurrency issue.
    E.g. UPDATE table_name SET last_name='Smith' WHERE id=5 AND first_name='Janice' AND last_name='McPhee' AND EmployeeType=2.
    This can be cumbersome to work with (at least manually), especially when there are lots of columns.

    2) The 'where' clause includes a single 'timestamp' ('rowversion') column. The 'rowversion' column is always automatically incremented any time a change is made to the record, so it's more reliable than checking a last_updated_time. It's also much easier to check a single column than having to check every other column in the record. Like the example above, if any column in the db record has changed since the user loaded it, the number of rows updated will be 0, and the user can be warned of a possible concurrency issue.
    E.g. UPDATE table_name SET last_name='Smith' WHERE id=5 AND rowversion=1230

    When the database design can be modified, it is simple to go with option 2 and add a 'rowversion' field to the record. Otherwise, option 1 is usually used.

    This has nothing to do with database locking. You can't use a database lock to prevent the Bill & Ted situation I described above. And when you have dozens or hundreds of users, Optimistic Concurrency is the only reasonable way to prevent users from unknowingly clobbering each other's changes without having to implement some sort of check-out/check-in locking.

    heh I remember the last time I tried explaining optimistic concurrency here, I was taken to task why I used "nolock" LOL

    https://www.sqlservercentral.com/Forums/FindPost1935673.aspx

  • ben 82447 - Wednesday, January 16, 2019 9:21 AM

    Optimistic Concurrency helps prevent overlapping updates. Example: Bill and Ted both load the same record, make their changes, then hit Save. Bill hits Save first, updating some columns in the record. Then Ted hits Save. This could be 20 seconds later, and it could be 20 minutes later. Without concurrency checks, Ted's updates will undo and/or overwrite Bill's changes without any warning. In most situations, Ted should get a warning saying "Hey, this record has changed by someone else since you last loaded it. Are you sure you want your changes to overwrite their changes?" In most cases, the user will want to see what has already been changed.

    2) The 'where' clause includes a single 'timestamp' ('rowversion') column. The 'rowversion' column is always automatically incremented any time a change is made to the record, so it's more reliable than checking a last_updated_time. It's also much easier to check a single column than having to check every other column in the record. Like the example above, if any column in the db record has changed since the user loaded it, the number of rows updated will be 0, and the user can be warned of a possible concurrency issue.

    Two aspects of Entity Framework annoy me as a DBA intensely:

    1. The database is filled with fields to help the application server do its job.
     I mean all of the fields with GUID and rowversion datatypes. The problems they fix can be solved by other means, without having to resort to filling up our pages with bloated datatypes that store application server metadata.
    For the problem of change detection within rows, have a look at these three articles:
    a. http://www.sqlservercentral.com/articles/change+detection/95930/
    b. http://www.sqlservercentral.com/articles/SQL+Server/95931/
    c. https://docs.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql?view=sql-server-2017

    2. Entity Framework is good for meeting sprint deadlines but bad for the system as a whole
    Developers like EF because they can develop faster in it than with stored procedures. The problems in generates are threefold: firstly, it often generates inefficient code, secondly, only those with experience of and access to the Data Access Layer can fix. Stored procedures can be optimised or fixed by DBAs as well. Indeed, SQL Server keeps lots of useful metadata about SPs because they are DB objects. Thirdly, EF is time theft. The time saved by developers in development is paid for by the poor DBA when he/she must pour through the horrible code it generates and which they cannot fix. There are a good many things that a DBA can optimise in the DB with the help of the metadata about stored procedures, index replacement, alteration or removal, for example. This becomes a whole lot harder when we only have the expression of EF code to work with.

  • Sean Redmond - Thursday, January 24, 2019 12:11 AM

    ben 82447 - Wednesday, January 16, 2019 9:21 AM

    Optimistic Concurrency helps prevent overlapping updates. Example: Bill and Ted both load the same record, make their changes, then hit Save. Bill hits Save first, updating some columns in the record. Then Ted hits Save. This could be 20 seconds later, and it could be 20 minutes later. Without concurrency checks, Ted's updates will undo and/or overwrite Bill's changes without any warning. In most situations, Ted should get a warning saying "Hey, this record has changed by someone else since you last loaded it. Are you sure you want your changes to overwrite their changes?" In most cases, the user will want to see what has already been changed.

    2) The 'where' clause includes a single 'timestamp' ('rowversion') column. The 'rowversion' column is always automatically incremented any time a change is made to the record, so it's more reliable than checking a last_updated_time. It's also much easier to check a single column than having to check every other column in the record. Like the example above, if any column in the db record has changed since the user loaded it, the number of rows updated will be 0, and the user can be warned of a possible concurrency issue.

    Two aspects of Entity Framework annoy me as a DBA intensely:

    1. The database is filled with fields to help the application server do its job.
     I mean all of the fields with GUID and rowversion datatypes. The problems they fix can be solved by other means, without having to resort to filling up our pages with bloated datatypes that store application server metadata.
    For the problem of change detection within rows, have a look at these three articles:
    a. http://www.sqlservercentral.com/articles/change+detection/95930/
    b. http://www.sqlservercentral.com/articles/SQL+Server/95931/
    c. https://docs.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql?view=sql-server-2017

    2. Entity Framework is good for meeting sprint deadlines but bad for the system as a whole
    Developers like EF because they can develop faster in it than with stored procedures. The problems in generates are threefold: firstly, it often generates inefficient code, secondly, only those with experience of and access to the Data Access Layer can fix. Stored procedures can be optimised or fixed by DBAs as well. Indeed, SQL Server keeps lots of useful metadata about SPs because they are DB objects. Thirdly, EF is time theft. The time saved by developers in development is paid for by the poor DBA when he/she must pour through the horrible code it generates and which they cannot fix. There are a good many things that a DBA can optimise in the DB with the help of the metadata about stored procedures, index replacement, alteration or removal, for example. This becomes a whole lot harder when we only have the expression of EF code to work with.

    Entity Framework isn't designed to "meet sprint deadlines", I see it being a tool to amplify the productivity of developers. Now, in my opinion, EF isn't there to help newb developers get up to speed fast, EF is there to help experienced developers increase their productivity. As such, to get the best use out of EF and tools like this, the developer needs to have a wide and deep set of knowledge of the system and coupled with experience programming database applications and the particulars of their RDBMS target, they'll have a chance of avoiding issues.

    I do get the sense that hiring experienced developers is very unpopular in some industries and I do see a trend that programmers aren't learning their database target platform, but I don't think this is the fault of EF or other frameworks. Lack of knowledge and inexperience needs to be filtered out of the mix before we can really get an idea of how good these platforms are, and I don't see you getting by this filter either, because that also includes the DBA. If you don't know anything about EF other than bad implementations using it, then you might not have all the information needed to successfully evaluate it, and I cannot without a doubt eliminate this possibility in evaluating your contribution here.

  • patrickmcginnis59 10839 - Thursday, January 24, 2019 8:11 AM

    Entity Framework isn't designed to "meet sprint deadlines", I see it being a tool to amplify the productivity of developers. Now, in my opinion, EF isn't there to help newb developers get up to speed fast, EF is there to help experienced developers increase their productivity. As such, to get the best use out of EF and tools like this, the developer needs to have a wide and deep set of knowledge of the system and coupled with experience programming database applications and the particulars of their RDBMS target, they'll have a chance of avoiding issues.

    I've known many experienced developers who have wide and deep knowledge of the system they are trying to implement and experiemce of database applications and particulars of the RDBMS they are intended to use.  Every single one of them absolutely loathed any sort of ORM and that included loathing EF.  They loathed it because they recognised that any sort of ORM or EF will prevent proper design of data storage and and control, will be used by arrogant know-it-all OO developers to prevent database and operations experts from carrying any weight in discussions of how to achieve objectives, and will ensure that significantly more CPU power and data transfer speed is required than for a proper separation of an archaic (in data management terms) OO system from the database and data management, thus vastly increasing the cost of the resulting system at the same time as significantly decreasing both its reliability and its flexibility.
    Increasing the productivity of a few C++ developers is a bit of a false economy when the means of that increase destroy the productivity of other developers and of production operations staff as well as vastly increasing hardware and data transmission costs and reducing system reliability.

    Tom

  • TomThomson - Saturday, January 26, 2019 5:15 PM

    patrickmcginnis59 10839 - Thursday, January 24, 2019 8:11 AM

    Entity Framework isn't designed to "meet sprint deadlines", I see it being a tool to amplify the productivity of developers. Now, in my opinion, EF isn't there to help newb developers get up to speed fast, EF is there to help experienced developers increase their productivity. As such, to get the best use out of EF and tools like this, the developer needs to have a wide and deep set of knowledge of the system and coupled with experience programming database applications and the particulars of their RDBMS target, they'll have a chance of avoiding issues.

    I've known many experienced developers who have wide and deep knowledge of the system they are trying to implement and experiemce of database applications and particulars of the RDBMS they are intended to use.  Every single one of them absolutely loathed any sort of ORM and that included loathing EF.  They loathed it because they recognised that any sort of ORM or EF will prevent proper design of data storage and and control, will be used by arrogant know-it-all OO developers to prevent database and operations experts from carrying any weight in discussions of how to achieve objectives, and will ensure that significantly more CPU power and data transfer speed is required than for a proper separation of an archaic (in data management terms) OO system from the database and data management, thus vastly increasing the cost of the resulting system at the same time as significantly decreasing both its reliability and its flexibility.
    Increasing the productivity of a few C++ developers is a bit of a false economy when the means of that increase destroy the productivity of other developers and of production operations staff as well as vastly increasing hardware and data transmission costs and reducing system reliability.

    I can understand that abstractions can require tradeoffs. I'm not understanding how guys devote years of their careers to ORM's despite loathing them and telling you that they destroy the very people they should be helping with development.

    In other words, are they the baddies?

  • I know this post is old enough where the horse has already left the barn but...

    The root of this problem isn't what the Developer wants to design.  The root of the problem is that it sounds like the Developer is the only one with a plan. ๐Ÿ˜‰

    What needs to happen is that people need to get together and communicate about the design with some goals.
    1.  Determine what is needed to work correctly with the understanding that there is more than one method to do everything.
    2.  Determine what is most sustainable for the long haul.  That includes scalability, resource usage, and maintainability, at the very least.

    The bottom line is that there needs to be a design plan that is actually a planned design that is documented and followed with the understanding that tweaks due to special circumstances will be allowed but only with concurrence of the design team.  For that to be successful, people must leave their egos and home and be prepared to demonstrate rather than speculate as to why something is good or bad.

    It is, indeed, a rare opportunity to design a new application along with a brand spankin' new database.  Don't blow that opportunity to do it right for a change, whatever "right" means for the given application and database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sean Redmond - Friday, July 13, 2018 12:05 AM

    Skimming through the Microsoft article, it mentions the need for optimistic concurrency and row-versioning. That sounds an awful lot like the isolation level RCSI (Read Committed Snapshot Isolation).
    Does anyone know of RCSI is a requirement for Entity Framework or if it even helps?

    What I don’t really understand is why EF needs to implement its own concurrency. SQLServer does this already very well with isolation levels. It is the I in ACID.
    How is it that SPs work without having to implement their own concurrency while EF has to? The whole purpose of locking is that a row or value or a page or a table is yours once you open a transaction. If optimistic concurrency is desired, then there is the Read_Committed_Snapshot_Isolated isolation level.

    Its my understanding that read committed snapshot doesn't do all you need if extra programming isn't provided. Obviously for low concurrency you aren't going to see problems (almost all of the time anyways). The case comes in with transactions that involve multiple tables. When you try your update on one table and this update depends on values in other related tables, what happens when the other table gets values updated that your now started transaction depends on?

    I'm pretty interested in these cases and I think you bring up a good question. I remember reading that snapshot isolation has some cases that result in doomed transactions. That in itself is a good thing because if you catch this error, you can rollback and retry.

    On the one hand, you can certainly code for these situations in your stored procedure. Having the application do this as a matter of the interface itself means that its already coded. I get that the application does not use the snapshot version of a doomed transaction to solve this, rather it uses the timestamps or version numbers like you said. I think the motivation is that this error handling is sort of "boilerplate" and therefore developers use EF's mechanism as its already coded.

    I am inclined to guess that some ORM's want to  generalize this behavior in such a way as to get a bit of independence from the database implementation, now obviously cue the detractors of database portability but hey, the marketplace has other ideas, theres the possibility of people still wanting to reduce dependencies here.

    I'm certainly not the expert on this, but I think when considering snapshot isolation, the issue might be googleable by using "write skew snapshot isolation", its a very interesting rabbit hole for me and I wish I were better informed on this, I know with my first SQL transaction project, I said "f this" and went with serializable LOL

    edit: I know old thread, but I couldn't resist addressing an interesting question, even if I'm unable to answer it in a comprehensive manner. It certainly is a topic that seems not to be addressed much on SSC, although Gail Shaw has done an excellent job in some of her writing on the topic of concurrent updates.

  • Sean Redmond - Thursday, January 24, 2019 12:11 AM

    Two aspects of Entity Framework annoy me as a DBA intensely:

    1. The database is filled with fields to help the application server do its job.
     I mean all of the fields with GUID and rowversion datatypes. The problems they fix can be solved by other means, without having to resort to filling up our pages with bloated datatypes that store application server metadata.
    For the problem of change detection within rows, have a look at these three articles:
    a. http://www.sqlservercentral.com/articles/change+detection/95930/
    b. http://www.sqlservercentral.com/articles/SQL+Server/95931/
    c. https://docs.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql?view=sql-server-2017

    The articles to which you linked specifically say that rowversion is perfect for Optimistic Concurrency. "ROWVERSION is the way to go if you are looking to implement an optimistic concurrency solution within the product."

    Our developers (4) are also our DBAs. While we have more developer training/experience than DBA training/experience, we care more about the performance of our databases than our apps. As a result, we use Entity Framework wisely. We design and implement the databases (2NF or as close as possible), then wire up EF in our apps to use the databases without making any changes. I've never added a single extra column to make EF work. And yes, we use rowversion on tables where we need to maintain optimistic concurrency, but we've used that for over 10 years, long before considering using an ORM. Optimistic Concurrency using rowversion worked the same way back when we manually coded our data access layers as it does now with EF.

    I still can't see how any built-in SQL Server functionality (snapshots, isolation levels, transactions, etc.) would help with optimistic concurrency in a web-based application without there being performance or locking issues. For example, when dozens of people access the same data throughout a workday with no intention of changing it, and 2 people happen to have accessed the data in the morning and finally get around to submitting conflicting changes at the end of the work day, how would locks or isolation levels help in this case? Meanwhile, all the app has to do is pass through the rowversion that was retrieved along with the data, attempt to save it with a quick "where rowversion = xxx", and then conflicts are detected. It could be seconds, hours, or days after reading the data, and it still works flawlessly.

    Just like any other ORM, most developers will let EF design & maintain the databases for them. That may be fine for some, but in the long run it will backfire when performance or scalability becomes an issue. I agree with you all on that. I wish the focus in EF tutorials was to design the database first, and then let EF work with what it is given. Instead, most tutorials focus on how the magical EF can take your ideas in code and handle the back-end for you, and that seems to appeal to most people. But a good developer who understands even the basics of data design, execution plans, and performance in general, will use the "good" parts of EF without letting it take full control.

    We still use stored procs when we know EF won't do a good job. But when it comes to basic CRUD operations, EF usually handles them just fine as long as we have designed the database. In our case, EF does save us time.

  • ben 82447 - Tuesday, January 29, 2019 8:37 AM

    Just like any other ORM, most developers will let EF design & maintain the databases for them. That may be fine for some, but in the long run it will backfire when performance or scalability becomes an issue. I agree with you all on that. I wish the focus in EF tutorials was to design the database first, and then let EF work with what it is given. Instead, most tutorials focus on how the magical EF can take your ideas in code and handle the back-end for you, and that seems to appeal to most people. But a good developer who understands even the basics of data design, execution plans, and performance in general, will use the "good" parts of EF without letting it take full control.

    We still use stored procs when we know EF won't do a good job. But when it comes to basic CRUD operations, EF usually handles them just fine as long as we have designed the database. In our case, EF does save us time.

    This is how the ORMs should work. Too many people use them for everything, including DB design, and for complex business logic. For CRUD type operations, you aren't writing better code then EF, not by any margin that is in the same time for most developers.

    Use them wisely, watch datatypes to avoid implicit conversions, and when db logic gets complex, use stored procs.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply