• diegosiao (8/28/2015)


    Sorry Jeff, but you do not have the authority to make this statement.

    People who don't really know much about databases are the ones that frequently suggest the approach [code-first].

    Actually, I do have the "authority", much the same as the "authority" you have to say I don't. 😉

    That is probably based on a bad personal experience, or even the humor of a bad day. It is not a truth that can be universally applied.

    Based on the fact that every project (and there have been many) that I've seen or had to clean up after that was done on a "code first" basis turned out to be a nightmare of very bad database design and very poorly performing, high-resource-usage code, I could have stated it universally but didn't. Go back and look.

    I do not want to be rude or disrespectful, but your position is pointless and contradictory. In summary, you said that people who suggest the code-first approach are "frequently" not proficient DBAs, or that they will not revise the generated script properly. So what is left for them? The database-first approach. Implicitly, you are suggesting that an individual who is not able to revise a DDL script should write it and model the business logic of an application.

    You sound like a "front-ender" (and that's no insult... you folks have a tough job; I know because I used to be one) who might have been able to pull off a good "code first" implementation. That's a very rare thing from what I've seen, and my hat's off to you if it's true.

    But my position is neither pointless nor contradictory. To restate my position: without exception so far, I've not seen even one "code first" project work out. Oh sure, it "got the job done" and they were able to "ship something" that worked, but only for a short while. Once any decent volume of data was in the database, the code got slower and slower and finally reached the point where the application would frequently time out, many transactions were lost to deadlocking, and making modifications to both the code and the underlying database became anything but easy, frequently taking a day or two just to figure out where the change needed to be made and how to implement it.

    A couple of the companies decided to throw hardware at the problem. One company doubled the number of processors to 32, quadrupled their RAM from 64GB to 256GB, and upgraded from SQL Server Standard Edition to Enterprise Edition. It made things a little better, but half as slow as molasses is still pretty slow. The reason it didn't help is that the code couldn't take advantage of multiple processors and the number of logical reads from memory was still astronomical. Even memory IO has its limits.

    The OOP languages have a lot more instrumentation for business logic modeling and prototyping (e.g., abstract classes, interfaces, empty methods, etc.). For the sake of productivity, DBAs can and should take advantage of generated DDL scripts, bearing in mind the inalienable responsibility of revising them and applying the improvements and corrections that are invariably necessary.

    And THAT is a major part of the problem. While I agree that "business logic" is key to WHAT must be delivered, it usually has nothing to do with HOW the data is stored nor how it should be retrieved. Said "business logic" is usually driven only by requirements of what to display and when. It rarely mandates such things as Third Normal Form for the data, nor does it usually mandate any kind of performance other than when folks working on screens seem to think that something is running slow. In most cases, they also have no clue as to how many resources are being consumed, and so have an equally small clue as to what performance will be when the data in the database scales up.

    I'm going through it right now with a 3rd party vendor that wrote "code first" code. To start with, they had no plan and no thought of common-sense naming conventions. As a result (for example), there are dozens of tables that should all make reference to a "BatchID". Now, there are several versions of how that column name could be spelled out...

    BatchID

    Batch_ID

    Batch_Num

    Batch_Number

    Batch#

    BatchNo

    Batch_No

    Batch

    ... and they used ALL of them! Worse than that, some of the "developers" used NUMERIC(18) (which is acceptable in this case), some used NUMERIC(18,0) (also ok in this case), and some used NVARCHAR(256) or NVARCHAR(18). One even used NCHAR(18).

    That's all because of exactly what I said. "Code first" is frequently done by people that have no plan, no standards, and no understanding of even a common naming convention or the importance of datatype matching and using the correct datatypes to begin with.
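    To put a fine point on why the datatype part of that hurts, here's a minimal sketch (the table and column names below are invented for illustration; this is NOT the vendor's actual schema). When a NUMERIC key on one table is joined to an NVARCHAR key on another, SQL Server has to do an implicit conversion, and because NUMERIC outranks NVARCHAR in datatype precedence, the conversion lands on the NVARCHAR column, which makes the predicate non-SARGable and turns would-be index seeks into scans.

    --===== Hypothetical tables demonstrating the mismatch described above.
    CREATE TABLE dbo.BatchHeader
    (
        BatchID NUMERIC(18,0) NOT NULL PRIMARY KEY
    );

    CREATE TABLE dbo.BatchDetail
    (
        DetailID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        Batch_No NVARCHAR(256)     NOT NULL -- same logical key, different name AND datatype
    );

    --===== NUMERIC has higher datatype precedence than NVARCHAR, so Batch_No is
      -- implicitly converted on every row and an index on it can't be used for a seek.
    SELECT d.DetailID
      FROM dbo.BatchHeader h
      JOIN dbo.BatchDetail d
        ON d.Batch_No = h.BatchID;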

    Most of the code is embedded SQL in "managed" code, and the stored procedures that they did write are ANSI-only pieces of junk that handle just one row at a time for processes that handle tens of thousands of documents a day, even though the documents are all staged in a nice staging table. As a result, their "managed" code hits the database with more than 22,000 single-row calls every 30 seconds to process just 20 documents. Apparently, they don't even know how to use the code that they profess knowledge of, because about half the calls are duplicates. Worse yet, they had to set up 12 machines to handle the load because the code wouldn't run fast enough on fewer.
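    For anyone wondering what the alternative looks like, here's a rough sketch (the table and column names are made up; the vendor's real schema isn't shown here) of processing a staging table as a set instead of one row per call. One round trip replaces thousands.

    --===== Hypothetical staging and target tables just to make the sketch runnable.
    CREATE TABLE dbo.DocumentStaging (DocumentID INT NOT NULL, DocBody NVARCHAR(MAX));
    CREATE TABLE dbo.Document        (DocumentID INT NOT NULL PRIMARY KEY, DocBody NVARCHAR(MAX));

    --===== RBAR style would be a client-side loop making one single-row call per document.
      -- Set-based style is one call and one pass over the whole staging table.
    INSERT INTO dbo.Document (DocumentID, DocBody)
    SELECT s.DocumentID, s.DocBody
      FROM dbo.DocumentStaging s
     WHERE NOT EXISTS (SELECT 1
                         FROM dbo.Document d
                        WHERE d.DocumentID = s.DocumentID);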

    Another recent example is a small company that I'm helping gratis. I won't get into why they need to do it, but one of the things their code needs to do is create 250,000 rows in the database on demand when a customer creates a "new product". They accept 6 parameters from the user, plus a start and end date, use an algorithm to generate 250,000 INSERT INTO/VALUES statements, and send them all to the database to create the data. Then they wonder why the pipe fills up and the database takes a long time to insert the rows. The process takes about a half hour to execute. I created a single stored procedure that takes the same inputs and creates/inserts the rows in sub-second times.
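    To give you an idea of the shape of the fix (this is a simplified sketch, not their actual procedure; the target table and the formula for spreading the rows are invented for illustration, and it assumes a sequence generator like the fnTally function linked in my signature below), a single set-based INSERT...SELECT replaces all 250,000 individual INSERT INTO/VALUES statements:

    CREATE PROCEDURE dbo.CreateProductRows
            @ProductID    INT
           ,@StartDate    DATE
           ,@EndDate      DATE
           ,@RowsToCreate INT = 250000
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @Days INT = DATEDIFF(dd, @StartDate, @EndDate) + 1;

        --===== One trip to the database, one set-based insert.
        INSERT INTO dbo.ProductDetail (ProductID, RowNum, EffectiveDate)
        SELECT  @ProductID
               ,t.N
               ,DATEADD(dd, (t.N - 1) % @Days, @StartDate) -- spread the rows across the date range
          FROM  dbo.fnTally(1, @RowsToCreate) t;
    END;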

    And, in a previous job, I spent most of my time fixing extremely long running and heavy hitting code written by the "code first" team. A lot of the slowness was caused by datatype mismatches and the fact that they did nearly everything in a RBAR fashion, even for batch code, including reporting to the screen. One example: a weekly report would take 10 seconds, the same report for a month would take 12 minutes, and the yearly report would run (pegging 4 CPUs to the wall and absolutely killing memory and IO) for 20 minutes and then time out. When the "developers" were called on the carpet about the performance, especially on the required yearly report, their answer was "you've simply asked for too much data to be reported". When I asked the lead "developer" about the datatype mismatches, he literally stuck his tongue out at me and then said "premature optimization is the root of all evil". He's right about that, but this had nothing to do with premature optimization. They "just" wrote really bad code for the tasks and "designed" really bad table structures, and that was the primary cause of all the failures and performance problems, especially since they didn't understand the nature of batch processing.
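    As a demonstration of the "think in columns" point, the fix for reports like that is usually nothing more exotic than a single set-based aggregate (the table and columns below are hypothetical; the original report isn't shown here). The same query shape handles a week, a month, or a year, and the engine does the looping internally:

    DECLARE @StartDate DATE = '20150101'
           ,@EndDate   DATE = '20160101'; -- week, month, or year: only the dates change

    SELECT  CustomerID
           ,COUNT(*)        AS OrderCount
           ,SUM(OrderTotal) AS TotalSales
      FROM  dbo.Orders
     WHERE  OrderDate >= @StartDate
       AND  OrderDate <  @EndDate
     GROUP BY CustomerID;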

    To summarize, to date, I've seen a lot of "code first" projects offer a robust feature list, but I've yet to see one offer long-term performance without having to go massively parallel on hardware, and even then they need frequent upgrades to keep performance up.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)