HELP! I've got to give a talk to our local .NET geek meet on SQL server

  • Hi guys,

    I have agreed (foolishly) to give a 30-minute presentation to our local .NET developers meeting on entitled "Why your database is more than just tables". For those not from a .NET background (which includes me - I am learning) When you do code-first design (i.e. define the tables as .NET classes) you can use annotations to tell the system what the data type is and whether there are any restrictions like min/max length or limited to a specific list of values but this does not let you define whether your STRING .NET datatype is a Char, Varchar, Nchar or Nvarchar field. OR you can use the Fluid API to be a bit more specific.

    I am thinking I should probably try to cover the following topics

    --Data Type considerations and mapping .NET datatypes to SQL datatypes

    --Indexing strategies (Don't use GUIDs as PK, make sure you have a PK, consider secondary and covering indexes. index read-write performance tradeoffs)

    --Page Fill considerations (keep you DBA happy)

    --Check constraints and referential integrity

    --Transaction scoping and concurrency especially in RESTful applications

    I only have 30 minutes so it will be a pretty high level introduction to database performance design. Do you think these are the right topics or do the DBAs amongst you have experience of more common 'errors' by non database developers?

  • aaron.reese (10/17/2016)


    When you do code-first design

    Heh... "Never". 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/17/2016)


    aaron.reese (10/17/2016)


    When you do code-first design

    Heh... "Never". 😛

    +10000000000000000000

    Spent years on fixing that kind of cock-ups, makes me wonder about the evil that men do (sorry CB, wherever you are).

    😎

  • Aside from the probability that the physical model of a "code first" database will suffer (no proper indexing, poorly typed, etc.) another issue is that the logical model itself will be strongly tied to the application's object model. While this may provide some superficial benefit for developers in the short term, what it means in the long term is that the database won't be suitable for reporting purposes, it won't scale, and any changes to the application's object model will require changes to the data model.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • aaron.reese (10/17/2016)


    I am thinking I should probably try to cover the following topics

    --Data Type considerations and mapping .NET datatypes to SQL datatypes

    --Indexing strategies (Don't use GUIDs as PK, make sure you have a PK, consider secondary and covering indexes. index read-write performance tradeoffs)

    --Page Fill considerations (keep you DBA happy)

    --Check constraints and referential integrity

    --Transaction scoping and concurrency especially in RESTful applications

    I'll be blunt, the chance of doing more than 'this is a <whatever>' for each of those in 30 minutes is slim. That's 6 minutes per point (excluding intro, conclusion and questions). Reduce the scope. 2, maybe 3 topics if you plan to gloss over them a bit.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • .NET classes are not tables.

    At most, they are views.

    Tables must represent entities and connections between entities.

    Classes usually represent roles of entities in relation to other entities.

    Say, class "Customer" represents a person or organisation which is in business relation with another person or organisation.

    The same person may be many times a "Customer" - in relation to different trading organisations.

    "Customer" does not have a "residential address" - a person does.

    When a person becomes a "Customer" the residential address is "inherited" by "Customer" from "Person".

    Same for an organisation as a customer: "Customer" does not have Tax Registration number, it inherits it from the organisation which plays the role of Customer in any particular trading relation.

    By creating a table Customer from a corresponding class you doom the system to an infinite chain of table alterations, redesigning, performance problems.

    And complains about "inflexible nature" of relational databases.

    I believe if you start from this topic it will take the whole day, as entity-centric way of thinking about data design usually meets strong resistance from .NET developers.

    _____________
    Code for TallyGenerator

  • WOW!:w00t:

    I think I may have hit a nerve!

    I know traditionally that database devs and application devs have made uneasy bedfellows but I didn't realise that these attitudes were still so prevalent.

    Firstly, I am a database guy, I love 3NF and have been doing mainly MSSQL and a bit of MySQL/Access for almost 20 years. I cut my teeth on PHP OOP using the CodeIgniter framework and am teaching myself C#.NET MVC.

    The statement about classes are not tables is true to a certain extent - they are both technical methods of trying to represent real-world objects, relationships and processes but to say that they are not the same is an over-simplification of the issue. in .NET the CUSTOMER class directly relates to the CUSTOMER table. It has a property of ADDRESS which is a separate table and can access all the methods and properties of the ADDRESS complex type because as well as being a table, ADDRESS is also a class. The CUSTOMER class also has a virtual collection of ORDER instances. Each ORDER also has a collection of ORDERLINE instances and a collection of PAYMENT instances. ORDERLINE is also a table. PAYMENT is a table and different payment methods need different information. The CASH class inherits from the PAYMENT class and adds additional fields Amount Tendered, Change Given. The VISA class inherits from the PAYMENT class and adds additional fields Card Number, Expiry date. These sub-classes can either live in separate tables or in the same table with a discriminator column and sparsely populated fields OR extension tables that share the same primary key as the PAYMENTS table and hold the additional fields only for that payment type.

    When you define these classes and virtual collections, .NET is smart enough to generate tables that match the class design and will include ID fields and FK constraints. Unless you tell it otherwise it will probably not use the most appropriate SQL data types (e.g. string properties are NVARCHAR(max) by default)

    The same design considerations and solutions apply in .NET as they do in database design. The tools in MVC, especially if you are using the Fluid API allow you to define the database structure to (almost) the same level of granularity as you can with the DML language in management studio. In fact in many ways it is better because you can combine data from multiple tables and give the classes (tables) and properties (fields) names that are relevant to the application logic but still maintain good database design in the DML naming.

    .NET MVC will generate migration code up and down for any database changes that arise from changing the data storage model of the entity classes. This code can be managed in source code control and run against downstream systems to produce robust upgrade plans.

    The challenge is that most DB developers have little or no experience in the application design and vice-versa which is why we have this disconnect and which I am trying to contribute in some small way to reduce and educate. Application designers do not appreciate the aggro they cause the DB team by implementing poor design patterns and we don't understand the complexity of designing the application class structures. Throw in abstraction to build out entityManager, entityStore and Repository patterns as it gets really complicated to visualise.

    Now that Microsoft have bought Xamarin, the cross platform development tools and made them free for everyone, rolled out SQL and .NET to Linux and introduces .NET core which will run on everything from a mainframe to a toaster any of you who are not aware of its capabilities and issues may be in for a rude shock in the not too distant future. I suspect for many fledgling developers they will only ever learn code-first database generation and will never even open management studio. There is not much that you can't do inside Visual Studio these days.

    I was hoping for some more constructive feedback. Maybe it is the SQL community that needs the education.:hehe:

  • OK.

    I might have been a bit harsh in that last post.

    We have all been the victims of poorly designed databases written by the app developers who don't even know that they are doing something wrong. I am trying to raise awareness of the issues caused by poor database design and how to NOT do it in .NET by leveraging the tooling available so that you can create databases your DBA will love.

    I think I will concentrate on data types and what indexes are and what they do for performance. If people want to pick my brains afterwards about the other topics I will spread the word then.

  • aaron.reese (10/20/2016)


    I think I will concentrate on data types and what indexes are and what they do for performance. If people want to pick my brains afterwards about the other topics I will spread the word then.

    Keep it tight, keep it focused, make vague hints about a 'part 2 on database design' if they sound interested in one. I don't know how much presenting you've done, but over-estimating what can be covered in a period of time is not unusual.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • aaron.reese (10/20/2016)


    I think I will concentrate on data types and what indexes are and what they do for performance. If people want to pick my brains afterwards about the other topics I will spread the word then.

    You know, it does not really matter which data types you use and how many indexes you throw on a denormalized table(s) - it would not sustain any reasonable load anyway.

    I've been watching enormous efforts thrown to improve performance of a system having a transactional table in its core with 19 "Status" columns (0/1), "Successful Statuses" and "Failed Statuses" columns with semicolon-separated strings like "Price Check OK;Customer Name OK", "Tax Amount Check Fail;Receipt Check Fail", - up to 19 checks per record between them two.

    The system became too slow at the stage of 100k rows in the table.

    But it was an old hardware, SQL2000, so there were a lot of blaming points.

    Redesigning the database never was made a priority by the management.

    Moving it to SQL2008 sitting on 48 CPU Server with 64GB of RAM allowed it to reach the level of 200k records.

    Then again 2 contractors were sitting there trying to make queries to complete within 2 minutes.

    I don't know the end of the story, but the last thing I heard the customers started moving to other providers.

    _____________
    Code for TallyGenerator

  • Aaron, I know how exactly you can make them listen.

    Go to the project management. Top level.

    Ask them what are the expectations of the number of customers, transactions per day, other scale measures, which need to be reached to make the project profitable.

    Then multiply the number of daily transactions by 2000 (5 years worth of data) and build a script populating the "code-first" database with the corresponding amount of mock-up data.

    While they're waiting for the application to pass the "welcome" screen you may populate an appropriately designed database with the same set of data.

    Make sure the management have noticed the difference in database response timing for both design versions.

    _____________
    Code for TallyGenerator

  • You should also tell them that LINQ is evil personified:-D (not really, but...)

    I have this discussion with developers from time to time, and have had to use Profiler to show them just how bad the SQL that LINQ generates can be, and then explain the benefits of not moving a bunch of data back and forth across the network as a sequence of LINQ statements executes, as opposed to having it all done in a stored procedure that keeps everything within the database.

  • I think the main problem with LINQ is that it fetches table data based on .NET classes so often will not limit the data to only those columns you are interested in, plus with the Lazy loading option it will potentially get all the data from the downstream tables as well.

  • @sergiy

    The problem is not with the code-first mentality; it is with a lack of awareness of what the defaults are when using code-first. I can build perfectly good 3NF tables using CF. In many ways CF is better than hand-rolled tables. By default the PK is called ID, the FK is called foreignTable_ID, table names are defaulted to plural or singular based on whatever you choose as the default, Clustered indexes are called IX_tableName, FK Check constraints are defined by default (so many DB devs don't do this when prototyping) and properly named, fields are NOT NULL by default unless you specify '?' on the end of the class property, values can be limited using an enumerator which requires a specific check constraint in SQL and going downstream to the presentation layer you can specify the default display names for the class properties which you can't do in SQL [this was one of the few nice features of Progress...]

    CF also generates database snapshots and migration scripts that can be deployed through the release cycle and because you CAN define seed data, it can be much easier to make iterative changes whilst still in the DEV stage.

    With CF you can rapidly prototype based on the business problem and then retro-fit the underlying data structure without having to touch the business logic or the entity classes.

  • aaron.reese (10/28/2016)


    ... and then retro-fit the underlying data structure ...

    I've not seen too many people that will actually do that. It's much easier just to do it right from the git.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply