Writing Nearly Codeless Apps: Part 5

  • Brian: Let me illustrate the benefits of the RAP approach with a scenario. Let's say you and I both implement the same application and we have the same user running both apps side by side. After some time, our user has entered 1 million records into a particular table but has also deleted 400,000 of them, so that there are only 600,000 "current" records in the table.

    In the RAP application, when the user queries the table for current data, the server looks only in the "current" table, which contains only current records, so it has to search only through 600,000 records. In your application, the server has to look through 1 million records.

    In your application, every programmer writing every query into this table must remember to check for the "D" flag, and the server has to perform this test on every record in every query. The RAP application author (and server) does not need to check any flags - he just writes the natural query with no consideration of any audit or status fields.

    In the RAP application, I have stored every version of every record, including every version that was created by repeatedly updating the same record. This means that when querying historical data (as the RAP "ExampleCRM" app does when you log in using a date), the application can see a snapshot of all the data in the entire application as of any given date. For example, if I want to see ALL of my data exactly as it was six months ago (including all screens, reports, etc.), all I do is specify a date of "six months ago" and it's as though I had logged into the app on that day. Nothing is ever lost. And since all these record versions are stored separately from the "current" data, there is no performance cost for storing them when looking up current data, no matter how many there are. In your app, the best you can ever do is see the latest version of each record, because that's the only version you've got; there is simply no way for you to reconstruct all your data as of a given past date.
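
    To make the split concrete, here is a minimal T-SQL sketch of the idea. The table and column names (Customer, Customer_versions, EffectiveDate, ExpirationDate) are illustrative assumptions for this sketch, not RAP's actual generated schema.

      -- Current data lives in its own table, so the natural query only ever
      -- touches the 600,000 current rows:
      SELECT CustomerID, CustomerName
      FROM dbo.Customer;

      -- Every saved version also lives in a separate versions table, stamped
      -- with the range of time during which it was current. An "as of" query
      -- returns the snapshot in effect on the requested date; rows deleted
      -- after that date still appear, and rows deleted before it drop out
      -- because their last version has already expired.
      DECLARE @AsOfDate datetime = DATEADD(MONTH, -6, GETDATE());

      SELECT CustomerID, CustomerName
      FROM dbo.Customer_versions
      WHERE EffectiveDate <= @AsOfDate
        AND (ExpirationDate IS NULL OR ExpirationDate > @AsOfDate);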

  • Replying to David Ziffer (11/11/2010):


    You are really missing some basic database concepts. You have a view for Current_Users or Active_Purchases or whatever, and 98% of the queries use those views. The 2% of the time you need to do something weird, like undelete, or report on 'all including deleted', you query the underlying table. That covers current data, soft-deleted or otherwise. If you are starting from scratch, you make it an indexed view and suffer essentially zero performance hit. We had huge tables before that was an option, so we don't have indexed views, but we still perform just fine.
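
    As a rough illustration of that pattern, here is what such a view might look like; the table, column, and flag names below are assumptions made for the sketch, not the poster's actual schema.

      -- A view that hides soft-deleted rows; assume dbo.Users carries a
      -- one-character StatusFlag column where 'D' marks a deleted record.
      CREATE VIEW dbo.Current_Users
      WITH SCHEMABINDING
      AS
      SELECT UserID, UserName, Email
      FROM dbo.Users
      WHERE StatusFlag <> 'D';
      GO

      -- On a fresh schema the view can be indexed (materialized), so reading
      -- through it costs about the same as reading a physical table that held
      -- only the current rows.
      CREATE UNIQUE CLUSTERED INDEX IX_Current_Users ON dbo.Current_Users (UserID);
      GO

      -- Day-to-day queries go through the view and never mention the flag:
      SELECT UserID, UserName FROM dbo.Current_Users WHERE UserID = 42;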

    Historical data is not stored in the main table. I was quite clear about this above. It's in a separate table, named by convention <table>_hist or something like that. The hist tables are populated by triggers on the main tables. Every insert/update/delete (when permitted) is recorded. If I want a user and group and purchase list as of 6 months ago, I query the _hist tables, joined to each other. I have one set of queries for current data and one set of queries for historical data. You have a huge number of table-valued functions, each of which does an if/else to abstract away from the end developer the idea of current versus historical. The difference is that I can index my current and historical data differently and tune my current and historical queries very differently.
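
    For readers unfamiliar with the pattern, a minimal sketch of such a history trigger follows. The Users/Users_hist tables and their columns are invented for illustration and are not the poster's real schema; a delete trigger, where hard deletes are permitted, would follow the same shape.

      -- History table mirrors the main table plus a couple of audit columns.
      CREATE TABLE dbo.Users_hist (
          UserID     int          NOT NULL,
          UserName   varchar(100) NOT NULL,
          StatusFlag char(1)      NOT NULL,
          ChangeType char(1)      NOT NULL,               -- 'I' or 'U'
          ChangedAt  datetime     NOT NULL DEFAULT GETDATE()
      );
      GO

      -- The trigger copies every inserted or updated row image into _hist.
      CREATE TRIGGER trg_Users_hist ON dbo.Users
      AFTER INSERT, UPDATE
      AS
      BEGIN
          SET NOCOUNT ON;
          INSERT INTO dbo.Users_hist (UserID, UserName, StatusFlag, ChangeType)
          SELECT i.UserID, i.UserName, i.StatusFlag,
                 CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END
          FROM inserted AS i;
      END;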

    If your database performance is significantly affected by storing 400,000 records, then there is probably something wrong. We have tens of millions of user records, and keeping the ones that are no longer active is just not a performance hit. We also soft-delete to avoid cascading deletes. What do you do about them? What if a user has been responsible for tens of thousands of transactions over the years, all FK'd to them, each of which has tens of detail records hanging off of it? When you physically delete that user, how many rows have to be deleted? Easily hundreds of thousands. For one user. What if you delete a group of 10,000? You're talking millions and millions of cascading deletes. Your entire database is in jeopardy at this point, as row locks escalate to table locks, and writes and transactions start to pile up to deadly levels while this one delete uses up all your I/O. In my system, meanwhile, soft-deleting a user affects... two rows: one update to the current table and one insert into the hist table. Soft-deleting 10,000 affects 20,000 rows. It is a linear progression. Cascading deletes multiply with the fan-out of the FK chain. I will take a gradual creep in table size over huge unpredictable I/O thrashing any day.
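
    A rough sketch of the row-count comparison, reusing the illustrative Users table from above plus a hypothetical GroupID column:

      DECLARE @RetiredGroupID int = 7;    -- hypothetical group being removed

      -- Soft delete: one updated row per user, plus one hist row written by
      -- the trigger, so 10,000 users touch roughly 20,000 rows in total.
      UPDATE dbo.Users
      SET StatusFlag = 'D'
      WHERE GroupID = @RetiredGroupID;

      -- Hard delete with ON DELETE CASCADE on the child FKs: the row count
      -- multiplies down the chain (users x transactions x detail rows), so the
      -- same 10,000 users can become millions of deleted rows in one statement.
      DELETE FROM dbo.Users
      WHERE GroupID = @RetiredGroupID;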

  • I was wondering what the benefits were of having two separate functions for potentially returning all records if the primary key parameter is passed in as NULL, like:

    UFTBadmUser##PK (@AsOfDate, @userid)

    UFTBadmUser#PK (@AsOfDate, @userid)

    Why not just have the one function that uses

    where (UserID = @userid or @userid is null)?

    It can't be that much of a performance hit to check whether or not a parameter's value is null. And why would you even have a @userid parameter in the second function at all, if the only time you would use it would be to pass @userid = null?

  • Replying to omarserenity (11/13/2010):


    Actually, there is quite a performance hit when you do the test for NULL. So there are two versions: one that allows you to get all records if you want, and a much faster one that requires an exact match on the key value(s).
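
    A sketch of why the split helps: with a single catch-all predicate, SQL Server has to produce one plan that is valid whether or not the parameter is NULL, which tends to prevent a clean index seek. The function names and bodies below are illustrative stand-ins, not the actual RAP-generated code, and which RAP name maps to which behavior is not established here.

      -- Catch-all version: can return one user or, with NULL, all users; the
      -- OR @userid IS NULL predicate generally forces a scan.
      CREATE FUNCTION dbo.GetUsers_AnyOrAll (@userid int)
      RETURNS TABLE
      AS RETURN
      (
          SELECT UserID, UserName
          FROM dbo.Users
          WHERE (UserID = @userid OR @userid IS NULL)
      );
      GO

      -- Exact-match version: plain equality on the key, so the optimizer can
      -- always use a primary-key seek.
      CREATE FUNCTION dbo.GetUsers_ByKey (@userid int)
      RETURNS TABLE
      AS RETURN
      (
          SELECT UserID, UserName
          FROM dbo.Users
          WHERE UserID = @userid
      );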

  • It can't be that much of a performance hit to check whether or not a parameter's value is null ...

    I can testify to the potential hit on checking for a parameter's NULL value. I once had a stored proc bring an entire server to its knees. Making one change - turning one proc into three procs that did not have to check for NULL values - instantly and completely fixed all problems.

    You will not see this problem every time you check for NULLs; there are times when checking for a NULL value makes sense. You should just be aware that it is a potential gotcha, especially since the problem may not be apparent in testing.
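
    A sketch of the kind of fix described above, with invented names: the outer procedure does the IF, and each inner procedure has a simple predicate, so neither inner plan has to cover both cases.

      -- Specialized procs: each has a seek-friendly predicate.
      CREATE PROCEDURE dbo.GetOrders_ByCustomer @CustomerID int
      AS
          SELECT OrderID, OrderDate
          FROM dbo.Orders
          WHERE CustomerID = @CustomerID;
      GO

      CREATE PROCEDURE dbo.GetOrders_All
      AS
          SELECT OrderID, OrderDate
          FROM dbo.Orders;
      GO

      -- Thin dispatcher: the IF lives here rather than in a WHERE clause, so
      -- each inner proc gets its own cached plan.
      CREATE PROCEDURE dbo.GetOrders @CustomerID int = NULL
      AS
          IF @CustomerID IS NULL
              EXEC dbo.GetOrders_All;
          ELSE
              EXEC dbo.GetOrders_ByCustomer @CustomerID;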

  • When will Part 6 appear?
