Writing Nearly Codeless Apps: Part 4

  • RobertYoung (10/29/2010)


    ron.buchanan.us (10/29/2010)


    careusa2003 (10/29/2010)


    I would not touch this design with a 10-foot pole.

    If you are going to insult a design you know little about, at least elaborate. I would love to know what it is that you find so horrible about the design.

    The only vector which makes sense: s/he's a client-side coder who lives by the mantra "we prefer to enforce constraints in our application code". I've known lots of such folks. They make messes worse than a puppy. :w00t:

    Or they have a terabyte+ database where the important procs need to be specifically tuned to be fast enough. RAP would replace your laboriously tuned proc with a vanilla CRUD one as soon as you add a column to the schema. That is my number one problem with all code generation and ORM systems: they just would never perform well in a really large environment. To go back to the Industrial Revolution-type metaphors... robots today do a lot of the assembly work on a Ford Focus. But how much of the space shuttle was assembled by robots? The really complex work still requires skilled human assemblers.

    How often do RAP procs make use of CTEs? How often does RAP create indexed views? Does RAP ever use dynamic SQL for when you want to shift indexes based on input parameters? How does RAP deal with distributed inserts, where a chain of dependent inserts into tables all have to succeed or fail as one? When will RAP de-normalize for performance? I don't know of any tool that would do that, and where I work those are all needed.
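
    To make the dynamic SQL point concrete, here is a minimal sketch of the kind of hand-tuned proc I have in mind (the table, columns, and index names are hypothetical, not anything RAP emits). Only the predicates that were actually supplied end up in the statement, so each call pattern gets a plan that can seek on the matching index:

    CREATE PROCEDURE dbo.GetOrders
        @CustomerId INT      = NULL,
        @OrderDate  DATETIME = NULL
    AS
    BEGIN
        -- Append only the filters that were supplied, so each variant of the
        -- statement gets its own cached plan and can use the appropriate index.
        DECLARE @sql NVARCHAR(MAX) =
            N'SELECT OrderId, CustomerId, OrderDate, Amount
              FROM dbo.Orders
              WHERE 1 = 1';

        IF @CustomerId IS NOT NULL
            SET @sql = @sql + N' AND CustomerId = @CustomerId';  -- can seek IX_Orders_CustomerId
        IF @OrderDate IS NOT NULL
            SET @sql = @sql + N' AND OrderDate = @OrderDate';    -- can seek IX_Orders_OrderDate

        EXEC sp_executesql @sql,
             N'@CustomerId INT, @OrderDate DATETIME',
             @CustomerId, @OrderDate;
    END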

  • Or they have a terabyte+ database where the important procs need to be specifically tuned to be fast enough. RAP would replace your laboriously tuned proc with a vanilla CRUD one as soon as you add a column to the schema. That is my number one problem with all code generation and ORM systems: they just would never perform well in a really large environment. To go back to the Industrial Revolution-type metaphors... robots today do a lot of the assembly work on a Ford Focus. But how much of the space shuttle was assembled by robots? The really complex work still requires skilled human assemblers.

    How often do RAP procs make use of CTEs? How often does RAP create indexed views? Does RAP ever use dynamic SQL for when you want to shift indexes based on input parameters? How does RAP deal with distributed inserts, where a chain of dependent inserts into tables all have to succeed or fail as one? When will RAP de-normalize for performance? I don't know of any tool that would do that, and where I work those are all needed.

    That is a valid point. However, just because an ORM is generated, and reacts to the metadata of the DB, doesn't mean it is a poor choice for the administration portion of an application or for the simple CRUD. That, IMO, is the very purpose of using an ORM. And having a facade or adapter layer between the custom application and any third-party component is, again IMO, good design.

    We house large amounts of data. However, we also have an ORM sitting on top of that data. The difference is we are smart about how we query the data, the structure of the database, and what we use the ORM for. We never use the GetAll() methods; we only use the GetPaged() or GetByKey() methods. These are no different from an optimized stored proc, because we have optimized the SQL ourselves. We can also customize the SQL for the queries, or just call a stored procedure; we are not bound by the ORM, it is flexible.
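
    For illustration only (this is just the shape of the query, not the literal SQL NetTiers generates, and the Customer table here is hypothetical), a GetPaged-style call boils down to fetching a single page with ROW_NUMBER rather than pulling the whole table:

    -- Page 4 (zero-based index 3) of 50 customers, ordered by name.
    DECLARE @PageIndex INT = 3, @PageSize INT = 50;

    WITH numbered AS
    (
        SELECT CustomerId, CustomerName,
               ROW_NUMBER() OVER (ORDER BY CustomerName) AS rn
        FROM dbo.Customer
    )
    SELECT CustomerId, CustomerName
    FROM numbered
    WHERE rn BETWEEN (@PageIndex * @PageSize) + 1
                 AND (@PageIndex + 1) * @PageSize;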

    Because we are in an enterprise environment, we took a while to make the decisions we made. We ran performance tests with different ORMs, and we have also designed the DBs to serve their specific purposes. For instance, we have a reporting database that is denormalized and whose population occurs on job schedules with SSIS. We have raw databases, where the data can be edited, that feed the reporting DB. We also have SSIS jobs that run regularly to feed data into the raw DBs from outside sources.

    Now, our ORM is NetTiers and not RAP. I wanted to point that out because you directly addressed RAP.

  • wbrianwhite (10/29/2010)


    RobertYoung (10/29/2010)


    ron.buchanan.us (10/29/2010)


    careusa2003 (10/29/2010)


    I would not touch this design with a 10-foot pole.

    If you are going to insult a design you know little about, at least elaborate. I would love to know what it is that you find so horrible about the design.

    The only vector which makes sense: s/he's a client-side coder who lives by the mantra "we prefer to enforce constraints in our application code". I've known lots of such folks. They make messes worse than a puppy. :w00t:

    Or they have a terabyte+ database where the important procs need to be specifically tuned to be fast enough. RAP would replace your laboriously tuned proc with a vanilla CRUD one as soon as you add a column to the schema. That is my number one problem with all code generation and ORM systems: they just would never perform well in a really large environment. To go back to the Industrial Revolution-type metaphors... robots today do a lot of the assembly work on a Ford Focus. But how much of the space shuttle was assembled by robots? The really complex work still requires skilled human assemblers.

    How often do RAP procs make use of CTEs? How often does RAP create indexed views? Does RAP ever use dynamic SQL for when you want to shift indexes based on input parameters? How does RAP deal with distributed inserts, where a chain of dependent inserts into tables all have to succeed or fail as one? When will RAP de-normalize for performance? I don't know of any tool that would do that, and where I work those are all needed.

    And do you have terabyte+ data because someone never bothered with defining normalized data? Are you doing text processing? There is no de-normalize for performance. It doesn't require much skill to do a bytedump into a database. Been there, seen that. Got fired a few times for complaining; that's OK, who wants to work with knuckleheads? BCNF datastores can be implemented efficiently either on arrays of short-stroked drives or on SSDs, and they yield maximum global efficiency, since the client code does so much less (coders tend to say things like "my query ran faster in [MySql|VSAM|dBaseII] than SQLServer", blissfully ignoring all that client-side manipulation and the ACID services that MySql doesn't provide). BCNF also gives you maximum flexibility, the xml folks and their "hierarchical model" notwithstanding; adding columns/tables has so few side effects compared to doing the same in xml. The fact that you *need* lots of SQL gyrations is prima facie evidence that the schema is wonky.

    As Phil (not Factor) used to say: "in the old days it was simple data and complex code, now it's complex data and simple code". What he meant by "complex data" was fully normalized data. If one wishes to continue to create applications of bloated code manipulating simple-as-dirt bytedumps, then fine. But that approach is not a database approach; one may use some SQL along the way, but it still isn't an example of the Relational Model incarnate.

    All of the other techniques mentioned can be built into a generator; whether RAP ever does is a separate question. Don't conflate RAP, per se, with generation as a technique.

    I'm not asserting that RAP is either perfect or the generation implementation I would build were I to do so. But it is fundamentally the correct Yellow Brick Road.

  • RobertYoung (10/29/2010)


    And do you have terabyte+ data because someone never bothered with defining normalized data? Are you doing text processing? There is no de-normalize for performance. It doesn't require much skill to do a bytedump into a database. Been there, seen that. Got fired a few times for complaining; that's OK, who wants to work with knuckleheads? ... The fact that you *need* lots of SQL gyrations is prima facie evidence that the schema is wonky.

    As Phil (not Factor) used to say: "in the old days it was simple data and complex code, now it's complex data and simple code". What he meant by "complex data" was fully normalized data. If one wishes to continue to create applications of bloated code manipulating simple-as-dirt bytedumps, then fine. But that approach is not a database approach; one may use some SQL along the way, but it still isn't an example of the Relational Model incarnate.

    Thank you, thank you RobertYoung for supporting this position. I was beginning to think I was crazy for being almost the only person I know who subscribes to it.

    I have never EVER seen a low-normalization database that wasn't a disaster. Not only that, I have never met people who maintained such a database who actually realized what a mess their data was in. Which of course explains how it is that they can tolerate the mess and the endless problems it creates (they actually think they're doing the best that can be done).

    Denormalizing a normalized database is trivial. Maintaining a denormalized database is, simply, impossible. And if you think that you're reasonably maintaining one, might I suggest that you're in that most hopeless category of people: those who don't even know what it is that they don't know.

    Apparently, Father [still] Knows Best.

  • And do you have terabyte+ data because someone never bothered with defining normalized data? Are you doing text processing? There is no de-normalize for performance. It doesn't require much skill to do a bytedump into a database. Been there, seen that. Got fired a few times for complaining; that's OK, who wants to work with knuckleheads? BCNF datastores can be implemented efficiently either on arrays of short-stroked drives or on SSDs, and they yield maximum global efficiency, since the client code does so much less (coders tend to say things like "my query ran faster in [MySql|VSAM|dBaseII] than SQLServer", blissfully ignoring all that client-side manipulation and the ACID services that MySql doesn't provide). BCNF also gives you maximum flexibility, the xml folks and their "hierarchical model" notwithstanding; adding columns/tables has so few side effects compared to doing the same in xml. The fact that you *need* lots of SQL gyrations is prima facie evidence that the schema is wonky.

    As Phil (not Factor) used to say: "in the old days it was simple data and complex code, now it's complex data and simple code". What he meant by "complex data" was fully normalized data. If one wishes to continue to create applications of bloated code manipulating simple-as-dirt bytedumps, then fine. But that approach is not a database approach; one may use some SQL along the way, but it still isn't an example of the Relational Model incarnate.

    All of the other techniques mentioned can be built into a generator; whether RAP ever does is a separate question. Don't conflate RAP, per se, with generation as a technique.

    I'm not asserting that RAP is either perfect or the generation implementation I would build were I to do so. But it is fundamentally the correct Yellow Brick Road.

    No, we have lots of data because there are millions and millions of transactions and users in a single database. We're in the process of moving large chunks of data into other database servers grouped by customer. But our main database is large because there's lots of data, not because it's poorly designed.

    "There is no de-normalize for performance."

    Of course there is. If you need to limit a select to only the rows relevant to a particular company, the query is much more efficient if the company id is stored directly on those rows, instead of having to join through four or five tables to get back to the company id as you would in a fully normalized design. You can index the company field and then your query just does a simple index seek.
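
    A rough sketch of the difference (the table names are made up for illustration):

    DECLARE @CompanyId INT = 42;

    -- Fully normalized: the company id is only reachable through the parent chain.
    SELECT td.*
    FROM dbo.TransactionDetail td
    JOIN dbo.[Transaction] t ON t.TransactionId = td.TransactionId
    JOIN dbo.Account a       ON a.AccountId     = t.AccountId
    JOIN dbo.Branch b        ON b.BranchId      = a.BranchId
    WHERE b.CompanyId = @CompanyId;

    -- Denormalized for performance: CompanyId is copied onto the detail rows
    -- (it can never change once written), indexed, and the query becomes a seek.
    CREATE INDEX IX_TransactionDetail_CompanyId
        ON dbo.TransactionDetail (CompanyId);

    SELECT td.*
    FROM dbo.TransactionDetail td
    WHERE td.CompanyId = @CompanyId;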

  • David Ziffer (10/29/2010)


    RobertYoung (10/29/2010)


    And do you have terabyte+ data because someone never bothered with defining normalized data? Are you doing text processing? There is no de-normalize for performance. It doesn't require much skill to do a bytedump into a database. Been there, seen that. Got fired a few times for complaining; that's OK, who wants to work with knuckleheads? ... The fact that you *need* lots of SQL gyrations is prima facie evidence that the schema is wonky.

    As Phil (not Factor) used to say: "in the old days it was simple data and complex code, now it's complex data and simple code". What he meant by "complex data" was fully normalized data. If one wishes to continue to create applications of bloated code manipulating simple-as-dirt bytedumps, then fine. But that approach is not a database approach; one may use some SQL along the way, but it still isn't an example of the Relational Model incarnate.

    Thank you, thank you RobertYoung for supporting this position. I was beginning to think I was crazy for being almost the only person I know who subscribes to it.

    I have never EVER seen a low-normalization database that wasn't a disaster. Not only that, I have never met people who maintained such a database who actually realized what a mess their data was in. Which of course explains how it is that they can tolerate the mess and the endless problems it creates (they actually think they're doing the best that can be done).

    Denormalizing a normalized database is trivial. Maintaining a denormalized database is, simply, impossible. And if you think that you're reasonably maintaining one, might I suggest that you're in that most hopeless category of people: those who don't even know what it is that they don't know.

    Apparently, Father [still] Knows Best.

    By denormalized I just meant storing company id on relevant rows, when company id can never change on a transaction. It's denormalized, but it's not a maintenance issue. The performance boost is completely worth it.

  • RobertYoung (10/29/2010)


    ron.buchanan.us (10/29/2010)


    careusa2003 (10/29/2010)


    I would not touch this design with a 10-foot pole.

    If you are going to insult a design you know little about, at least elaborate. I would love to know what it is that you find so horrible about the design.

    The only vector which makes sense: s/he's a client-side coder who lives by the mantra "we prefer to enforce constraints in our application code". I've known lots of such folks. They make messes worse than a puppy. :w00t:

    And now insults are simply being traded. Stop this crap, folks. If you're going to claim that you wouldn't "touch this design with a 10-foot pole" or that something will "make messes worse than a puppy", back up your claim with code, links to white papers, or a good explanation as to why. Just stop with the non-substantive attacks. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @David Ziffer

    I have a question based on the following snippet from the article...

    The reader may question why the application should pass the audit date at all; why not just have each insertion routine internally get the current date from the database and put that value in this field? The answer has to do with auditing. The reason we store the AuditDate at all is to retrieve historical data from the archive tables. If a set of records is being written (added, updated, deleted) as a group, then we need to be able to retrieve them as a group. In order to ensure that all records written together are also retrieved together, they must all have precisely the same AuditDate. Therefore we cannot leave it to the individual insertion routines (for each table) to each supply AuditDates on the fly. Instead, the application must generate a single date to be used on absolutely every record involved in a given transaction.

    First, I whole-heartedly agree with the principle of storing the same AuditDate when adding a "group" of rows for the very reason stated. To wit, my question would be, why would you add a group of rows one row at a time instead of loading them as a complete set with a single insert?

    --Jeff Moden



  • Jeff Moden (10/30/2010)


    @David Ziffer

    I have a question based on the following snippet from the article...

    First, I whole-heartedly agree with the principle of storing the same AuditDate when adding a "group" of rows for the very reason stated. To wit, my question would be, why would you add a group of rows one row at a time instead of loading them as a complete set with a single insert?

    I would think that's all an automated code generator type tool can do, even though it is not the right approach from a database perspective. Inserting as a set would be better in every way: it is atomic; it can be controlled by a transaction with commit/rollback logic; the whole set goes in at once, so the batch of data being inserted is much less intensive for transaction logging and row/page/schema locking; you can select from two tables joined together into two other tables instead of copying one row, looking up a new id, and copying the dependent row; and probably 10 other things that don't come to mind right now.
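
    For example, the joined, set-based version of that copy might look like the sketch below (the archive tables and @CutOff are hypothetical, and it assumes the keys travel with the rows rather than being regenerated): one transaction, two statements, no per-row id lookups.

    DECLARE @CutOff DATETIME = '20100101';

    BEGIN TRAN;

        -- Copy every qualifying parent row in one statement...
        INSERT INTO archive.OrderHeader (OrderId, CustomerId, OrderDate)
        SELECT OrderId, CustomerId, OrderDate
        FROM dbo.OrderHeader
        WHERE OrderDate < @CutOff;

        -- ...then all of their children, joined back to the same parent set.
        INSERT INTO archive.OrderLine (OrderId, LineNo, ProductId, Qty)
        SELECT l.OrderId, l.LineNo, l.ProductId, l.Qty
        FROM dbo.OrderLine l
        JOIN dbo.OrderHeader h ON h.OrderId = l.OrderId
        WHERE h.OrderDate < @CutOff;

    COMMIT;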

    Which is why I find the insistence that doing it the tool's way is the only way so puzzling. I've never worked with Ruby on Rails, but from every serious developer I've read who's written about it, the Rails toolkit is enough to get you going, yet you eventually end up with custom code when you need to address real-world, complex problems. You lose the ability to regenerate your code base all over, but you gain the rather important ability to meet your customers' needs.

  • All of the other techniques mentioned can be built into a generator

    I have a hard time believing this. How would a code generator know enough about your application structure, data size, and database performance to 'decide' now is the time to use an indexed view? CTEs can sometimes be used for performance, if you know which indexes are out there and are analyzing query plans of different approaches for a proc. Otherwise they're useful for code maintainability, which wouldn't be an issue in generated code.

    I have seen several systems come out that can build up medium or medium-low complexity applications automatically, but none that would work for a large software-as-a-service company. I think in the attempt the company would spend most of its time developing the tool to do what it wants, making it ever more customized to a specific domain, and putting the dev work at one or two removes from where it really needs to be. To include indexed views you'd need to program an entire framework for the concept (which seems quite difficult in a CRUD-oriented system), verify it doesn't mess anything else up in the whole generator, and then you'd still need someone inputting data about your load and data size and all the other reasons that would lead you to use an indexed view, each time you want such a view. Which wouldn't end up cutting much time off the process of adding one.
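
    To be clear about what such a framework would have to emit, here is the sort of indexed view a human adds by hand after studying the workload (the sales schema is hypothetical). Deciding that this particular aggregate is worth the extra insert/update cost is exactly the judgment call a generator has no context for:

    -- Materialized, engine-maintained aggregate; indexed views require
    -- SCHEMABINDING, two-part base table names, COUNT_BIG(*), and a
    -- unique clustered index (Amount is assumed NOT NULL).
    CREATE VIEW dbo.vSalesByCompany
    WITH SCHEMABINDING
    AS
    SELECT CompanyId,
           SUM(Amount)  AS TotalAmount,
           COUNT_BIG(*) AS RowCnt
    FROM dbo.Sale
    GROUP BY CompanyId;
    GO

    CREATE UNIQUE CLUSTERED INDEX IX_vSalesByCompany
        ON dbo.vSalesByCompany (CompanyId);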

  • Jeff Moden (10/30/2010)


    First, I whole-heartedly agree with the principle of storing the same AuditDate when adding a "group" of rows for the very reason stated. To wit, my question would be, why would you add a group of rows one row at a time instead of loading them as a complete set with a single insert?

    I certainly would load rows into a single table as a group, but even those insertions must be done through individual calls to the RAP insertion procedure. So even though they are queued and streamed, they are still done with individual stored procedure calls. Even if they were all inserted with a single statement, it is not clear that a "datetime" performed for each row would produce an identical value in each row.

    Then of course there is the fact that most writes to the database do not involve merely a single table. If I write a parent and a child record together and I expect to retrieve them together, they'd better have the same AuditDate on them.
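
    In other words, the calling code does something like the sketch below (the insertion proc names and parameters are illustrative, not RAP's actual signatures):

    -- Capture one timestamp for the whole logical write, so the parent and
    -- the child carry an identical AuditDate and can be retrieved as a group.
    DECLARE @AuditDate DATETIME = GETDATE();

    BEGIN TRAN;

        EXEC dbo.OrderHeaderInsert @OrderId = 1001, @CustomerId = 7,
                                   @AuditDate = @AuditDate;

        EXEC dbo.OrderLineInsert   @OrderId = 1001, @LineNo = 1, @ProductId = 55,
                                   @AuditDate = @AuditDate;   -- same value, not a fresh GETDATE()

    COMMIT;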

  • wbrianwhite (10/30/2010)


    All of the other techniques mentioned can be built into a generator

    I have a hard time believing this. How would a code generator know enough about your application structure, data size, and database performance to 'decide' now is the time to use an indexed view? CTEs can sometimes be used for performance, if you know which indexes are out there and are analyzing query plans of different approaches for a proc. Otherwise they're useful for code maintainability, which wouldn't be an issue in generated code.

    The same way humans do: recognize the pattern. For CTEs, the most useful case is traversing hierarchies stored in adjacency-list form. Easy enough to recognize.
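
    For example, the adjacency-list traversal is mechanical enough to template (the Employee/ManagerId schema below is hypothetical):

    -- Recursive CTE walking a parent/child (adjacency list) hierarchy.
    WITH org AS
    (
        SELECT EmployeeId, ManagerId, EmployeeName, 0 AS Depth
        FROM dbo.Employee
        WHERE ManagerId IS NULL                      -- anchor: the root(s)

        UNION ALL

        SELECT e.EmployeeId, e.ManagerId, e.EmployeeName, org.Depth + 1
        FROM dbo.Employee e
        JOIN org ON org.EmployeeId = e.ManagerId     -- recurse: children of the prior level
    )
    SELECT EmployeeId, ManagerId, EmployeeName, Depth
    FROM org
    ORDER BY Depth, EmployeeName;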

  • RobertYoung (10/31/2010)


    wbrianwhite (10/30/2010)


    All of the other techniques mentioned can be built into a generator

    I have a hard time believing this. How would a code generator know enough about your application structure, data size, and database performance to 'decide' now is the time to use an indexed view? CTEs can sometimes be used for performance, if you know which indexes are out there and are analyzing query plans of different approaches for a proc. Otherwise they're useful for code maintainability, which wouldn't be an issue in generated code.

    The same way humans do: recognize the pattern. For CTEs, the most useful case is traversing hierarchies stored in adjacency-list form. Easy enough to recognize.

    Robert is right on this one. That's why we focus on the database design, which also takes into account the ORM that we generate. We know that unique constraints, indexes, and foreign keys will all be seen by the ORM generator, and so they optimize the retrieval process. Like I said, we don't look for the ORM to do anything but generate the plumbing to get the data in and out of the database and into objects, nothing more. The trick with an ORM, and especially a generated ORM, is to know which ORM does what, the full extent of its capabilities, the developer productivity increase, etc. Expecting the ORM generator to do everything for you, I believe, is naive.

  • David Ziffer (10/30/2010)


    Even if they were all inserted with a single statement, it is not clear that a "datetime" performed for each row would produce an identical value in each row.

    GETDATE() is only evaluated once per query, so no worries there.
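
    A quick way to see it (hypothetical table): every row touched by a single statement gets the identical value.

    -- GETDATE() is a runtime constant: it is evaluated once per statement, so
    -- all rows affected by this one UPDATE receive exactly the same timestamp.
    UPDATE dbo.OrderLine
    SET AuditDate = GETDATE()
    WHERE OrderId = 1001;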

    --Jeff Moden



  • David Ziffer (10/30/2010)


    I certainly would load rows into a single table as a group, but even those insertions must be done through individual calls to the RAP insertion procedure.

    I'm not trying to be a smart guy here. Just an opinion on my part. What you've stated above is part of the reason why I don't use such procedures. If you need to insert a "group" of rows, it just shouldn't be done one row at a time. As with anything else, "It Depends", but, comparatively speaking, it's much too resource-intensive to insert groups of rows in a RBAR fashion. It would be really cool if your product knew how to receive and insert more than one row in a set-based fashion.
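
    For what it's worth, one way a generated layer could receive a whole group in a single call (SQL Server 2008 and up; the type and proc names are hypothetical, not something RAP offers as far as I know) is a table-valued parameter:

    -- A table type describing the shape of the incoming set.
    CREATE TYPE dbo.OrderLineTableType AS TABLE
    (
        OrderId   INT NOT NULL,
        LineNo    INT NOT NULL,
        ProductId INT NOT NULL,
        Qty       INT NOT NULL
    );
    GO

    -- One call, one set-based INSERT, one AuditDate for the whole group.
    CREATE PROCEDURE dbo.OrderLineInsertSet
        @Rows      dbo.OrderLineTableType READONLY,
        @AuditDate DATETIME
    AS
        INSERT INTO dbo.OrderLine (OrderId, LineNo, ProductId, Qty, AuditDate)
        SELECT OrderId, LineNo, ProductId, Qty, @AuditDate
        FROM @Rows;
    GO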

    --Jeff Moden


