• Thanks Ed! Fortunately this version of the application runs very well. Version 1, though, was a complete disaster and choked at about 50 users. The problem was that the application, written in Visual Studio C#, kept all of the business logic in the application tier. A single request for a form translated into about 10 to 15 hits to the database, which choked the network, the database, and the connections. It didn't matter how many layers of server support we had or how many tiers we had - a fat transaction is a slow transaction.

    We moved all of the business logic into stored procedures and we've had predictable, excellent performance ever since. Everything is a single round trip to the database, and it doesn't seem to matter whether we call one stored procedure or a chain of fifteen - we always get a response time under 10 milliseconds. For anyone curious, the pattern looks roughly like the sketch below.
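    This is a minimal sketch of the pattern, not our actual code - the procedure, table, and parameter names are made up for illustration:

    ```sql
    -- One round trip: the application calls a single wrapper procedure
    -- and all business logic runs server-side. Names are hypothetical.
    CREATE PROCEDURE dbo.usp_GetFormData
        @FormId     INT,
        @CustomerId INT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- What used to be 10 to 15 separate application-to-database
        -- calls is now a chain of procedures inside the one round trip.
        EXEC dbo.usp_ValidateCustomer   @CustomerId = @CustomerId;
        EXEC dbo.usp_ApplyBusinessRules @FormId = @FormId;

        -- One result set back to the application.
        SELECT f.FormId, f.Title, f.UpdatedAt
        FROM dbo.Forms AS f
        WHERE f.FormId = @FormId;
    END;
    ```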

    Here's where the real performance hit comes from - I don't hear much about it, and it's the reason we want to scale out. It's a remarkably consistent hit. You mentioned sharding, but in our case sharding won't help. We have some large tables and complex stored procedures, but as long as everything is properly indexed we get consistently good performance across the board, against lightweight tables and large tables alike - it doesn't seem to matter. What we did notice is that some requests consistently took longer than others when they shouldn't have - there was no apparent reason, and in some cases they were very simple stored procedures or queries. (When I say response time, I mean just the stored procedure, not the front or back part of the transaction.)

    What we found was that the difference was the number of characters returned by the result set. If I run any of our stored procedures and govern the result set to 100 characters, I get a 7 millisecond response time every single time. If I let the query return 100,000 characters of data or text (governed through a substring function), the response time jumps to 50 milliseconds. I can understand why the network wouldn't like that, but SQL Server isn't doing anything with that data, so it must be the time it takes to get it off the disks. We have a single routine that manages all of our form grids. We changed the number of records returned per page from 25 to 10, and the next day it was like nobody showed up to work - and nobody complained. So, once again, a fat transaction is a slow transaction.
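    For anyone who wants to reproduce the measurement, this is roughly how the comparison looks (table and column names are hypothetical; SET STATISTICS TIME reports the elapsed time of each statement in the Messages output):

    ```sql
    SET STATISTICS TIME ON;

    -- Result set governed to ~100 characters per row: fast every time.
    SELECT SUBSTRING(b.BillText, 1, 100) AS BillSnippet
    FROM dbo.Bills AS b
    WHERE b.CustomerId = 42;

    -- Same query, ~100,000 characters per row: identical plan, but the
    -- elapsed time jumps roughly 7x in our environment (7 ms to 50 ms).
    SELECT SUBSTRING(b.BillText, 1, 100000) AS BillBody
    FROM dbo.Bills AS b
    WHERE b.CustomerId = 42;

    SET STATISTICS TIME OFF;

    -- The grid paging change was the same idea applied to row counts:
    -- fewer rows per page means a smaller, faster transaction.
    DECLARE @PageNumber INT = 1,
            @PageSize   INT = 10;  -- was 25

    SELECT b.BillId, b.BillDate, b.Amount
    FROM dbo.Bills AS b
    ORDER BY b.BillDate DESC
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;
    ```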

    Most of our customers have bills with just a few pages, but some have bills with hundreds of pages, and they like to look at them all the time - that's why we want to scale out. Based on the size of the anticipated result set, we can very predictably farm those requests out to other nodes and we'll be good to go. A sketch of how I expect the routing to work follows below.
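    Assuming this is built on Always On Availability Groups (which is what SQL Server's read-scale options use under the covers), the farming out happens in two places: a routing list configured on the primary, and ApplicationIntent=ReadOnly added to the connection string of the requests the application predicts will be fat. A sketch, with made-up availability group and server names:

    ```sql
    -- Tell each secondary where it accepts read-only connections...
    ALTER AVAILABILITY GROUP BillingAG
    MODIFY REPLICA ON N'SQL02'
    WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://sql02.ourdomain.com:1433'));

    -- ...and tell the primary how to route connections that arrive
    -- with ApplicationIntent=ReadOnly (the big-bill requests, for us).
    ALTER AVAILABILITY GROUP BillingAG
    MODIFY REPLICA ON N'SQL01'
    WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQL02', N'SQL03')));
    ```

    The only application-side change would be choosing, per request, which connection string to use - small-bill requests keep going to the primary, and the fat ones declare read-only intent and get routed away.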

    I still have the problem of writes inside stored procedures that run on read-only nodes. Taking the writes out of the stored procedures and putting them into the application with a separate connection string feels like a step backwards. Do you have any thoughts on this? I'd like to avoid changing any of the code or architecture, but changing the syntax of the writes would be fine. I can't believe Microsoft is heading toward a read-only scale-out option without providing a way for stored procedures to write back to the primary node. Maybe it is there and I'm just not seeing it. Thanks for your help!
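    One pattern I've seen suggested - offered here as a sketch I haven't load-tested, with made-up names throughout - is to leave the procedure where it is and change only the syntax of the write: point a linked server at the availability group listener (which always resolves to the current primary) and push the write through it:

    ```sql
    -- One-time setup on the read-only node: a linked server aimed at
    -- the AG listener, which always connects to the current primary.
    EXEC sp_addlinkedserver
        @server     = N'AGPrimary',
        @srvproduct = N'',
        @provider   = N'SQLNCLI',
        @datasrc    = N'billing-listener.ourdomain.com,1433';

    -- Allow remote procedure calls through the linked server.
    EXEC sp_serveroption N'AGPrimary', N'rpc out', N'true';
    GO

    CREATE PROCEDURE dbo.usp_GetBillPages
        @BillId INT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Was a local write, which fails on a readable secondary:
        --   INSERT INTO dbo.AuditLog (BillId, ViewedAt) VALUES (@BillId, GETDATE());
        -- Now a remote call that executes the write on the primary:
        EXEC AGPrimary.BillingDB.dbo.usp_LogBillView @BillId = @BillId;

        SELECT p.PageNumber, p.PageText
        FROM dbo.BillPages AS p
        WHERE p.BillId = @BillId
        ORDER BY p.PageNumber;
    END;
    ```

    The caveat I'd check first is distributed transactions: if the write has to be atomic with anything else in the procedure, cross-server transactions come with restrictions, so this fits best for simple fire-and-forget writes like audit logging.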