Replication Architecture for a Web-Based Business Application

  • I’ve been tasked with creating horizontal scaling for our web-based business application.

    Here’s what I got for my development environment to roll into production when complete.

    1. 5 IP addresses.

    2. 10 medium-sized servers and 1 large server (11 servers total).

    3. VS 2012, SQL Server 2014, IIS 6.2, Windows Server 2012 R2.

    The database and all the code are clean and indexed; we’re just at the point where we need to spread the load horizontally. We decided to keep things simple: no load-balancing strategies beyond network load balancing, everything handled at the transaction level, and our disk folks are already doing a good job. The application is mostly stored procedures with very little .NET C# code; basically .NET is just used for the I/O between IIS and SQL Server. The database is heavily segmented into separate databases, meaning there is a database for accounting, security, customers, employees, inventory, etc., about 50 in total. When I first saw it I got scared, but the performance is great with the varying disk options per database, and it’s nice not having to back up databases every day that don’t need it. We don’t even use a fully qualified name; in stored procedures we just use [database].[schema].[object] (see the sketch after the list below). Here’s our thinking:

    1. 1 IP address and 1 IIS server for home-page requests.

    2. 4 IP addresses and 4 IIS servers to serve subsequent page requests. A routine has already been created to load balance the requests: the HTML is created dynamically, so one client might be sent to http://server1.example.com/form1 and another client to http://server3.example.com/form1, i.e. two different servers for the same request, pointing to separate database subscribers.

    3. 1 server as the SQL Server publisher of all databases.

    4. 5 servers as SQL Server subscribers.

    5. For future growth, each new stack takes 1 IP address, 1 IIS server, 1 subscriber server, a tweak to the routine that farms out URLs, and a new push or pull subscription.
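
    To illustrate the naming convention I mentioned, here's a simplified sketch of one of our procedures (database, table, and column names are made up, not our real schema):

    ```sql
    -- Hypothetical procedure in the Customers database reading from the
    -- Security database via three-part naming; no linked server is needed
    -- while all ~50 databases live on one instance.
    USE Customers;
    GO
    CREATE PROCEDURE dbo.usp_GetCustomerProfile
        @CustomerId INT
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT c.CustomerId,
               c.CustomerName,
               s.LastLoginDate
        FROM   dbo.Customer AS c
        JOIN   Security.dbo.LoginAudit AS s
               ON s.CustomerId = c.CustomerId
        WHERE  c.CustomerId = @CustomerId;
    END;
    GO
    ```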

    Here’s the problem, and it’s a big one. Although none of the stored procedures do any heavy inserts, updates, or deletes, all of them do at least a few writes: logging requests, changing a customer name, etc. I spent a couple of hours browsing and reading but couldn’t find a solid answer, so here’s my question. If a request comes in to IIS Server 2 that runs a stored procedure on Subscriber Server 2, how is data updated on the publisher? I understand subscribers are read-only (a good thing) unless you license Enterprise edition with peer-to-peer, but I just want to make sure I have it right. The only workaround would be to create a linked server from each subscriber back to the publisher and change the T-SQL syntax to accommodate the linked-server requirements (a sketch of what that looks like is below). Is that right?
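
    Here's roughly what I imagine the workaround would look like (server and table names are hypothetical, and I haven't tested this):

    ```sql
    -- One-time setup on each subscriber: register the publisher as a linked server.
    EXEC sp_addlinkedserver
         @server     = N'PUBLISHER1',
         @srvproduct = N'SQL Server';

    -- Inside a procedure running on the subscriber, the reads stay local but
    -- the logging write travels back to the publisher via four-part naming.
    -- Note: if this runs inside a transaction it becomes a distributed
    -- transaction and needs MSDTC.
    DECLARE @CustomerId INT = 42;

    INSERT INTO [PUBLISHER1].[Customers].[dbo].[ActivityLog]
           (CustomerId, Action, LoggedAt)
    VALUES (@CustomerId, N'ViewBill', SYSDATETIME());
    ```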

    If you’re familiar with this issue, please let me know. Any help is greatly appreciated. Thanks!

  • Both transactional and merge replication support updates that are sent to all other databases involved with the publication.

    For you, I think transactional replication with immediate updating subscribers will suit your requirements; a rough sketch of the relevant publication option is below.
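
    Something like this (publication and database names are illustrative) is the publication-side switch that permits subscriber-side updates:

    ```sql
    -- Illustrative only: @allow_sync_tran is the option that enables
    -- immediate updating subscribers on a transactional publication.
    USE Customers;
    EXEC sp_addpublication
         @publication     = N'CustomersPub',
         @status          = N'active',
         @repl_freq       = N'continuous',
         @allow_sync_tran = N'true';
    ```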

    HAPPYCAT59 - Thanks for the reply! I started to go there, but then I read the following at http://technet.microsoft.com/en-us/library/ms151718.aspx (Updatable Subscriptions for Transactional Replication): "This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature."

    So, I'm trying to figure out where Microsoft is headed with replication. What do you think? The big concern is:

    What is today's best practice for a stored procedure that runs on a subscriber but needs to do an insert, update, or delete?

    Thanks for your help.

    PS - I tried to find some information on why Microsoft is deprecating "Updating Subscriptions for Transactional Publications" and couldn't find anything.

  • We have used replication for many years and have followed what Microsoft are doing with each SQL release with a lot of interest. Although P2P replication was introduced in SQL 2005 and improved through to SQL 2012, our view is that Always On is now the strategic environment for DR and scale-out.

    We currently use P2P to keep a master copy of our data in two locations, and use transactional replication from each P2P node to deal with scale-out. All of our applications have separate connection strings for read and write activity, which allows us to direct all writes to a P2P node and take the reads from a scale-out server. (Certain items such as profile updates need to be read and written from the P2P nodes so the user is guaranteed to see their latest update.)

    In the past we did use updating subscribers, but we moved away from this, both as part of moving to P2P and to reduce risk, as we had more reliability problems with updating subscriptions than with standard subscriptions.

    Historically we have only needed a subset of our main tables to be present on each scale-out server, and we have also used custom replication procedures to optimise certain tables for the scale-out environment. We are now moving to a common schema across all of our P2P and scale-out servers, which will allow us to move to Always On for scale-out and potentially for everything.

    You need to plan your replication topology and your database update approach in parallel. It is easiest with SQL Server to have a (writable) publisher and (read-only) subscribers, regardless of whether you use transactional replication or Always On. If your applications are not designed to cope with this then you will add a lot to the complexity and risks of your operational environment. At the end of the day, you assess your risks and do what is best for your business.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Ed - Thank you very much for the feedback!

    I totally hear what you're saying and don't want to travel down the road that results in issues.

    Always On with a separate connection string for writes sounds like the way to go.

    Unfortunately, that involves going through hundreds of stored procedures and developing a strategy to deal with the ones with writes. Our C# app is only one default.aspx page that calls one main stored procedure. That one .aspx page and one main stored procedure service an application with over 500 forms (mostly maintenance forms), 500 tables, and another 1,000 stored procedures driven by actions submitted from page links. In other words, a user clicks on a link to view a customer's bill. The link includes an action and the parameters needed. Based on the action, the main procedure calls a chain of other procedures that produces the content to view the customer's bill. In this example several writes occur, recording mostly who viewed the bill, why, and when, in addition to security and MIS-related logging. It's actually one of the more robust and maintenance-free applications I've worked on, and the assumption was that a publisher with updatable subscribers would be there when needed.

    The only thing that comes to mind is that every action needs to be duplicated in .NET, with one call for the views and a second call for the writes. It's easy to strip the logging out of the stored procedures, but it's months of work and testing to deal with everything else. It involves tearing the stored procedures apart, creating new ones for the writes, and adding the calls to .NET (a rough sketch of the split is below).
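
    Something like this is what I mean (simplified, with hypothetical names):

    ```sql
    -- The original procedure both returned the bill and logged the view;
    -- split in two so the application can send each half down a different
    -- connection. All object names are invented for illustration.

    -- Runs on a read-only subscriber, via the read connection string.
    CREATE PROCEDURE dbo.usp_GetCustomerBill_Read
        @BillId INT
    AS
    BEGIN
        SET NOCOUNT ON;
        SELECT BillId, CustomerId, BillText
        FROM   dbo.CustomerBill
        WHERE  BillId = @BillId;
    END;
    GO

    -- Runs against the writable publisher, via the write connection string.
    CREATE PROCEDURE dbo.usp_LogBillView_Write
        @BillId INT, @UserId INT, @Reason NVARCHAR(200)
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.BillViewLog (BillId, UserId, Reason, ViewedAt)
        VALUES (@BillId, @UserId, @Reason, SYSDATETIME());
    END;
    GO
    ```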

    The other option that comes to mind is adding a linked server from each subscriber back to the main node, but I'm not sure about the performance hit. Changing the stored procedures to accommodate the linked server would be less work and represent a lower rate of coding errors, but I don't know.
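
    One idea to keep those syntax changes small: a synonym on each subscriber could hide the four-part name, so most procedures would keep writing to what looks like a local table (untested sketch, hypothetical names):

    ```sql
    -- Untested sketch: a synonym on each subscriber pointing at the
    -- publisher's table, so existing INSERT statements keep their
    -- current one-part syntax.
    CREATE SYNONYM dbo.ActivityLog
        FOR [PUBLISHER1].[Customers].[dbo].[ActivityLog];

    -- Existing procedure code such as
    --   INSERT INTO dbo.ActivityLog (CustomerId, Action, LoggedAt) VALUES (...)
    -- would then resolve to the publisher without rewriting each procedure.
    ```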

    If you have a moment and want to share any thoughts please let me know. Thanks again for the feedback.

  • On the face of it, your application was designed to cope with a given volume and your business is now exceeding the design volume of the application.

    My advice is that your management ask for a report on the scalability of your application, and ask it to assess the infrastructure needed if the business doubled in size over the next 3 years. This should identify the pain points you will encounter, and lead to a strategy to resolve them. If you keep your custom app you may be faced with a large rewrite, which as far as adding value to the business is concerned is just running up cost only to stand still.

    One short-term option is to look at sharding. Typically this involves moving to a DBMS that is designed to be sharded, but any move like this is likely to cost about as much as a total rewrite. An alternative is to shard your customer base.

    If you work out what size of customer base your application can cope with while remaining reliable and cheap to run, you can divide this into your current or projected customer base to get the number of instances of the application you need (a worked example is below). You will then need to write some new front end that can get key customer information and direct the rest of the work to the appropriate instance of your application.
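
    As a worked example, with numbers invented purely for illustration:

    ```sql
    -- If one instance can cope with 60,000 customers and you project
    -- 200,000 customers, you need CEILING(200000 / 60000.0) = 4 instances.
    DECLARE @ProjectedCustomers   INT = 200000,
            @CustomersPerInstance INT = 60000;

    SELECT CEILING(1.0 * @ProjectedCustomers / @CustomersPerInstance)
           AS InstancesNeeded;   -- returns 4
    ```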

    Whatever option you choose there will be complexity, risk and cost. However, if you as a techie are seeing that the application is getting near the limits of its scalability, then you need to tell management, and they need to get some work commissioned to find out the true extent of the problem.

    If your business does not address the underlying scalability issues then you are into sticking band-aids over the wounds. In the same way that a bank is now just an IT shop that also lends money and a supermarket is an IT shop that also distributes food, your business totally depends on working IT in order to survive. There is a history of failed companies that relied on sweating the assets a bit too far rather than upgrading the technology.

  • BTW, I did not explain why we think Always On is the strategic direction... The two reasons are simplicity and SQL 2014.

    1) Always On is inherently less work to set up and manage than transactional replication. Therefore it should always be the preferred option for any new data distribution scenario where the other choice is any form of transactional replication.

    2) The new in-memory and columnstore facilities introduced with SQL 2014 work with Always On for data distribution, but do not work with transactional replication. The new SQL 2014 features will give massive performance improvements to many workloads, but there will still be a need to scale out to deal with workload volume, and the only scale-out option with the new features is Always On (a minimal sketch is below).
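
    As a rough illustration of the pattern (server, database, and endpoint names are invented; this assumes the Always On feature is enabled and a WSFC cluster already exists):

    ```sql
    -- Writable primary plus a readable secondary for scale-out reads.
    CREATE AVAILABILITY GROUP BizAppAG
    WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
    FOR DATABASE [Customers], [Accounting]
    REPLICA ON
        N'SQLNODE1' WITH (
            ENDPOINT_URL      = N'TCP://sqlnode1.example.com:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = AUTOMATIC,
            PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE)),
        N'SQLNODE2' WITH (
            ENDPOINT_URL      = N'TCP://sqlnode2.example.com:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = AUTOMATIC,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
    -- Read connections then add ApplicationIntent=ReadOnly to their
    -- connection string and are directed to the readable secondary.
    ```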

    Although we have no insight into Microsoft's future plans, we would not be surprised if Always On gets more enhancements while replication withers. We have seen the start of this in that heterogeneous replication was dropped from SQL Server over the last 2 releases, while Always On was enhanced to support up to 8 read-only secondaries.

  • Thanks Ed! Fortunately this version of the application runs very well. Version 1, though, was a complete disaster and choked at about 50 users. The problem was that the application, written in C#, placed all of the business logic in the application layer. A single request for a form represented about 10 to 15 hits to the database, which choked the network, the database, and the connections. It didn't matter how many layers of server support we had or how many tiers we had: a fat transaction is a slow transaction.

    We moved all of the business logic to stored procedures and we've had predictable, excellent performance since. It's a single round trip to the database, and it doesn't seem to matter whether we call 1 stored procedure or a chain of 15; we always get less than a 10 millisecond response time.

    Here's where the performance hit comes from; I don't hear much about it, but it's the reason we want to scale. It's a remarkably consistent performance hit. You mentioned sharding, but in our case sharding won't help. We have some large tables and complex stored procedures, but as long as everything is properly indexed we get consistently good performance across the board, against lightweight tables and large tables alike.

    However, we did notice that some requests varied in response time very consistently when they shouldn't have; there was no apparent reason, and in some cases they were very simple stored procedures or queries. (When I say response time, I'm talking about just the stored procedure, not the front or back part of the transaction.) What we found was that the difference was in the number of characters returned in the result set. If I run any of our stored procedures and govern the result set to 100 characters, I'll get a 7 millisecond response time every single time. If I let the query return 100,000 characters of data or text (via a substring function), the response time jumps to 50 milliseconds. I can understand why the network wouldn't like that, but SQL Server isn't doing anything with that data, so it must be the time it takes to get it off the disks.

    We have a single routine that manages all of our form grids. We changed the number of records returned per page from 25 to 10, and the next day it was like nobody showed up to work, and nobody complained. So, once again: a fat transaction is a slow transaction.
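
    For example, the grid change amounted to something like this (hypothetical table and columns; OFFSET/FETCH needs SQL 2012 or later):

    ```sql
    -- Fetching 10 rows per page instead of 25 keeps the result set lean.
    DECLARE @PageNumber  INT = 1,
            @RowsPerPage INT = 10;   -- was 25

    SELECT  BillId, CustomerId, BillDate, TotalDue
    FROM    dbo.CustomerBill
    ORDER BY BillDate DESC
    OFFSET  (@PageNumber - 1) * @RowsPerPage ROWS
    FETCH NEXT @RowsPerPage ROWS ONLY;
    ```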

    Most of our customers have bills with just a few pages, but some have bills with hundreds of pages, and they like to look at them all the time; that's why we want to scale out. Very predictably, based on the size of the anticipated result set, we can farm those requests out to other nodes and we'll be good to go.

    I still have the problem with writes in stored procedures that run on read-only nodes. Taking them out of the stored procedures and putting them into the application with a separate connection string seems like a step backwards. Do you have any thoughts on this issue? I'd like not to have to change any of the code or architecture, but changing the syntax of the writes would be OK. I can't believe Microsoft is heading toward a read-only scale-out option without providing a way for stored procedures to write back to the main node. Maybe it is there and I'm not seeing it. Thanks for your help!

  • If 98% of your customers have short-ish bills and are fine and 2% have long bills and have performance issues, maybe the simplest answer is to do some optimisation specifically for the 2% customers.

    You could look at creating a table or tables optimised for long bills (or maybe an indexed view...) and look at how you can get the response time down for this group of customers.

    It is then a fairly easy job to have an overnight or weekly procedure that finds the customer IDs for the problem customers and puts them into a lookup table. Then add some code that uses the small-customer or large-customer logic depending on what is found in the lookup table (a sketch is below).
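
    A sketch of the idea (all names and the threshold are invented for illustration):

    ```sql
    -- A lookup table flags the customers with long bills.
    CREATE TABLE dbo.LargeBillCustomer (CustomerId INT PRIMARY KEY);

    -- Overnight or weekly refresh.
    TRUNCATE TABLE dbo.LargeBillCustomer;
    INSERT INTO dbo.LargeBillCustomer (CustomerId)
    SELECT CustomerId
    FROM   dbo.CustomerBill
    GROUP BY CustomerId
    HAVING MAX(PageCount) > 50;   -- threshold chosen for illustration

    -- Inside the bill procedure, branch on the lookup result.
    DECLARE @CustomerId INT = 42;
    IF EXISTS (SELECT 1 FROM dbo.LargeBillCustomer WHERE CustomerId = @CustomerId)
        EXEC dbo.usp_GetBill_Large @CustomerId;   -- optimised tables / indexed view
    ELSE
        EXEC dbo.usp_GetBill_Small @CustomerId;
    ```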

    Regarding hoping that Microsoft will make it easy to use a writeable hub with read-only spokes for scale-out, I don't think it will happen.

  • Hi Ed,

    I was diverted, and sorry I didn't get back sooner, but I just wanted to say I appreciate all your help and like your suggestion.

    Thanks again!

    Mark
