Data Architecture Options

  • We have a requirement to have 0 downtime during Migrations, more like Active- Active.

    What options do we have on Microsoft SQL Server side ?  What restrictions do we have on Azure MI  Business Critical?

    basically scripts are applied  , every week as a part of Product Release and we need 0 down time, any one who did the R and D, please share, Application Team is also ready to change the Architecture, to micro services, and purpose built databases, like Cosmos DB.


  • First, not to be pedantic, but there are no 'zero downtime' systems. If you can define a realistic downtime tolerance (even if it is five 9s) you can review product documentation for products that achieve what you need and then review what their expected outages are for maintenance tasks like scale up/down/out/in to determine if you are better service by a more traditional statically resourced database service or an automatically scaling one like hyperscale or aurora.

    What do you mean by 'migrations?' Do you mean failover?

    Business Critical MI presents itself as a SQL server where the OS and availability features are managed for you by MS. You can manually fail them over for testing or troubleshooting reasons, but otherwise there shouldn't be anything you need to do. Periodically the instances will need to maintain themselves that will make the instance unavailable briefly, but apps can be written to gracefully handle that while the primary replica changes. The biggest problem they may face with maintenance seems to be long running transactions. If it has to, the instance it will kill long running transactions to complete maintenance - if your app uses a lot of those, any managed DB may be a problem.

    I think you are asking about sharding and if you need to implement sharding, you can do that in SQL, your developers just have to write it.

    Can your app tolerate rare but periodic data inconsistency or data loss? If not, that disqualifies NoSQL as an option, at least by itself.

    For something like this, your best design may be with microservices, but the database you use with microservices is immaterial to whether you choose to do that.

  • I assume by migrations, you mean SQL Scripts changing the schema/objects somehow.

    There is no zero downtime, but you can fake it. First, make sure all migrations are backwards compatible with app code. Use feature flags in the app code so that you can deploy changes, then turn them on.

    For changes, bind them in small transactions to prevent unforeseen issues.

    Trying to do everything at once is where you usually create downtime.

  • Thank you Steve for the response, The question is around database, can we have Active- Active, Looks like in SQL we do NOT have that option. say we need to Migrate 10 TB from one tenant ( Source- Live OLTP Transactions ) to another using SQL Azure MI Business critical ( Destination ), with 0 down time, what are the options?

    On the on prem days we have backup at source, restore with no recovery and last day, take a tail log backup at source , apply that at the destination AND restore with recovery, I believe on Azure MI Business critical, we cannot take on demand, backups  , as they are managed internally by Microsoft.  there will be a downtime to move data which we cannot have.

    Want to know options

  • Migrating a database from one server to another is what you're doing, so it's good to note that. Otherwise it's hard to understand.

    If you are moving to a different tennant, potentially the log replay service will work.

    With SQL 2022, they made it possible to restore backups to/from MI, so I'm guessing this might work for you, but I'm not sure. I'll ask the question of MS.

  • Another use case I need to provide solution

    There is a high profile customer who will send 3x data and wants 0 downtime,source is azure mi business critical, what options can we have at destination,?


  • I don't think any of us understand exactly what you are asking.

    I can say only one thing for sure: "zero downtime" is never a requirement, it is a fantasy.

  • OK, LRS probably won't work out of MI, but you can use MI link to migrate to a VM and then LRS to migrate from the VM to a new MI. That should be minimal downtime or interruption to clients

  • Thanks Steve , Its in preview and wont help as we need two stops . Primary 8 TB data cannot be taken down and its on Azure MI,

    need some solution to quickly move it from one Tenant to other with minimal downtime.

    Second question, to create a database solution for a customer who will grow fast , so which is better Azure MI or Azure VM or serverless?

  • I did get someone at MS looking into this to see if there's a better way. I'll let you know if I hear something.

    I'm guessing there's not good solution to move this. You could try to set up some replication between two instances and then cut over at some point with minimal downtime, but that would be a PIA and require lots of babysitting.


    Better is always it depends. Hyperscale might be an option, and MI BC is a good choice. Whether that's worth it or the fit is hard to tell.  The main thing might be you thinking about the tenant thing. That's kind of a big wall in Azure, so why are you moving this customer? Or could you move other customers/resources instead?


  • Thank you Steve for reaching out to Microsoft, Two separate issues we have ,

    the current OLTP DB with Transactions  is already at 5 TB , there are common tables used by the Application and by all the customers, without the data in key tables product wont work, and there have been performance issues like slowness every now and then , missing info , app/ web getting pegged sometimes and no one knows what in IIS suddenly causes this,  MI has been good on the SQL Server side, now we need to onboard a new customer who wants 100% uptime and will be submitting loads of data, so the business decision to give them a dedicated DB  , dedicated App server, for DB should we go to MI, or Azure SQL VM or Cosmos DB , in a short time..

    So I need to get all the options to the stake holders, so they can make a meaningful decision, one Top guy asked me  why we cannot have Active- Active, They have it in oracle, so I need to have all the info first.  Second some one said, why cant we move data for one customer at a time, I told them pulling 10000 records from 600 Million record table is insane, again th DB we pull from is the main bread and butter VLDB , transactions DB which cannot go down, Azure MI defintely kept uptime great!

  • second page bump

  • woops, jumped the gun on that

  • If you are already using MI, you are experiencing regular (short) maintenance outages. If you haven't noticed them, that may be a good spot to start looking for where you can tolerate outages.

    I don't understand where the idea for the jump from SQL Server/Oracle to CosmosDB is coming in. The two serve very different purposes.

    The question about Oracle "Active-Active" sounds like it is coming from meddlesome and obtuse upper management, and I feel for you. I have seen that implemented in one place that dazzled me, but major application support was required to use it reliably.

    SQL Server can sort of do 'active-active' with bidirectional-replication but it requires major application support if you want your data to be reliably consistent. Personally, with replication I have had it be completely reliable in some cases, but in the cases that it has not been reliable it was absolutely horrendous and I avoid it whenever I can.

    You mentioned completely re-architecting the app is on the table - have you considered event sourcing? If you implement your event store through a tenant to tenant VPN with replicas on both sides, then bring up a replica on the new side using LRS, deploy the app, subscribe it to the events then find a moment you can "freeze" the source SQL server momentarily to bring the new copy up to consistency, your other SQL server will keep in sync with the primary through messaging and then drain stop to cutover to the primary. Keep in mind saying all that is much lower effort than doing it.



  • Thanks for the response, what about Kafka and Azure Serverless / Hyperscale? just for POCs

    I don't think they will rearchitect the application to move out from SQL Server as of Feb 2024 and need some solution sooner to handle a high volume customer.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply