Amazon Relational Database Service (RDS) allows us quickly deploy databases without worrying about underlying infrastructure. These pleasant words that have thrilled the business world and trickled through the industry. AWS, Azure, OCI and other cloud service providers all have these DBaaS offerings. Economies of scale implies that adopting cloud should be inexpensive. In practice, however, we find that this is not always the case for existing, non-cloud-native, large enterprises. One must take into account quite a few aspects before deploying/migrating a database in the cloud. This article explores some of those considerations. We also demonstrate how decisions can be made while deploying a single SQL Server Database using Amazon RDS.
10 Considerations for RDS
In the context of this article and based on both research and our experience, here are a few elements one must consider when deploying databases in the cloud:
- Overall Cost
- Data Residency
- Data Size
- Backup and Recovery
- Data Retention
- Data Life Cycle
- Data Analytics
- Data Access
- Data Security
- Data Privacy
Let’s break these down one after the other.
Total Cost of Ownership
The low cost of cloud services and the OPEX model were key selling points when Amazon Web Services (AWS) pioneered the model cloud computing paradigm. This "low cost" was sold to startups mostly. They didn't have to make heavy investments in IT infrastructure on-premises when launching new products. AWS gave many startups the advantage they needed to get started with their business and scale along the way. Most of the startups who leveraged cloud computing in the early days were fintech companies which offered scalable products to the masses such as payment gateways. It was profitable to start small, scale and remain in the cloud - the product paid its own rent. This is slightly different for large enterprises.
In my view, the workload being moved to the cloud should be good enough to pay its own rent. For large enterprises who waste resources on-premises, cloud computing has tendency of exposing the wasteful practices in graphic detail. Suddenly every bit of storage has to be paid for (pun intended!). And as you scale or provision, the money just keeps showing up in your face thanks to Amazon Cost Explorer and other tools made available for visualizing the costs of assets deployed in the cloud. In addition, infrastructure must been sized optimally for the purpose it is being provisioned.
The cost of cloud computing may seem attractive in the short term. Careful planning is necessary to remain in the cloud for the long hall. Paula Rooney in a July 2022 article in CIO.com explores this topic in further detail. The inadequacy of traditional cost-control mechanisms used on-premises, the need to modernize before migrating and the potential hidden costs are all highlighted.
Some regulators are very concerned about where their citizens' data sits. The haziness of definitions in the information technology world makes it easy in some jurisdictions to fool regulators. Is there an efficient way for regulators to determine where data sits and where services are being delivered from? Big question. As an organization, is important to carefully understand the regulator’s requirements and align. This avoids sanctions but one must also take the organization's interests and operating model into account.
For example, some multi-nationals operate a centralized business model meaning they deliver services from a single location to their customers across boundaries. Navigating the regulatory requirements on data residency might mean classifying your data appropriately so you can determine which solutions are allowed to be hosted in country and which are not. You might also want to explore solutions that help you keep a copy of data in-country in cases where the regulator does not mind where the data is so long as they have local access to the copy.
While Africa contributes very little to cloud computing revenue globally, (see this article), many African nations are insisting on keeping their data within reach. This is a dilemma for many advancing organizations jumping on the digital transformation bandwagon. Thought leaders like Jim Marous of the Financial Brand and Michael Tang of Deloitte are all for Cloud Banking but there seems to be little focus on these data residency concerns from African nations.
The size of databases you are dealing with will impact the time required to migrate, the migration method as well as the costs you will eventually incur in the cloud in terms of storage. Cloud service providers offer several options for migrating data into RDS. This is the case particularly for companies like Microsoft and Oracle, which are veterans in the data management space.
However, thorough testing of the chosen approach is important to ensure migrations can be done within timelines and without data loss. This article from Stream Sets touches on essential considerations for migration data. Similar articles by Appinventiv and Spiceworks touch on cloud migration best practices in general.
Backup and Recovery
Most cloud service providers have native solutions for backup and recovery. In my view, these solutions were originally designed for the startup in mind – the need to restore data in case of data loss. This pattern shows up in the default retention policies that come with this backup mechanisms. For enterprises however, such as banks, there are other reasons for taking backups asides form the need to recover from failures. One such reason is regulatory reporting at a specific point in time such as the previous day, the previous month, the previous year or even a specific financial position. In such scenarios, the live database does not need to be overwritten and the retention must cater for more than seven days. This demands a different solution.
Another consideration in terms of back is how to manage backup sets in the long term. What happens if the organization decides to discontinue its relationship with the cloud service provider? Those back sets must be preserved in such a manner that they will be useable on the destination platform. The plan on how to migrate older backup sets must be clear.
Data management professionals often must make decisions about how much data to keep online. This is typically determined by the business and regulatory requirements. Keeping data online means the data is immediately available within the application. Keeping it offline means the data is in a backup set outside the system. Sometimes mechanisms for retaining the required data are part of the application design, e.g. the Active-Active solution for ACI’s Postilion. In other cases, application owners have to rely on data engineers to design and build mechanisms for data retention at the data layer.
Depending on its design, an application’s core tables would be responsible for a large chunk of the size thus of concern in terms of data retention. For example, a payment application might have a transaction table as the largest table. Decisions must be made as to how far back the transactions need to be viewed within such an application. Traditionally such retention can be achieved using table partitioning, stretched databases (SQL Server) and so forth. These technical solutions are used to implement decisions defined in retention policies.
Data Life Cycle
The moment decisions are made about what data needs to be retained within the system, the life cycle of data must be determined and how such a cycle is implemented. Data is captured within the application; some is retained online and some retained within back set. For every backup taken – daily, weekly, monthly, yearly – decisions must be made about how long such data needs to be kept. Depending on how long the data needs to be kept, decisions then have to be made about where such long term, cold data might be kept.
Cloud Service Providers typically provide options for what kind of storage is used to store hot, warm and cold data. In AWS, on might have day-old backup sets initially on S3 buckets, have them moved to Glacier for archiving with a configuration that automatically deletes them after the agreed offline retention period. In most Sub-Saharan African banks for example, data must be kept for up to 10 years. This can becoming quite expensive if not kept on archival storage. A cloud programme must include decisions about how the data life cycle will be managed using the desired CSP’s offerings.
When we hear the phrase “Data is the new gold” we must realize that this often referred to potential value. Actual value typical arises from making sense of the data through analytics. If your organization already has a platform for analytics, you have to decide whether you want to aggregate data from cloud sources to your existing platform or adoption a data platform in the cloud. You must consider your geographic location relative to your CSP’s data centres, the rate of growth of your data, the data capabilities you would want to develop and more.
Your on-prem data warehouse’s geographic location impacts how efficient your ETL or ELT jobs will be. If you are in sub-Saharan Africa, you might be better off using cloud-based data platform for analytics the moment you choose to move databases to the cloud. Latency between your databases and an on-premises data warehouse will have impact on your ETL or ELT jobs. Using cloud-based analytics also affords you the opportunity to quickly adjust to growing data without tedious storage procurement processes.
Some organizations still need to give selected staffers direct access to production databases for troubleshooting, support and ad-hoc analytics. When such databases are in-premises the risk of allowing communication to the database is lower. Once the database is in the cloud, one might be tempted to open access to the public in the name of making it easy for staff to do their work from anywhere. Quite dangerous.
It is preferred to use firewall rules to restrict access to only IP Addresses from your corporate network. AWS makes this easy by providing virtual firewalls called Security Groups. These days, CSPs also provide browser-based client tools for people who need to work with databases directly. This may address data access security concerns if the tools are launched from within the cloud service provider's console.
Security is a huge subject these days and many innovative organizations are coming up with solutions that are better aligned with cloud computing. One of such solutions is Secure Access Service Edge (SASE), which is the “convergence of wide area networking, or WAN, and network security services like CASB, FWaaS and Zero Trust, into a single, cloud-delivered service model” according to Palo-Alto Networks.
In moving databases to the cloud, we must think about encryption at rest (Transparent Data Encryption (TDE)?) and in transit (Transport Layer Security (TLS)?). We must think about ransomware, password policies, packages and more. MTI College gives an overview of cloud security in this article. Cato Networks also outlines the key elements of SASE in this article.
Data privacy is receiving more and more focus as part of the data protection triad – Traditional Data Protection, Data Security and Data Privacy. Data Privacy is focused on personal data and confidential data. For the most part the concern from a design decision perspective is to align with relevant regulatory requirements on personal and confidential data. The location of you preferred CSP’s data centres might impact your decision in terms of aligning with data privacy laws. Baker McKenzie's summary of data privacy laws in Africa is a good reference.
Example - Amazon RDS for SQL Server
Cloud Service Providers make it very easy for us to deploy assets in the cloud. So easy in fact that often we do not think carefully about the consequences of our decisions. So far we have examined ten key considerations that must be taken into account when adopting a CSP as the preferred location for your databases. In this section, we shall create a Microsoft SQL Server RDS instance and mention along the way our rationale for making the choices we made.
Ours is a very simple use case for deploying a database on Amazon RDS for the purpose of demonstrating SQL Server concepts - specifically T-SQL. In essence, we want a low-cost, light, efficient database that has all the features necessary for writing T-SQL and nothing more. We do not need to manage the underlying infrastructure for this database and we are not storing any sensitive data. There is also no need to preserve data from this data in backup sets. We shall walk through the simple steps for launching this RDS instance and the options we choose based on this background.
We start by establishing a connection to Amazon Web Services using our root account or IAM user. To open the RDS console, we use the search bar shown in Figure. If you have done this before, you will find this entry in the recently visited section. Click RDS to open the RDS console. First we establish a few reasons we are choosing RDS (See Figure 1):
- We want to reduce the overhead associated with managing the database environment
- Our use case is simple, we do not need any sophisticated instance-level configurations