From DBA to DWA
A while back on SQLServerCentral.com there was an article titled "From DBA to DBAA", by Jeffrey Yao. So I thought, what about "DWA"? That inspired me to write this article, "From DBA to DWA", so thanks Jeffrey. DWA stands for Data Warehouse Administrator. Some people would use that acronym for referring to 'Data Warehouse Architect', but that's for another article. Today let's just focus on Data Warehouse Administrator. As this forum is SQLServerCentral, we will focus on DWA on SQL Server platform.
A data warehouse administrator is responsible primarily for maintaining the integrity and availability of the enterprise data warehouse, including the data quality. He or she needs to keep the data warehouse up and running. Perhaps it is a high availability SQL Server 2005 data warehouse containing 5 Terabyte of corporate information, used by a few hundred users in 50 branches nationwide for business intelligence (BI) and customer relationship management (CRM). Or perhaps it is a single server 300 GB data warehouse used by 15 users in the company head quarter to analyze sales, customer and products. Whatever the data warehouse is, the main job of a data warehouse administrator is to maintain it.
Every data warehouse system has ETL processes that pumps data into the data warehouse at certain interval. Perhaps every day, every week, or every few hours. One of the data warehouse administrator main tasks is to monitor these ETL processes to make sure it works. It is very important to monitor the ETL processes, because they feed the data warehouse with data. If the ETL processes do not run properly, the data in the data warehouse would become obsolete. If the ETL processes ran half way then stopped, the data in the data warehouse could become incomplete. If they ran incorrectly, the data could become incorrect. And if the data is incorrect or incomplete then all decisions that are made based on that data would be affected. This is why it is important to ensure that the ETL processes work correctly all the time.
A data warehouse administrator ideally reports to the data warehouse manager, but sometimes they report to the data warehouse architect. The key tasks of a data warehouse administrator are: (assuming it's on SQL Server platform)
- Monitor daily and weekly ETL/DTS/SSIS tasks
- Manage data warehouse databases and maintain all database servers
- Manage Analysis Services cubes and servers
- Administer Reporting Services reports and servers (possibly a web farm)
- Manage mining model and predictive analytics
- Manage data warehouse security
- Report data warehouse load and activities
- Bulk upload new data into the warehouse
- Install patches and manage upgrade
- Maintain data warehouse portal
- Backup and test restore all data warehouse objects
- Liaise with the development team for code deployments
- Liaise with the business team for data requests
- Organize training sessions for end users
- Helping users with their queries
The core technical competencies required for a data warehouse administrator are:
- Experience maintaining SQL Server databases
- Knowledge of reporting and analysis services
- Good understanding of data warehousing principles
- Familiar with dimensional modeling
- Understanding of how to manage SSIS/DTS jobs/packages
- Preferably MCDBA certified or MCITP on BI
A data warehouse administrator needs to know how to do performance tuning on a data warehouse. They need to understand the differences between tuning a fact table in a dimensional data store to tuning a transaction table in OLTP system. They need to know why simple recovery is more suitable for stage and dimensional data store but full recovery is better for the operational data store.
In some companies, the DWA is also responsible for maintaining reports and SSIS tasks. But in most cases these tasks are taken care of by another person: the data warehouse developer. If the company is using a Business Intelligence (BI) or Enterprise Performance Management (EPM) tool such as Business Objects, SAS, Cognos, Hyperion, MicroStrategy or ProClarity, it is possible that the DWA would be looking after this tool too. Some companies implement SharePoint for reporting services portal, so Sharepoint skills may be relevant for a SQL Server DWA.
The term 'Data Warehouse Administrator' is more popular in Teradata, DB/2 and Oracle shops rather than SQL Server shops. But since the launch of SQL Server 2005, data warehousing on Microsoft platform became more and more popular. With SQL Server 2008 coming at the end of February, we can expect to hear more about SQL Server DWA roles in the future.
Now that we know what a DWA is, how would someone transition himself or herself from a SQL Server DBA to a SQL Server DWA? Well, a SQL Server DBA is the perfect candidate for a SQL Server DWA. A SQL Server DBA is the closest match to a SQL Server DWA, closer than any other jobs in IT. A SQL Server DBA can administer SQL Server databases. He or she can maintain user security, configure SQL Server, do database backup, manage disk spaces, install SQL Server patches/updates, etc. All these skills provide a very strong foundation to do the SQL Server DWA tasks.
The usual skill gap is Analysis Services cubes. Administering SSIS jobs and SSRS reports are not difficult to learn (I'm not talking about developing them, only administering). But administering Analysis Services cubes is a bit different. It takes some time to be familiar with the concepts of multidimensional databases. If the organization also uses data mining, then maintaining mining models will also take some time to learn.
Beside Analysis Services cubes and data mining, the other skill gap is the knowledge of data warehousing concept. A SQL Server DWA needs to be familiar with data warehousing and dimensional modelling concepts such as fact table, dimension table, slowly changing dimension, surrogate keys, aggregates/summary tables, dimensional hierarchy, late arriving facts and conformed dimensions. They need to understand the role of effective date columns in SCD type 2. A DWA need to be able to describe the differences between loading data into a transactional fact table, periodic snapshot fact table and accumulative snapshot fact table.
Data warehousing concept is a fundamental knowledge that a DWA must have. A DWA does not need to have the experience to design a data warehouse from scratch, but he or she needs to have basic understanding about (for example) how a data warehouse database is different from an OLTP database. This understanding is necessary because it impacts the tuning, table partitioning operation (extending partitions and archiving old ones), indexing, queries and aggregates. Ideally, a DWA will also need a good understanding about ETL, i.e. about how the data is extracted from the source systems and loaded into the data warehouse. This is necessary because he or she will be responsible to monitor the ETL processes and it could possibly impact the backup strategy. A DWA needs to know whether the data integrity could be affected when he or she re-run a failed ETL process.
A DWA will also need a good understanding about data quality concept. It is absolutely critical that the data in the data warehouse must be accurate and complete. If we can't trust the data in the data warehouse then what is the point of having a data warehouse? The DWA needs to understand the mechanism of data quality, i.e. how the data firewall detects the bad data before it is loaded into the data warehouse, how the bad data is reported and how it is corrected. Having a thorough understanding of the data quality processes will help him or her maintaining the quality of the data in the data warehouse.
So that's the tasks and responsibilities of a data warehouse administrator and the skills/knowledge required to perform the tasks. For those of you who wish to embark on a data warehousing journey, I wish you good luck and all the best. I think at the moment the data warehousing work on SQL Server platform is growing and that means good opportunity for us. And with SQL Server 2008 improvements on data warehousing (such as star join query optimization, using data compression to improve fact tables query performance, using change data capture for ETL and using Merge SQL command to do upsert operation), the future of Microsoft SQL Server data warehousing would be even more attractive.
© July 2007