This level of the Stairway to Biml examines how to refactor your Biml into an easier to maintain format.
Business Intelligence Markup Language (Biml) is a domain-specific XML dialect used to specify end-to-end business intelligence (BI) and data warehouse (DW) solutions. Currently Biml supports the representation of relational models, data transformation packages, multi-dimensional models (OLAP), permissions, databases, and flat files.
The Microsoft SQL Server Business Intelligence stack offers an excellent platform for the development of BI / DW solutions. SQL Server is a robust, reliable and enterprise-grade relational database; and SSIS and SSAS have robust processing engines that have proven themselves in some of the toughest situations. In addition to the robust processing engines, Microsoft has targeted their tools to be comfortable to the IT generalist working in heterogeneous environments. This is accomplished by making the tools highly graphical so that they will appeal to a broad range of needs and skill sets.
The Microsoft BI Stack can be adopted in whole or in part. For instance, SSAS can work with an Oracle database, and SSIS can be installed and used without a SQL Server database. The strengths of the Microsoft BI Stack were achieved with the following disadvantages:
- The Business Intelligence Development Studio (BIDS) and SQL Server Data Tools (SSDT) emphasis on graphical tools has implications for file format. The dtsx file format in SSIS 2005 and 2008 was not designed to be readable or writable by humans. While SSIS 2012’s file format is human-readable it is still not designed to be human-writable. The dtsx file tracks a large amount of non-semantic data such as design surface coordinates, GUIDs, and other designer-specific markup. Change tracking and merging is not practical. It is easier in SSIS 2012, but small changes in a package can still result in many changes to the file that have little or nothing to do with semantic updates.
- The Business Intelligence Development Studio (BIDS) and SQL Server Data Tools (SSDT) emphasis on graphical tools has implications for usability. Details require context switching between Control Flow and Data Flow. The tools have limited search, find/replace, and compare tools. Non-semantic changes result in file changes. For example, changing the position of a task on the Control Flow results in a change just like rewriting the contents of a Script Task. Changes to code are mixed with changes to non-code.
- Business Intelligence Development Studio (BIDS) and SQL Server Data Tools (SSDT) architectural limitations and bugs. An incrementally edited graphical design surface can result in lineage tracking that is easily corrupted. Package edits can only be done one at a time and edits cannot be applied to a set of packages. This makes refactoring many packages costly, time-consuming and error prone. BIDS and SSDT limit the use of implementing many aspects of agile development methodologies such as continuous integration, code reviews, incremental development, and code refactoring.
- The emphasis on piecemeal adoption has implications for development model. There are different projects and user interfaces for different levels of the Business Intelligence (BI) stack. Shared aspects of the solution must be re-specified or imported at every level in the BI stack. Changes must be synchronized across all levels of the BI stack.
- The focus on the generalist has implications for abstraction model. Inherently, there are very limited capabilities for abstraction and reuse. There are no inherent capabilities for formalizing patterns and practices.
Biml and BimlScript address these disadvantages with the following features and functionality:
- Biml is a simple, declarative language used to specify BI assets and metadata. There is no requirement for an additional runtime. Biml generates compiled code, and all the compiled code is usable in the standard Microsoft tool set. Biml is easy to read, even by people new to SQL Server, SSIS and SSAS. Biml can be used with a variety of editors and development tools. With Biml, there is no need to declare default attributes and values. One can use Biml to copy and paste complex ETL logic from one project to another.
- Biml is designed to work with Agile development and management practices. Biml works great with source control, merging and file compare tools. Developers can easily share Biml code with development teams and the Biml community. Biml makes refactoring complex, ETL logic fast, easy and fun! Biml produces clear and easy to read Diff reports that makes code review easy. With Biml, it is easier to separate duties between architects, junior and senior developers, and system subject-matter experts.
- BimlScript includes T4-like delimiters to add C# or VB.Net code blocks in order to automate Biml code against metadata sources. The Biml engine can use any metadata source accessible by .Net. Biml provides an in-memory model as a metadata source to leverage even more functionality from Biml. Biml can be used to codify patterns into BimlScripts that respond to underlying metadata changes. Biml can be used to refactor one to hundreds of packages in as little as one BimlScript file.
What Can You Do With Biml?
Suppose your manager tasks you with building a data warehouse? The source data resides in 100 tables on the system of record, and each table is designed differently. Based on an initial analysis, you realize an incremental load pattern will best meet the requirements. Manually coding 100 SSIS packages using the Incremental Load design pattern will take time. How much time? I usually estimate two hours per SSIS package (which includes development and unit testing). If that holds for your example, that’s 200 hours of SSIS development.
Biml allows you to create a template that will generate an Incremental Load SSIS package. In this Stairway Series, we will do just that in Building an Incremental Load Package – Level 3. Using Biml to generate a template for a single SSIS package is cool but it doesn’t really solve your problem. The fourth installment in this series helps. Using Biml as an SSIS Design Patterns Engine – Level 4 takes extends the functionality of the Incremental Load Biml file built in Level 3. The result is a Biml file that will build an incremental load SSIS package for each table in a source database. In your use case, that would mean the Biml file would construct 100 SSIS packages in a matter of minutes.
But there’s more. Because the SSIS packages are generated from the same template, the quality of the code increases while the development time decreases. Increased quality further translates into reduced testing requirements. In his now-classic 1986 article, No Silver Bullet—Essence and Accidents of Software Engineering, Frederick Brooks postulates “there is no single development, in either technology or management technique, which by itself promises even one order of magnitude [tenfold] improvement within a decade in productivity, in reliability, in simplicity.” Mr. Brooks may be right, but Biml comes pretty close to hitting (and exceeding) that mark when it comes to certain data integration use cases for SSIS.
In this introduction we have listed some limitations faced by data integration developers and architects using only Microsoft development tools. We also provided a list of ways Biml and BimlScript address these limitations.
In this series, the authors intend to demonstrate several features of – and use cases for – Biml in the data integration enterprise. We believe Biml is game-changing technology that can literally save data integration developers hundreds (or even thousands) of hours. We hope you will join us on this journey!