Technical Article

Data Mining Part 26: Introduction to Cubes

,

Introduction

In earlier chapters, we worked with views, which were the source of data for our Data Mining models. It is also possible to extract information from SQL Server Analysis Services Cubes. In this chapter, we will create a Cube from almost zero and in the next chapter we will create a Mining Model from the cube created in this chapter.

Requirements

1. We are using the Adventure Works DW Database 2014 which must be installed as a database.

2. The SQL Server Data Tools (SSDT)  for BI installed. 

A Brief Introduction to Cubes

In this chapter, I will give a brief explanation about the Multidimensional databases and cubes in Microsoft SQL Server. SSAS is a technology used to create multidimensional databases. These databases are special databases to generate strategic reports quickly. The Multidimensional databases are faster than the traditional SQL Server Transactional Database for reports. Usually, high-level reports.

These multidimensional databases require mainly 5 components:

  • A datasource, to create a connection from tables from the SQL Server Engine or Oracle.
  • A dataview, which includes the tables or views to be included in our multidimensional Database.
  • The dimensions are groups of attributes based of columns grouped in different categories like customers, dates, products, etc.
  • The Fact table contains the data that we want to measure (called the measures), this table also contain relationships to the dimension tables.
  • The cubes are a combination of the Fact table(s) joined with the dimensions.

The next example will create all these components based on the tables of the AdventureWorks database mentioned in the requirements.

Getting Started

This tutorial is for newbies with no experience at all in SSAS.

1. Open SSDT.

2. Go to File>New Project and select Analysis Services Project.

3. In the Solution Explorer, go to Data Sources, right click, and select New Data Source.

4. In the Wizard, press next.

5. In the Select how to define the connection dialog, press the New Button.

6. Specify the Server name (in this example, the local machine) and the database (in this example the AdventureworksDW Database) and the Authentication (in this example, the windows authentication).

7. Once you have created the connection, press Finish.

8. Double click on the Data Source create button and go to the Impersonation Information tab. Write a username and password of an account that has at least read permissions to the Source Database objects (the tables of the AdventureworksDW database). This step is very important, and it is the main problem for newbies when they process the database.

9. In the Data Source Views, create a new data source view.

10. In the Data Source View Wizard, press next.

11. Select the Data Source created on steps 1-7.

12. In this step, we will add the Fact and the Dimension tables. Usually, you can fact one table with several Dimensions. In this example, to simplify it, we will have one Fact table with one single Dimension. The Fact table will be the FactInternetSales and the Dimension, the DimCustomer.

13. In the Completing the Wizard, press Finish.

14. If you double click on the Data View, you will be able to see the tables and the relationship automatically created between the Fact and the Dimension (if the column in both tables have the same names, the relationship is automatically created). 

15. With the Explore Data option, you can view the information of the table.

16. As you can see, the Fact table is a table with columns used to create relationships with the dimensions (all the columns that end on Key like CurrencyKey, SalesTerritoryKey). The other columns are the columns used to measure sales, discounts, orders and other important information that we want to obtain from our company.

17. Close the data and now check the columns of the DimCustomer dimension. The dimension contains information that we want to use combined with the fact information, like the SalesAmount per Customer Gender or the SalesAmount per Marital Status. It also has the CustomerKey to join data with the Fact Table.

18. Return to the Solution Explorer and right click on the cube and select New Cube.

19. In the Welcome Window, press next.

20. We will create a cube using existing tables.

21. Select the tables added from the Data View (the DimCustomer and FactInternetSales)

22. The measures are the information that we want to measure like the Revision Number, Order Quantity, Unit Price, Extended Amount, Unit Price Discount Pct, Discount Amount, Product Standard Cost,  Total Product Cost, Sales Amount, Tax Amt. 

Make sure that the other attributes are unselected.

23. Unselect the Dim Customer attributes (if selected).

24. Select the Dim Customer dimension for the cube and press next.

25. In the New dimensions, unselect the FactInternetSales. We do not need new dimensions here.

26. In the Completing the Wizard, press Finish.

27. We are almost ready with our model, but we need to modify our Dim Customer Dimension. Double click on it.

28. By default in the Attributes, just the Customer Key is added to the Attributes. From the Source View, drag and drop the following columns to the Attributes of the Dim Customer dimension:

  1. Commute Distance
  2. English Education
  3. English Occupation
  4. First Name
  5. Gender
  6. House Owner Flag
  7. Last Name
  8. Marital Status
  9. Number of Cars Owned
  10. Number Childer At Home
  11. Total Children
  12. Yearly Income

29. On the solution Explorer right click on the Adventure Works cube and select Process. The process will load the data and process the structure of the Multidimensional cube.

30. If the data is out of date (it will be if it is your first time processing, press yes to the question to build and deploy the project)

31. In the Process Cube Window, press the Run button.

32. On the Process Progress, a process success message should be displayed. Press the close button.

33. Double click on the Adventure works cube.

34. Press the Reconnect icon.

35. We are ready to create our first report Expand the Fact Internet Sales and drag and drop the Sales Amount measure to the Report Area. You will be able to see the total Sales Amount of the Company of 29358677.2 USD.

36. If you want to see the total sales per male and female customer, Drag and drop the Gender to the report pane. You will now be able to see the Total sales to Female and Male customers.

37. Now, I want to see not only the total sales per gender, but also per marital status. Drag and drop, the Marital Status attribute to the report pane. You now can see the Total sales per Gender and Marital Status. Now we are ready to use this cube for Data Mining !

Conclusion

In this tutorial, we learned how to create a SSAS cube ready for data Mining. The next step will be to create a Mining Model from this cube. The next chapter will show how to create a Mining Model from this cube.

The SSAS requires a Data Source (connections), Data View (tables with data), which will generate the Fact and Dimensions which will form a cube with the information.

References

https://msdn.microsoft.com/en-us/library/ms174766.aspx

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating