Technical Article

Data Mining Part 31: DMX functions

,

Introduction

In the chapter 12, we learned about DMX. We learned some functions like the Predict, PredictHistogram, PredidictSupport and other functions.

This lesson will show more DMX functions.

Requirement

This chapter assumes that you are already familiar with the DMX language explained in the chapter 12.

Getting Started

1. In the SQL Server Management Studio  (SSMS) connect to the Analysis Services and create a new DMX query.

2. Let's start with the IsInNode function, shows the cases related to a node specified. 

Select *
 from [Customer Clusters].Cases
 WHERE IsInNode('008')

As you can see, the query requires the model name and the node name. The query shows the cases related to the node 008.

3. You can find the node and model information using this query:

SELECT MODEL_NAME,  NODE_NAME
FROM $System.DMSCHEMA_MINING_MODEL_CONTENT

4. Another function is the Lag(). This function is used in time series algorithms only. The following example shows the amount earned, the quantity sold, and the Time Index of the cases in the last 24 months of the trained data.

SELECT Amount, Quantity, [Time Index] FROM [Forecasting].CASES
WHERE Lag() < 24

5. The function RangeMax returns the maximum value of a bucket of a discretized column. The following example shows the maximum value of amounts earned for the forecasting algorithm.

SELECT DISTINCT
    RangeMax([Amount]) AS [Maximum]
FROM [Forecasting]

6. If we try the same query for the TM Cluster algorithm, we will have 10 values for the ages of the customers.

SELECT DISTINCT
    RangeMin([Age]) AS [Maximum]
FROM [TM Clustering]

7. This is because the ages are discretized in 10 buckets.

8. In the same way, you can obtain the Minimum discretized value of a bucket.

SELECT DISTINCT
    RangeMin([Amount]) AS [Maximum]
 FROM [Forecasting]

9. Or the Midle value of a bucket in a discretized column.

SELECT DISTINCT
    RangeMid([Amount]) AS [Maximum]
 FROM [Forecasting]

10. Let's talk a little more about decition trees nodes. In decition trees, you have a parent child with different children.

11. The IsDescendant shows the information about the descendats of the node 00000000100002010101

SELECT MODEL_NAME,NODE_NAME,NODE_DESCRIPTION
 FROM [TM Decision Tree].CONTENT
 WHERE IsDescendant('0000000010002010101')

12. The following function is the Cluster. The cluster algorightm does not have parents and children as the desicion trees.  Each node (cluster) is independent. The cluster function obtains the cluster name that will probably contain the case specified. The following example shows the cluster name of the user male, married with 1 car and 2 children at home.

SELECT
  Cluster()
 FROM [TM Clustering]
  NATURAL PREDICTION JOIN
   (SELECT 28 AS [Age],
     'High School' AS [Education],
     'M' AS [Gender],
     'M' AS [Marital Status],
     1 AS [Number Cars Owned],
     2 AS [Number Children At Home]) AS t

13. The cluster probability shows the probability for a specific node to accomplish a specific case. For example, in this query we are checking the probability of the Cluster 8 to accomplish the following charactecteristics of a user that is male, married with 1 car and 2 children at home.

SELECT
  ClusterProbability('Cluster 8') as [probability]
FROM [TM Clustering]
  NATURAL PREDICTION JOIN
   (SELECT 28 AS [Age],
      'High School' AS [Education],
      'M' AS [Gender],
      'M' AS [Marital Status],
      1 AS [Number Cars Owned],
      2 AS [Number Children At Home]) AS t

14. If you to want to check the Cluster names of a Mining Model, you can use the following query:

SELECT NODE_CAPTION FROM [TM Clustering].CONTENT
15. The ClusterDistance function is similar to other functions, but it returns the distance between the input case and the cluster with more probability.
SELECT
  ClusterDistance() as distance
From [TM Clustering]
  NATURAL PREDICTION JOIN
   (SELECT 28 AS [Age],
      '2-5 Miles' AS [Commute Distance],
      'Graduate Degree' AS [Education],
      0 AS [Number Cars Owned],
      0 AS [Number Children At Home]) AS t

16. As you know (or should know), when you create your data mining structure, part is used for training and part is used for testing the model (see the step 11 of the Introduction chapter for more information). You can query the test data and training data.

The following example shows the age, bike buyer, education and occupation information of the training cases of the customers whose education is bachellor.

SELECT Age, [Bike Buyer],Education, Occupation
 FROM [TM Decision Tree].CASES
 WHERE IsTrainingCase() and Education='Bachelors'

17. The next example shows the same information of the step 16, but it shows the Data used to test the Data mining structures.

SELECT Age, [Bike Buyer],Education, Occupation
 FROM [TM Decision Tree].CASES
 WHERE IsTestCase() and Education='Bachelors'

18. The other function is the StructureColumn it shows the value of the structure column.

SELECT StructureColumn('Gender') as Gender
 FROM [TM Decision Tree].CASES
 WHERE IsTrainingCase()  and [Customer key]=11081

Conclusion

As you can see, the DMX functions are not difficult to learn. There are some tricks to use them and they have many limitations compared with the T-SQL which is extremely flexible compared with the DMX.

References

https://msdn.microsoft.com/en-us/library/ms131996.aspx

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating