SQLServerCentral Article

Data Mining Part 38: Text Mining - The term lookup

,

Introduction

In this example, we will show how to use the Lookup Transformation Tool. This tool is used to detect the frequency of a word from a specified text. This task is similar to the term extraction task explained in chapter 35. This is one of the Text Mining Tools included in SQL Server Integration Services (SSIS).

Requirements

For this level, we need:

  1. SQL Server Installed
  2. Integration Services installed
  3. SSDT (SQL Server Data Tools)

Getting Started

1. For this example we need the sqlservercentral comments table that you can create using the resource files at the end of this article.

2. The sqlserverCentral_comments contains comments of some of the sqlserver central articles:

3. We will also require the dbo.terms table. This table contains the words that we want to measure the frequency that they are used.

4. In this example, we have 5 terms:

You can create the table using the resource files at the end of this article.

5. In a SSIS project in SSDT, Drag and drop the Data Flow Task.

6. Double click the Data Flow Task and Add an OLE DB Source, the Term Lookup and the Excel destination.

7. In the OLE DB Source, connect to the SQL Server that stores the table sqlservercentral_comments.

8. In the Term Lookup Transformation in the Reference Table, select the connection where the terms table is stored.

9. In the Term Lookup tab, match the columns that contain the terms and the text to search. In this Example, text and terms.

10. The advanced tab allows you to check the option to handle a case-sensitive term lookup or not. We also have the Configure Error Output button. 

11. By default, there is a warning about the errors. If there are some errors during the Lookup, the data with errors will be lost. The Configure Error Output button is used to handle this errors.

12. In the Configure Error Output window, you can Ignore the failure or fail the task on error.

13. In the Excel Destination Editor, press the New button to create a new Excel Connection Manager.

14. Specify a path for the file. The SSIS will create a new file. In this example, we will create a Microsoft Excel 2007 file.

15. In the Name of the Excel sheet, press the new button.

16. You will receive a warning with the message that there is no sufficient information about mapping SSIS types to data types of the selected .NET data provider. As a result, you may need to modify the default column types of the SQL statement on the next screen. Press OK.

17. In the create table, press OK.

18. You will receive the following message:

19. Select the Excel Sheet created from the drop-down list box.

20. Click on the Mappings page to map the columns.

21. Run the package.

22. Open the Excel file and check the results.

23. As you can see it shows the frequency of the terms used per column. For example, the text in the row 7 uses the word data 5 times and in row 9, 4 times. The Term Lookup helps to measure the frequency that defined terms are used in a specified text. 

Conclusion

In this example, we found the frequency of specified terms used per row using the term lookup task in SSIS.

This tool can help with the limitation of the term extraction that only supports English. In this case, this tool does not have this limitation.

References

Resources

Rate

4.75 (4)

You rated this post out of 5. Change rating

Share

Share

Rate

4.75 (4)

You rated this post out of 5. Change rating