Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Denali – Day 23: Data Quality Services (DQS)

Denali – Day 23: Data Quality Services(DQS)

Data intelligence vs. human intelligence: Sql server RDBMS is a database system which has data in it, most of the time, we enter the data but as a user they enter the data assuming things are good, like “USA, US or America are same for user as a knowledgeable person but for the system it is not, so using DQS we can integrate the human intelligence to make our data become more productive and can cleanse or correct the data.

 

Not installed by default.

Custom Install/activate using sql server Engine setup, to check the option.

Once setup done, need to activate or install the DQS using GUI – Data Quality Installer or DQSInstaller.exe. to complete the installation.

Need to provide strong password for master key.

It has two types

  • Server
  • Client

 

Useful for BI.

 

SSIS for data cleansing or TOOL: DQSInstaller.exe

 

Metadata DATABASE: DQS_Main, DQS_Projects, DQS_Staging_data

 

Security:

Dqs_administrator

Dsq_kb_editor

Dsq_kb_operator

 

To DQS Tool have three major components in it.

  1. Knowledge Base Management
  2. Data Quality Projects:
  3. Administration

 

KB is to let DQS build the knowledge and let it declare what is correct, means educate or knowledge the system, once system is having data, we can build the project to produce the required output like correct the data or make the report or produce the duplicate data, and to administrate the same using Administration.

 

  • Knowledge Base Management

     

    Data Quality Knowledge Base (DQKB) Build KB to be used into projects.

    KB – steward

    Knowledge Discovery: DQKB… discovery

    It is the first step to analyze your system to generate the KB.

    Select source and table

    Table columns to kb domain

     

    Create a new domain for each column to validate

    >>Start>>

     

    Reviewed — have three options correct, error, invalid

     

    Select KB

     

    Check Domain Management

     

    Create domain rules

    Domain Management/ composite domain:

    Once you analyze the system you can create a Domain Management and build the Business rules.

    Right click KB and select a Kb, select domain management…

    Create a composite domain.

    Build a Business Rule

     

    We can even create a composite domain

    Composite domain: eg. Address.

    *use this to clean the data, using DQS Project Cleansing.

     

    Matching Policy:

     

    Matching policy is a KB to check for duplicate on your system.

    Overcome errors

     

    Open the Created KB.

    And select Matching policy.

    How Do I: Create a Matching Policy with SQL Server Denali DQS?

    Check for duplication or matching per requirement.

     

  • Data Quality Projects:

     

    Cleansing:

     

    Once you have KB ready, you can use that KB in a project, Cleansing is a project to validate the KB and produce the required result. Create a project and cleanse with the specific Rules, start the cleansing will provide you the results. Which can be corrected (approved) and export the output result into xml or sql server.

     

In xls sheet. Ongoing. New DQP , project

Create cleansing.

Excel file

Xls columns to kb domain.

Start …

 

Interactive cleansing.

Results…

Job Title: Suggested, new, invalid, corrected, correct -à Approve/Reject

 

We can correct/approval the correction

Lastly: exporting the cleansing data.

Destination … xls or sql data (cannot to existing table)

 

“It uses KB gathering in a data quality KB. Cleansing results can be exported to xls or sql table”

 

*Composite Domain to cleanse data

 

Matching: CTP3. :

Matching is used to check for duplicates /matching records and produced and exports it.

 

Reflects business requirement to find duplicate records

Better KB more matching results would be.

 

  • Administration

     

    *You can install DQS tool independently with SSIS on separate server without Engine.

     

    DQS is integrated with both SQL Server Integration Services and Master Data Services.

     

    Summary:

    Eg. If we have an .xls file which has some data, which is not clear and we want it to be produced meaningful output, so we will create a KB with that xls and linked the columns with our domain. Create rules,

    Create a new cleansing project and use that xls file and use the kb, and associate it, and start the cleansing; it will provide the result which matches with the rules.

    We can approve the resulted data to corrected, and then after completion of cleansing, we can export back the resulted information to xls or sql server new table.

     

    If we want to configure “Azure” can be done in an administration part.

    Configuration:

    Azure (Data Market Reference Data Service (RDS) Provider)

     

    *I am learning it so please suggest if I am wrong here.

     

Ref:

 

http://msdn.microsoft.com/en-us/library/ff877925(SQL.110).aspx

http://blogs.msdn.com/b/dqs/

http://www.mssqltips.com/tipprint.asp?tip=2575

http://www.youtube.com/watch?v=jfDVG8Nf8No


Comments

Leave a comment on the original post [thakurvinay.wordpress.com, opens in a new window]

Loading comments...