SSC-Addicted
I'm developing a process for performing data cleaning / deduplication. To make the process as flexible as possible, I am using an XML configuration file to define the information for the source table to be cleaned.
Here is a sample of the XML file; this part maps the source data columns to the standard columns used by my code.
<dataSource>
  <tables>
    <table name="SourceTable" uniqueRef="Master_ID" />
  </tables>
  <fieldMappings>
    <fieldMapping DefaultField="FullName" databaseField="" />
    <fieldMapping DefaultField="Prefix" databaseField="" />
    <fieldMapping DefaultField="LastName" databaseField="Surname" />
    <fieldMapping DefaultField="FirstNames" databaseField="Forename" />
    <fieldMapping DefaultField="Initials" databaseField="" />
    <fieldMapping DefaultField="Qualification" databaseField="Title" />
    <fieldMapping DefaultField="Suffix" databaseField="" />
    <fieldMapping DefaultField="Organization" databaseField="OrganisationName" />
    <fieldMapping DefaultField="Department" databaseField="" />
    <fieldMapping DefaultField="JobTitle" databaseField="" />
    <fieldMapping DefaultField="Address1" databaseField="Address_Line1" />
    <fieldMapping DefaultField="Address2" databaseField="Address_Line2" />
  </fieldMappings>
</dataSource>
So this is saying my source table contains, among others, the following columns:
1) Master_ID 2) Forename 3) Surname 4) OrganisationName
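If the mappings need to be queried from T-SQL, one option is to shred the XML into rows with the xml type's nodes() and value() methods. This is a minimal sketch, assuming SQL Server 2008 or later and a trimmed copy of the sample configuration; the output column name SourceColumn is chosen only for illustration.

```sql
-- Sketch only: @config holds a trimmed copy of the sample configuration.
DECLARE @config XML = N'
<dataSource>
  <tables><table name="SourceTable" uniqueRef="Master_ID" /></tables>
  <fieldMappings>
    <fieldMapping DefaultField="LastName"   databaseField="Surname" />
    <fieldMapping DefaultField="FirstNames" databaseField="Forename" />
    <fieldMapping DefaultField="Initials"   databaseField="" />
  </fieldMappings>
</dataSource>';

-- One row per fieldMapping element; entries with an empty databaseField
-- (standard columns the source table does not supply) are filtered out.
SELECT
    m.n.value('@DefaultField',  'nvarchar(128)') AS StagingColumn,
    m.n.value('@databaseField', 'nvarchar(128)') AS SourceColumn  -- hypothetical name
FROM @config.nodes('/dataSource/fieldMappings/fieldMapping') AS m(n)
WHERE m.n.value('@databaseField', 'nvarchar(128)') <> N'';
```

The result could be inserted into a mapping table so that later steps can look up the source column for each standard column.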
At this stage, I want to apply various cleaning routines and insert the output into the standard output table (this contains 20-30 columns which cover all the types of data that the code will potentially have to deal with).
This is the part I need some help with, as I'm not sure how to dynamically create the insert statement for the output table (see below):
Can anyone suggest how to solve this kind of problem?
Hope this makes sense, and thanks for your help in advance!
----------------------------------- http://www.SQL4n00bs.com
SSCrazy
I cannot see the XML sample or the "output" in your post...
_____________________________________________ "The only true wisdom is in knowing you know nothing" "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!" (So many miracle inventions provided by MS to us...)
How to post your question to get the best and quick help
SSChampion
Besides identifying the columns in the table, you need another attribute to identify what makes a collection of columns unique, especially if it's not the PK of the table and there's no unique constraint. For example:
<fieldMapping DefaultField="LastName" databaseField="Surname" IsPartOfUniqueCriteria="true" />
Then your app can use LINQ or whatever to group the data by the IsPartOfUniqueCriteria="true" columns and look for duplicates.
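If the grouping is done in T-SQL rather than in the app, it might look something like this. It is only a sketch: it assumes Surname and Forename are the columns flagged as unique criteria, and that the source table lives in the dbo schema.

```sql
-- Sketch only: list each Surname/Forename combination that occurs more than once.
-- Assumes dbo.SourceTable and that these two columns form the unique criteria.
SELECT
    Surname,
    Forename,
    COUNT(*)       AS DuplicateCount,
    MIN(Master_ID) AS FirstMasterId   -- one candidate row to keep per group
FROM dbo.SourceTable
GROUP BY Surname, Forename
HAVING COUNT(*) > 1;
```

With a configurable column list, the GROUP BY columns would have to be assembled as dynamic SQL from the flagged mappings.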
Lowell
--There is no spoon, and there's no default ORDER BY in sql server either. Actually, Common Sense is so rare, it should be considered a Superpower. --my son
SSC-Addicted
Yes, I have this covered in another section of the XML file where the match keys are defined.
Okay, can you guys see both pictures now? I wonder if this is why no one has replied until now!
Anyway, I've started building strings to generate the SELECT part of the final insert, and then I will also generate strings for the CROSS APPLY part where I generate the required values to populate the staging table.
Something like this:
-- NVARCHAR(MAX) so that long column names are not silently truncated
DECLARE @mkNameKeySELECTString NVARCHAR(MAX)
DECLARE @mkName1SELECTString NVARCHAR(MAX)
DECLARE @mkName2SELECTString NVARCHAR(MAX)
DECLARE @mkName3SELECTString NVARCHAR(MAX)

-- If we already have First and Last names then there is no need to split the names
IF EXISTS (SELECT 1 FROM dbo.FieldMappings WHERE StagingColumn = 'FirstNames')
   AND EXISTS (SELECT 1 FROM dbo.FieldMappings WHERE StagingColumn = 'LastName')
BEGIN
    -- Name key: phonetic code of the last word of the last name plus the first initial
    SET @mkNameKeySELECTString = 'dbo.NYSIISPhoneticEncoder(dbo.GetLastWord(' + dbo.GetSourceColumnName('LastName') + ')) + LEFT(' + dbo.GetSourceColumnName('FirstNames') + ', 1)'
    SELECT @mkNameKeySELECTString AS mkNameKey

    -- Phonetic code of the first word of the first names
    SET @mkName1SELECTString = 'dbo.NYSIISPhoneticEncoder(dbo.GetFirstWord(' + dbo.GetSourceColumnName('FirstNames') + '))'
    SELECT @mkName1SELECTString AS mkName1

    -- Phonetic code of the second word of the first names
    SET @mkName2SELECTString = 'dbo.NYSIISPhoneticEncoder(dbo.GetSecondWord(' + dbo.GetSourceColumnName('FirstNames') + '))'
    SELECT @mkName2SELECTString AS mkName2
END
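Taking the string-building idea one step further, the whole INSERT ... SELECT could be assembled from the mapping table and run with sp_executesql. The following is only a rough sketch under stated assumptions: dbo.FieldMappings is assumed to have a SourceColumn column next to StagingColumn, and dbo.StagingTable is a made-up name for the standard output table.

```sql
-- Sketch only. Hypothetical names: FieldMappings.SourceColumn, dbo.StagingTable.
DECLARE @insertCols NVARCHAR(MAX),
        @selectCols NVARCHAR(MAX),
        @sql        NVARCHAR(MAX);

-- Comma-separated list of target columns that have a mapped source column.
SELECT @insertCols = STUFF((
        SELECT N', ' + QUOTENAME(StagingColumn)
        FROM dbo.FieldMappings
        WHERE SourceColumn <> N''
        ORDER BY StagingColumn
        FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

-- Matching list of source columns, in the same order as the target list.
SELECT @selectCols = STUFF((
        SELECT N', src.' + QUOTENAME(SourceColumn)
        FROM dbo.FieldMappings
        WHERE SourceColumn <> N''
        ORDER BY StagingColumn
        FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

SET @sql = N'INSERT INTO dbo.StagingTable (' + @insertCols + N') '
         + N'SELECT ' + @selectCols + N' FROM dbo.SourceTable AS src;';

PRINT @sql;                      -- inspect the generated statement first
-- EXEC sys.sp_executesql @sql;  -- run it once the output looks right
```

QUOTENAME wraps each name in brackets, which guards against odd characters in configured column names. Cleaning functions or CROSS APPLY expressions could be spliced into the SELECT list in the same way.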
----------------------------------- http://www.SQL4n00bs.com
SSCrazy
I cannot see any picture...
_____________________________________________ "The only true wisdom is in knowing you know nothing" "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!" (So many miracle inventions provided by MS to us...)
How to post your question to get the best and quick help