SQL design discussion - help required


Abu Dina
I'm developing a process for performing data cleaning / deduplication. To make the process as flexible as possible, I am using an XML configuration file to define the information for the source table to be cleaned.

Here is a sample of the XML file. This part maps the source data columns to the standard columns used by my code.

    <dataSource>
        <tables>
            <table name="SourceTable" uniqueRef="Master_ID" />
        </tables>
        <fieldMappings>
            <fieldMapping DefaultField="FullName" databaseField="" />
            <fieldMapping DefaultField="Prefix" databaseField="" />
            <fieldMapping DefaultField="LastName" databaseField="Surname" />
            <fieldMapping DefaultField="FirstNames" databaseField="Forename" />
            <fieldMapping DefaultField="Initials" databaseField="" />
            <fieldMapping DefaultField="Qualification" databaseField="Title" />
            <fieldMapping DefaultField="Suffix" databaseField="" />
            <fieldMapping DefaultField="Organization" databaseField="OrganisationName" />
            <fieldMapping DefaultField="Department" databaseField="" />
            <fieldMapping DefaultField="JobTitle" databaseField="" />
            <fieldMapping DefaultField="Address1" databaseField="Address_Line1" />
            <fieldMapping DefaultField="Address2" databaseField="Address_Line2" />
        </fieldMappings>
    </dataSource>




So this is saying that my source table contains columns including:

1) Master_ID
2) Forename
3) Surname
4) OrganisationName
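
As an illustration of how such a mapping could be read by application code, here is a minimal sketch in Python. The thread does not say which language the application uses, so the language, the helper name `load_field_mappings`, and the shortened `CONFIG_XML` string are all assumptions made for this example. Mappings with an empty `databaseField` are skipped, on the assumption that they mean "no such column in the source table".

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample configuration from the post.
CONFIG_XML = """
<dataSource>
  <tables>
    <table name="SourceTable" uniqueRef="Master_ID" />
  </tables>
  <fieldMappings>
    <fieldMapping DefaultField="FullName" databaseField="" />
    <fieldMapping DefaultField="LastName" databaseField="Surname" />
    <fieldMapping DefaultField="FirstNames" databaseField="Forename" />
    <fieldMapping DefaultField="Organization" databaseField="OrganisationName" />
  </fieldMappings>
</dataSource>
"""


def load_field_mappings(xml_text):
    """Return (table name, unique-ref column, {standard field: source column}).

    Mappings whose databaseField is empty are skipped, because the source
    table has no column for that standard field.
    """
    root = ET.fromstring(xml_text)
    table = root.find("./tables/table")
    mappings = {}
    for node in root.findall("./fieldMappings/fieldMapping"):
        source_column = node.get("databaseField", "").strip()
        if source_column:
            mappings[node.get("DefaultField")] = source_column
    return table.get("name"), table.get("uniqueRef"), mappings


table_name, unique_ref, mappings = load_field_mappings(CONFIG_XML)
print(table_name, unique_ref)
for standard_field, source_column in mappings.items():
    print(f"{standard_field} <- {source_column}")
```

The resulting dictionary keeps only the standard fields that the source table can actually populate, which is the information needed to decide which output columns to fill.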

At this stage, I want to apply various cleaning routines and insert the output into the standard output table (this contains 20-30 columns, which cover all the types of data that the code will potentially have to deal with).

This is the part I need some help with, as I'm not sure how to dynamically create the INSERT statement for the output table.

Can anyone suggest how to solve this kind of problem?

I hope this makes sense, and thanks in advance for your help!
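
One possible way to approach the dynamic INSERT, sketched in Python purely as an illustration: build the target column list and the SELECT list side by side from the mapping, so that only mapped columns appear in the statement. The output table name `StagingOutput`, its `SourceRef` column, the `dbo` schema and the `src` alias are hypothetical names chosen for this sketch and do not come from the thread. Identifiers are bracket-quoted with any `]` doubled, which is the usual SQL Server convention for delimited identifiers.

```python
def quote_ident(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_insert(source_table, unique_ref, mappings,
                 output_table="StagingOutput", ref_column="SourceRef"):
    """Build an INSERT ... SELECT that copies only the mapped columns.

    `mappings` maps standard (output) column names to source column names.
    `output_table` and `ref_column` are hypothetical names for this sketch.
    """
    target_columns = [quote_ident(ref_column)]
    select_columns = ["src." + quote_ident(unique_ref)]
    for standard_field, source_column in mappings.items():
        target_columns.append(quote_ident(standard_field))
        select_columns.append("src." + quote_ident(source_column))
    return (
        f"INSERT INTO dbo.{quote_ident(output_table)} ({', '.join(target_columns)})\n"
        f"SELECT {', '.join(select_columns)}\n"
        f"FROM dbo.{quote_ident(source_table)} AS src;"
    )


# Mappings taken from the sample configuration (empty ones left out).
sample_mappings = {
    "LastName": "Surname",
    "FirstNames": "Forename",
    "Organization": "OrganisationName",
}
insert_sql = build_insert("SourceTable", "Master_ID", sample_mappings)
print(insert_sql)
```

The generated text could then be run as dynamic SQL. Because the column names come from a configuration file, quoting every identifier is a cheap guard against malformed or malicious names.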

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Eugene Elutin
I cannot see the XML sample or the "output" in your post...

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Abu Dina
Okay, can you see one of the pictures now?

Lowell
Besides identifying the columns in the table, you need another attribute to identify which collection of columns makes a row unique, especially if it's not the PK of the table and there's no unique constraint.
For example:

<fieldMapping DefaultField="LastName" databaseField="Surname" IsPartOfUniqueCriteria="true" />

Then your app can use LINQ or whatever to group the data by the IsPartOfUniqueCriteria="true" columns and look for duplicates.
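
A rough sketch of that grouping idea, written in Python rather than LINQ: read the columns flagged with the suggested attribute, then group rows by those columns and keep the groups that contain more than one row. The XML fragment, the sample rows, and the helper names `unique_criteria_columns` and `find_duplicates` are all made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical mapping fragment using the suggested attribute.
MAPPINGS_XML = """
<fieldMappings>
  <fieldMapping DefaultField="LastName" databaseField="Surname" IsPartOfUniqueCriteria="true" />
  <fieldMapping DefaultField="FirstNames" databaseField="Forename" IsPartOfUniqueCriteria="true" />
  <fieldMapping DefaultField="Organization" databaseField="OrganisationName" />
</fieldMappings>
"""

# Made-up rows standing in for the source table.
ROWS = [
    {"Master_ID": 1, "Surname": "Smith", "Forename": "Anne", "OrganisationName": "Acme"},
    {"Master_ID": 2, "Surname": "Smith", "Forename": "Anne", "OrganisationName": "Acme Ltd"},
    {"Master_ID": 3, "Surname": "Jones", "Forename": "Bob", "OrganisationName": "Initech"},
]


def unique_criteria_columns(xml_text):
    """Return the source columns flagged IsPartOfUniqueCriteria="true"."""
    root = ET.fromstring(xml_text)
    return [
        node.get("databaseField")
        for node in root.findall("fieldMapping")
        if node.get("IsPartOfUniqueCriteria", "false").lower() == "true"
    ]


def find_duplicates(rows, key_columns, id_column="Master_ID"):
    """Group rows by the key columns; return groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        groups[key].append(row[id_column])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


key_columns = unique_criteria_columns(MAPPINGS_XML)
duplicates = find_duplicates(ROWS, key_columns)
print(key_columns)
print(duplicates)
```

This only finds exact matches on the flagged columns; fuzzy matching (for example on phonetic keys) would need the values to be normalised before grouping.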

Lowell

--
help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Abu Dina
Yes, I have this covered in another section of the XML file where the match keys are defined.

Okay, can you guys see both pictures now? I wonder if this is why no one has replied until now!

Anyway, I've started building strings to generate the SELECT part of the final INSERT, and then I will also generate strings for the CROSS APPLY part, where I generate the required values to populate the staging table.

Something like this:

      -- Expression strings for the SELECT list of the final INSERT.
      -- Sized generously so that long column names are not silently truncated.
      DECLARE @mkNameKeySELECTString NVARCHAR(4000)
      DECLARE @mkName1SELECTString NVARCHAR(4000)
      DECLARE @mkName2SELECTString NVARCHAR(4000)
      DECLARE @mkName3SELECTString NVARCHAR(4000)

      -- If we already have First and Last names then there is no need to split the names
      IF EXISTS (SELECT 1 FROM dbo.FieldMappings WHERE StagingColumn = 'FirstNames')
      AND EXISTS (SELECT 1 FROM dbo.FieldMappings WHERE StagingColumn = 'LastName')
         BEGIN

            -- Name key: phonetic code of the last word of the surname plus the first initial
            SET @mkNameKeySELECTString = 'dbo.NYSIISPhoneticEncoder(dbo.GetLastWord(' + dbo.GetSourceColumnName('LastName') + ')) + LEFT(' + dbo.GetSourceColumnName('FirstNames') + ', 1)'
            SELECT @mkNameKeySELECTString AS mkNameKey

            -- Phonetic code of the first forename
            SET @mkName1SELECTString = 'dbo.NYSIISPhoneticEncoder(dbo.GetFirstWord(' + dbo.GetSourceColumnName('FirstNames') + '))'
            SELECT @mkName1SELECTString AS mkName1

            -- Phonetic code of the second forename
            SET @mkName2SELECTString = 'dbo.NYSIISPhoneticEncoder(dbo.GetSecondWord(' + dbo.GetSourceColumnName('FirstNames') + '))'
            SELECT @mkName2SELECTString AS mkName2

         END
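
To show how expression strings like these might be assembled into the CROSS APPLY part, here is a small Python sketch. It is only an illustration under assumptions: the `src` table alias, the `mk` alias for the applied derived table, the `dbo.SourceTable` name (taken from the sample configuration with an assumed schema) and the helper `build_cross_apply` are hypothetical, and the expressions are written out by hand in the style of the T-SQL above with Surname and Forename as the source columns.

```python
def build_cross_apply(expressions, alias="mk"):
    """Wrap generated expressions in a CROSS APPLY (SELECT ...) clause.

    `expressions` maps an output column alias to a T-SQL expression string.
    """
    select_list = ",\n           ".join(
        f"{expression} AS {column_alias}"
        for column_alias, expression in expressions.items()
    )
    return f"CROSS APPLY (\n    SELECT {select_list}\n) AS {alias}"


# Expressions in the style generated above, with Surname/Forename as the
# source columns taken from the sample configuration.
match_key_expressions = {
    "mkNameKey": "dbo.NYSIISPhoneticEncoder(dbo.GetLastWord(src.Surname)) + LEFT(src.Forename, 1)",
    "mkName1": "dbo.NYSIISPhoneticEncoder(dbo.GetFirstWord(src.Forename))",
    "mkName2": "dbo.NYSIISPhoneticEncoder(dbo.GetSecondWord(src.Forename))",
}

cross_apply_sql = build_cross_apply(match_key_expressions)
statement = (
    "SELECT src.Master_ID, mk.mkNameKey, mk.mkName1, mk.mkName2\n"
    "FROM dbo.SourceTable AS src\n"
    + cross_apply_sql + ";"
)
print(statement)
```

Keeping the computed values in one CROSS APPLY means each expression is written once and can be referenced by its alias in both the SELECT list and any later matching logic.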



Eugene Elutin
I cannot see any picture... :'(

Abu Dina
I've replaced the picture with XML text instead!
