SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


SSIS Lookup - Return Unique Matches Only


SSIS Lookup - Return Unique Matches Only

Author
Message
jvanderberg
jvanderberg
Say Hey Kid
Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)

Group: General Forum Members
Points: 676 Visits: 746
I am validating a large database of addresses. I've already deployed a number of ways to validate city, state and zipcode against a lookup database. As a final step, I want to take entries with invalid and missing zip codes, and attempt to validate the city and state to see if there is only 1 valid zip code for that city/state combination. This has turned out to be a much more complex operation that I initially thought.

What I've come up with so far is pretty convoluted. Because I need to log off any non-matches, I have to multicast the input rows. I then have to sort the rows and merge with the lookup database. The result is multicasted, with one output going to an aggregate transformation that counts the rows grouped by the unique identifier. This is then merged with the original input rows. A conditional split separates the rows with only 1 result from the others. The rows with 0 or more than 1 match are sent of to a log. The good rows are then joined back with the lookup output to get the valid lookup zip code.

What seems like a simple transformation actually takes over 10 transformation steps with lines crossing each other at several points. It's virtually unreadable, will be difficult to maintain down the road, and is pretty slow.

Is there a better way to do this that I am not thinking of?
Evil Kraig F
Evil Kraig F
SSCrazy Eights
SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)

Group: General Forum Members
Points: 8545 Visits: 7660
jvanderberg (2/6/2014)
I am validating a large database of addresses. I've already deployed a number of ways to validate city, state and zipcode against a lookup database. As a final step, I want to take entries with invalid and missing zip codes, and attempt to validate the city and state to see if there is only 1 valid zip code for that city/state combination. This has turned out to be a much more complex operation that I initially thought.


Excellent, thank you for helping me understand the use case for the logic you're applying. It'll help me get you untangled. I think you've just overthought it.

What I've come up with so far is pretty convoluted. Because I need to log off any non-matches, I have to multicast the input rows.

Nope. You need to use error record redirection. Hang on. Smile

I then have to sort the rows and merge with the lookup database.

Sort? Why are you sorting?

The result is multicasted, with one output going to an aggregate transformation that counts the rows grouped by the unique identifier.

Okay, I get why you might want this for a rowcount, but why is this re-merging?

This is then merged with the original input rows. A conditional split separates the rows with only 1 result from the others. The rows with 0 or more than 1 match are sent of to a log. The good rows are then joined back with the lookup output to get the valid lookup zip code.

What seems like a simple transformation actually takes over 10 transformation steps with lines crossing each other at several points. It's virtually unreadable, will be difficult to maintain down the road, and is pretty slow.

Is there a better way to do this that I am not thinking of?


... I'm not even sure where this was going anymore.

For the original use case, I would do the following:
Bring the records in from the source database you're checking into the stream. You need the key and the City/State/Zip Fields from the source, that should be it. Next, apply a lookup for valid C/S/Z fields on all three columns. Redirect error rows via Error Output and Lookup Match Output.

This leaves your 'good' records in an intact stream. Ignore this for now. What we want is the error path (red arrow). Now, feed this into another lookup. This lookup should be fed by a query that looks something like this:
SELECT DISTINCT
[City],
[State],
[Zip]
FROM
ValidCSZList AS v
JOIN
(SELECT
[City],
[State]
FROM
ValidCSZList
GROUP BY
[City],
[State]
HAVING
COUNT( DISTINCT [Zip]) = 1
) AS drv
ON v.[City] = drv.[City]
AND v.[State] = drv.[State]



That query will give you only city/state combinations with a single zip code.

Re-lookup the error stream with this data, hooking only off city/state. Output the Zip from the lookup along with the original data in the main stream. Again, Error control the lookup and redirect failures.

You now have 3 streams of data.

The first stream is good data. Not sure what you want to do with that other than ignore it, it needs no changes.

The second stream, the success off the second lookup, are invalid records in the source that have a valid correction that can be applied. You can feed this back into an SCD2 change control or a list of some kind to alter the origination records with the correct zip code. This is why you brought the key along from the origination system.

The third stream (the error stream from the second lookup) contains the records that have invalid C/S/Z combinations and cannot be matched up via C/S to a unique Zip. This needs to be logged off to a table somewhere for manual intervention of the source data. Most likely by calling the client/contact/whatever to get it corrected.

If you need to do more with this data from that point, work from there where you have the unique streams. You shouldn't need sorts of any kind, and the only aggregator in SSIS I recommend is COUNT(*) for a recordset total, which will avoid sorting.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
jvanderberg
jvanderberg
Say Hey Kid
Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)Say Hey Kid (676 reputation)

Group: General Forum Members
Points: 676 Visits: 746
Thanks. That helped a lot. I won't try to justify the way I was going about it. I was obviously just thinking about it from the wrong angle. I needed another perspective. I thank you very much for yours.

The solution worked wonderfully.
Evil Kraig F
Evil Kraig F
SSCrazy Eights
SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)SSCrazy Eights (8.5K reputation)

Group: General Forum Members
Points: 8545 Visits: 7660
My pleasure, we've all been there. Glad I could help.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search