Multiple parameters for a query

  • I receive a flat file which I am importing with SSIS. The problem is the file does not contain a needed field(ssn). I have to query another table to find the ssn information. As an example, I receive a file with about 250 records. These records have firstname, lastname, and dob. I need to use this information to query the other table to find the SSN and add that to a new table. So for each record I must find the corresponding SSN located in the other table. Does anyone have a good idea about how to approach this?

    Thanks in advance.

  • I think you'll want to use the SSIS Lookup Transformation. There are many ways yo configure it to suit your scenario. Most considerations surround the number of records in the incoming file and the number of rows in the lookup table. Post here, or post a new topic if you have more questions related to setting it up.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • apache626 (9/7/2012)


    I receive a flat file which I am importing with SSIS. The problem is the file does not contain a needed field(ssn). I have to query another table to find the ssn information. As an example, I receive a file with about 250 records. These records have firstname, lastname, and dob. I need to use this information to query the other table to find the SSN and add that to a new table. So for each record I must find the corresponding SSN located in the other table. Does anyone have a good idea about how to approach this?

    Thanks in advance.

    I agree with OPC on this one, the lookup is the component you'll want for SSIS... however, this is inverted to expected row counts (something he mentioned). Lookup works best at many inbound rows to few checked, not few inbound rows to billions checked... because it has no indexing.

    My recommendation with this few records being imported would be to complete your import and then immediately run a standard update T-SQL script against the data to review the index on your main table to add the SSN to the information. Without an index this will be a horribly painful operation either way. Make sure you've got a nonclustered index on the lookup table on DoB, First, Last and INCLUDE the SSN column so you have the tightest lookup available for your update.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (9/7/2012)


    apache626 (9/7/2012)


    I receive a flat file which I am importing with SSIS. The problem is the file does not contain a needed field(ssn). I have to query another table to find the ssn information. As an example, I receive a file with about 250 records. These records have firstname, lastname, and dob. I need to use this information to query the other table to find the SSN and add that to a new table. So for each record I must find the corresponding SSN located in the other table. Does anyone have a good idea about how to approach this?

    Thanks in advance.

    I agree with OPC on this one, the lookup is the component you'll want for SSIS... however, this is inverted to expected row counts (something he mentioned). Lookup works best at many inbound rows to few checked, not few inbound rows to billions checked... because it has no indexing.

    My recommendation with this few records being imported would be to complete your import and then immediately run a standard update T-SQL script against the data to review the index on your main table to add the SSN to the information.

    I agree, if the destination of your incoming data and the location of the SSN are on the same instance. My pointer towards using the Lookup Transformation assumed the destination of your incoming data and the location of the SSN were on different instances, i.e. not reachable via T-SQL and excluding Linked Servers.

    Without an index this will be a horribly painful operation either way. Make sure you've got a nonclustered index on the lookup table on DoB, First, Last and INCLUDE the SSN column so you have the tightest lookup available for your update.

    If the data is on separate instances (i.e. proceeding with the Lookup Transformation) and you'll be searching a ton of data when doing the SSN lookup then I recommend disabling cache altogether and letting SSIS do the 250 sequential lookups. With that few lookups to do there is no sense in caching any of the table that carries the SSN if that table is large. Having an index on the SSN table as Craig mentioned to keep lookup time to a minimum will be critical to the overall performance of the Transformation.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • apache626 (9/7/2012)


    I receive a flat file which I am importing with SSIS. The problem is the file does not contain a needed field(ssn). I have to query another table to find the ssn information. As an example, I receive a file with about 250 records. These records have firstname, lastname, and dob. I need to use this information to query the other table to find the SSN and add that to a new table. So for each record I must find the corresponding SSN located in the other table. Does anyone have a good idea about how to approach this?

    Thanks in advance.

    Just curious... what company are you working for?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think the issue that I'm having is how to use multiple sets of parameters. For example in C# i would have an array and say foreach "set of firstname, lastname and dob" select the corresponding ssn and then i would put the results in an array. I'm not understanding how to say this in tsql. To answer your question Jeff a hospital system based out of California.

    Thanks

  • In SQL an array is modeled as a Table.

    Parameters to a stored procedure are more like arguments to a method in C#, however there is no concept of overloading in T-SQL.

    Starting with SQL 2008 you can pass tables to a stored procedure as a parameter, lookup Table-Valued Parameters for more info there.

    You never told us whether the destination of your flat file and the SSN data are on the same instance, or different instances.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I just realized you posted in a SQL 2005 forum for which Table Valued Parameters (TVPs) are not an option.

    Some options for passing a table (i.e. an array) to a stored procedure as a parameter in SQL 2005 are:

    - pass an xml document parse it as a table in your stored procedure

    - pass a delimited string parse it as a table in your stored procedure

    I have used both of these methods myself and have passed it along to a lot of other people on these forums. Here is the most comprehensive research I have found to date on the topic tailored specific for SQL 2005:

    Arrays and Lists in SQL Server 2005 and Beyond by Erland Sommarskog

    Here also is the updated version for 2008 which details TVPs:

    Arrays and Lists in SQL Server 2008 by Erland Sommarskog

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • apache626 (9/10/2012)


    To answer your question Jeff a hospital system based out of California.

    Then, depending onn how the SSN's are stored, you may have a much larger problem than you think. From the sounds of it, the SSN's are stored as clear text. In other words, not encrypted and, if it's not against the law, it should be. I know for sure that it violates PCI and would never pass such an audit. My recommendation would be to alert management to the problem of having SSNs stored as clear text.

    As for matchinng SSNs by name and DOB, it's nearly impossible to do such a thing with 100% accuracy even if a human gets involved because there are so many permutations as to how a name may appear not to mention the fact that people with the same name can also have the same DOB. That, notwithstanding, the first place to start would simply be to join the new table with the existing table based on the first name, last name, and DOB and use that join to update the new table.

    Going back to the SSN problem, it would be far better to assign a "customer number" or some such instead of perpetuating the problem of having SSNs in clear text.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/10/2012)


    the first place to start would simply be to join the new table with the existing table based on the first name, last name, and DOB and use that join to update the new table.

    I agree, assuming the SSN data is on the same instance where the flat-file is headed.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply