T-SQL 2008: avoid a Cartesian product

  • For a customer, I need to load data from Excel 2010 spreadsheets into a SQL Server 2008 R2 database for a one-time set of ad hoc queries. I will then run queries to look at the data based on the user requirements.

    The problem is that there is a lot of duplicate data across different rows within the same table. When I join the various tables together, I get a Cartesian product.

    I am trying to determine how to run the queries I need without getting the Cartesian product.

    There are 3 tables: 1. a claims table by client number that has the duplicate data within some of the rows, 2. a price table, and 3. an authorization table.

    I am thinking of loading the data into temp tables in a way that the data will not be duplicated. If this is a possibility, can you tell me how to accomplish this goal?

    If you have any other suggestions, can you show me code on how to accomplish this goal?

  • Without being able to see what you have to deal with, or at least a reasonable facsimile of the data, it is hard to provide you much guidance. This is a case where seeing will really help.

  • What would you want to see? The columns in the tables and/or the data?

  • I hope this code works for you, provided the following requirements hold:

    1. You want to select a distinct result set.

    2. The three tables must have primary keys and foreign keys to join on.

    -- De-duplicate the client numbers first, then join to the other two tables.
    WITH CTE_DISTINCT (CLIENT_NUMBER)
    AS
    (SELECT DISTINCT CLIENT_NUMBER FROM CLIENT_TABLE)
    SELECT * FROM CTE_DISTINCT A
    INNER JOIN PRICE_TABLE B
    ON A.CLIENT_NUMBER = B.CLIENT_NUMBER
    INNER JOIN AUTHORIZATION_TABLE C
    ON B.CLIENT_NUMBER = C.CLIENT_NUMBER
    --WHERE <YOUR CONDITION HERE>
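
    The temp-table idea from the original question could be sketched as follows. This assumes the duplicated rows are exact copies of each other. CLAIMS_TABLE, CLAIM_DATE, CLAIM_AMOUNT and #CLAIMS_DEDUPED are hypothetical names, since the real columns were never posted.

    ```sql
    -- Hypothetical names: CLAIMS_TABLE, CLAIM_DATE, CLAIM_AMOUNT, #CLAIMS_DEDUPED.
    -- Drop the temp table if an earlier run left it behind (syntax valid on SQL Server 2008).
    IF OBJECT_ID('tempdb..#CLAIMS_DEDUPED') IS NOT NULL
        DROP TABLE #CLAIMS_DEDUPED;

    -- SELECT DISTINCT collapses rows that are identical in every listed column.
    SELECT DISTINCT
           CLIENT_NUMBER,
           CLAIM_DATE,
           CLAIM_AMOUNT
    INTO   #CLAIMS_DEDUPED
    FROM   CLAIMS_TABLE;

    -- Query the de-duplicated copy instead of the raw import.
    SELECT C.CLIENT_NUMBER, C.CLAIM_DATE, C.CLAIM_AMOUNT
    FROM   #CLAIMS_DEDUPED AS C;
    ```

    Be careful that two genuinely separate claims with the same values in every listed column would also be merged, so include whatever column identifies a claim if one exists.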

  • If the problem is avoiding duplicate data insertion in your table, it is a different story. You first need to check whether the data already exists in the client table before inserting it. If the data already exists, don't insert it; move on to the next record.
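
    A set-based version of that check might look like the sketch below, which does the existence test for all rows at once rather than record by record. CLIENT_STAGING is a hypothetical table holding the rows imported from Excel; only CLIENT_TABLE and CLIENT_NUMBER come from the code posted above.

    ```sql
    -- Hypothetical name: CLIENT_STAGING (raw rows loaded from the spreadsheet).
    INSERT INTO CLIENT_TABLE (CLIENT_NUMBER)
    SELECT DISTINCT S.CLIENT_NUMBER      -- DISTINCT guards against repeats inside the staging data itself
    FROM   CLIENT_STAGING AS S
    WHERE  NOT EXISTS (SELECT 1          -- skip client numbers that are already in the target table
                       FROM   CLIENT_TABLE AS T
                       WHERE  T.CLIENT_NUMBER = S.CLIENT_NUMBER);
    ```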

  • There are 3 tables: 1. a claims table by client number that has the duplicate data within some of the rows, 2. a price table, and 3. an authorization table.

    I am thinking of loading the data into temp tables in a way that the data will not be duplicated. If this is a possibility, can you tell me how to accomplish this goal?

    Why do you want to eliminate duplicates from the claims table? One client can have multiple claims, right? The price and the authorization tables just seem like master tables.

    The price and the authorization tables will have unique values, but the claims table may have duplicates. Isn't that how it is supposed to be, with multiple claims for a client?
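
    One way to test that assumption is to count rows per client in each supposed master table. This is only a sketch: it assumes CLIENT_NUMBER is the join column, as in the code posted earlier in the thread, and ROWS_PER_CLIENT is just an illustrative alias.

    ```sql
    -- Client numbers that appear more than once in the price table.
    SELECT   CLIENT_NUMBER, COUNT(*) AS ROWS_PER_CLIENT
    FROM     PRICE_TABLE
    GROUP BY CLIENT_NUMBER
    HAVING   COUNT(*) > 1;

    -- Same check for the authorization table.
    SELECT   CLIENT_NUMBER, COUNT(*) AS ROWS_PER_CLIENT
    FROM     AUTHORIZATION_TABLE
    GROUP BY CLIENT_NUMBER
    HAVING   COUNT(*) > 1;
    ```

    If both queries return no rows, each table has one row per client. If either returns rows, a join on CLIENT_NUMBER alone will multiply the claims by the number of matching rows.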

    Vinu Vijayan

    For better and faster solutions, please check "How to post data/code on a forum to get the best help" by Jeff Moden 😉

  • wendy elizabeth (12/24/2012)


    I need to load data from Excel 2010 spreadsheets...I am thinking of loading the data into temp tables in a way that the data will not be duplicated.

    Wendy, if you truly want to eliminate the dupes (and, as was previously mentioned, I'm not suggesting this is what you *should* do), why don't you just use Excel's de-dupe function? Click the "Data" tab on the ribbon and look for "Remove Duplicates". Also, you said you are getting a Cartesian product, so maybe you omitted some join criteria?
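
    To illustrate what a missing join criterion can look like, here is a sketch under a made-up assumption: that prices vary by a SERVICE_CODE column present in both the claims and price tables. CLAIMS_TABLE, SERVICE_CODE and PRICE are hypothetical names, not taken from your data.

    ```sql
    -- Hypothetical names: CLAIMS_TABLE, SERVICE_CODE, PRICE.
    SELECT CL.CLIENT_NUMBER, CL.SERVICE_CODE, P.PRICE
    FROM   CLAIMS_TABLE AS CL
    INNER JOIN PRICE_TABLE AS P
            ON  P.CLIENT_NUMBER = CL.CLIENT_NUMBER
            AND P.SERVICE_CODE  = CL.SERVICE_CODE;  -- without this line, every claim pairs with every price row for the client
    ```

    Joining on only part of the key is not a true Cartesian product, but it inflates the row count in much the same way.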

    Greg
