Query just hangs!

  • Abu Dina (1/29/2014)


    ChrisM@Work (1/28/2014)


    Gail's suggestion using DENSE_RANK() is well worth a try when you get time.

    Is it?

    So what happens if I have two records with the same Organisationname but the address1 of the first record matches the address2 of the second?

    Is it possible to have multiple PARTITION BYs using the same rank?

    It's the method which is significant. As Gail pointed out, with no expected result set it's up to you to figure out the correct list of columns, and even the most appropriate function. You've mentioned PARTITION BY and I too think that ROW_NUMBER() could meet your requirements here. It's a completely different method to the self-join and will swap 20 gazillion index seeks for a huge spilling sort. One method will be cheaper than the other. I'm surprised you haven't investigated which might be the most appropriate in this case.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • GilaMonster (1/27/2014)


    Abu Dina (1/27/2014)


    It's really about finding duplicate entries within the record set. A duplicate in this case is at site/business level.

    Same company same address = duplicate

    No company but with same address = duplicate

    Order the result set by company name and the duplicates will seem obvious.

    In that case it should be as simple as:

    DENSE_RANK() OVER (Order By Organisationname, Address1, Address2, Address3, Address5, Town, County, Postcode, Country) AS SiteID

    Same number assigned for companies with the same name and address

    Is Organisationname the same as Company?

    Or is Company the Site that sent part of the data?

    You have a lot of things that appear similar, which could be the same, but maybe not.

    Fuzzy Lookup and confidence rating are different than Exact Match.

    And if this is information that is being sent by multiple sites daily, weekly monthly, etc. what you intend to do with the duplicates might also be a question.

    Is your intent to have them clean up what they are sending?

    Or invent your own cleansing routine?

Viewing 2 posts - 31 through 31 (of 31 total)

You must be logged in to reply to this topic. Login to reply