Help needed for a twisted SQL query

  • Hi all,

    I need all the help I can get for a SQL query I just can't figure out that's making my head spin for days...

    Here comes an explanation, note that I have simplified the tables and fields names (and the number of them) to make it a little simpler.

    I have 3 tables: PRODUCTS, CITIES and OTM (for one-to-many)

    PRODUCTS contains prdID (numeric), prdName (varchar)

    CITIES contains postcodeID (numeric), postcode (numeric), city (varchar), latitude (numeric), longitude (numeric)

    OTM contains prdID (numeric), postcodeID (numeric)

    I also have a SQL scalar function called getDistanceFromID that accepts 3 parameters: postcodeID, latitude, longitude.

    What it does is query the CITIES table to retrieve the latitude and longitude of the city whose postcodeID is passed as parameter and calculate the distance between that city and the one whose latitude and longitude are passed as parameters.

    To give you an idea, the following query

    SELECT city, getDistanceFromID(555, latitude, longitude) AS [distance] FROM CITIES ORDER BY distance

    would return all the cities contained in the CITIES table and the distance from the city whose postcodeID is 555, ordered by ascending distance.

    This function works fine.

    This is where my problem starts:

    Each product can be purchased by a customer in different cities. That means that the OTM table contains one record for each product/city pair. If product X is available in 3 different cities, OTM will have 3 different records for that product, with the same prdID and a different postcodeID.

    What I want is build a query that will retrieve ONLY one record for each product. This ain't a problem but the problem is that:

    that record must return at least the followong fields: prdID, postcodeID, min_distance (where min_distance is the smallest distance between the customer's own city - which we know at all times - and the city the product in question can be found)

    Believe me, I have tried lots of things.

    - retrieve unique products (prdID) and the shortest distance -> no problem

    - retrieve cities (postcodeID) and the shortest distance -> no problem

    - retrieve unique products (prdID) AND the city it's in (postcodeID) AND the shortest distance -> [PULL HAIR]

    I've tried all sorts of sub-queries, GROUP By clauses and aggregate function to no avail until now.

    I hope I've been clear in the description of my problem and I really hope someone can help.

    Thanks in advance,

    Chris

  • Chris,

    Do you mean this:

    retrieve unique products (prdID) AND the city it's in (postcodeID) AND the shortest distance to a reference city

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (12/5/2008)


    Chris,

    Do you mean this:

    retrieve unique products (prdID) AND the city it's in (postcodeID) AND the shortest distance to a reference city

    Cheers

    ChrisM

    Hi ChrisM,

    That's it exactly !!!

    Chris

  • Ok lets confirm this:

    Say your reference city is Boston, your requirement would be:

    For each and every individual product (which has been sold), list the nearest city to Boston where it's been sold.

    Is this correct?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (12/5/2008)


    Ok lets confirm this:

    Say your reference city is Boston, your requirement would be:

    For each and every individual product (which has been sold), list the nearest city to Boston where it's been sold.

    Is this correct?

    It's correct.

    Where the query is concerned, I actually need (at least) the prdID, the postcodeID AND the distance

    Chris

  • chris (12/5/2008)


    Chris Morris (12/5/2008)


    Ok lets confirm this:

    Say your reference city is Boston, your requirement would be:

    For each and every individual product (which has been sold), list the nearest city to Boston where it's been sold.

    Is this correct?

    It's correct.

    Where the query is concerned, I actually need (at least) the prdID, the postcodeID AND the distance

    Chris

    Lovely stuff, thanks Chris. Last question - can you confirm that you're using SQL Server 2000 and not 2005?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • To give you an idea, I tried this subquery to retrieve an individual product and the distance to the nearest location. It works great as is but as you can imagine, the GROUP BY clause prevents me from also retrieving the postcodeID that I need as well.

    Note: the 555 value in the getDistance function is the customer's postcodeID; we know this value all along because it's stored in a cookie.

    SELECT prdID, MIN(getDistance(555, POSTCODES.latitude, POSTCODES.longitude)) AS [Distance]

    FROM OTM LEFT JOIN POSTCODES

    ON OTM.postcodeID = POSTCODES.postcodeID

    GROUP BY prdID

    Chris

  • Chris Morris (12/5/2008)


    Lovely stuff, thanks Chris. Last question - can you confirm that you're using SQL Server 2000 and not 2005?

    Yes unfortunately.

    I've heard of the OVER PARTITION option that's available in MSSQL 2005 but I do have to work with MSSQL 2000 for the time being :doze:

  • Yep your absolutely right, this type of query is far more elegant in 2005.

    Here's a query which will work.

    [font="Courier New"]DECLARE @Boston_prdID (numeric)

    SET @Boston_prdID = 555

    SELECT p.*, c.*

    FROM PRODUCTS p

    INNER JOIN OTM o ON o.prdID = o.prdID

    INNER JOIN CITIES c ON c.postcodeID = o.postcodeID

    INNER JOIN (

       SELECT pd.prdID, MIN(dbo.getDistanceFromID(@Boston_prdID, cd.latitude, cd.longitude)) AS [DistanceToBoston]

       FROM PRODUCTS pd

       INNER JOIN OTM od ON od.prdID = od.prdID

       INNER JOIN CITIES cd ON cd.postcodeID = od.postcodeID

       GROUP BY pd.prdID

    ) d ON d.prdID = p.prdID AND d.[DistanceToBoston] = dbo.getDistanceFromID(@Boston_prdID, c.latitude, c.longitude)

    [/font]

    I hate it, hopefully someone will come up with something more elegant, but at least it will get you started.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (12/5/2008)


    Here's a query which will work.

    [font="Courier New"]DECLARE @Boston_prdID (numeric)

    SET @Boston_prdID = 555

    SELECT p.*, c.*

    FROM PRODUCTS p

    INNER JOIN OTM o ON o.prdID = o.prdID

    INNER JOIN CITIES c ON c.postcodeID = o.postcodeID

    INNER JOIN (

       SELECT pd.prdID, MIN(dbo.getDistanceFromID(@Boston_prdID, cd.latitude, cd.longitude)) AS [DistanceToBoston]

       FROM PRODUCTS pd

       INNER JOIN OTM od ON od.prdID = od.prdID

       INNER JOIN CITIES cd ON cd.postcodeID = od.postcodeID

       GROUP BY pd.prdID

    ) d ON d.prdID = p.prdID AND d.[DistanceToBoston] = dbo.getDistanceFromID(@Boston_prdID, c.latitude, c.longitude)

    [/font]

    Hi ChrisM,

    I'm afraid there's a twist: I do indeed get unique products BUT they all map the same city, that of the first product returned :crying:

    Chris

  • Hi Chris

    There's an error in the JOIN, which is corrected here:

    DECLARE @Boston_prdID (numeric)

    SET @Boston_prdID = 555

    SELECT p.*, c.*, dbo.getDistanceFromID(@Boston_prdID, c.latitude, c.longitude) AS [DistanceToBoston], d.DistanceToBoston AS [MinDistanceToBoston]

    FROM PRODUCTS p

    INNER JOIN OTM o ON o.prdID = p.prdID

    INNER JOIN CITIES c ON c.postcodeID = o.postcodeID

    LEFT JOIN (SELECT pd.prdID, MIN(dbo.getDistanceFromID(@Boston_prdID, cd.latitude, cd.longitude)) AS [DistanceToBoston]

    FROM PRODUCTS pd

    INNER JOIN OTM od ON od.prdID = pd.prdID

    INNER JOIN CITIES cd ON cd.postcodeID = od.postcodeID

    GROUP BY pd.prdID

    ) d ON d.prdID = p.prdID AND d.[DistanceToBoston] = dbo.getDistanceFromID(@Boston_prdID, c.latitude, c.longitude)

    ORDER BY p.prdID, d.DistanceToBoston

    I've used a LEFT JOIN here so you can see the closest city and other cities (rejected rows in the original) in the output.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (12/5/2008)


    I've used a LEFT JOIN here so you can see the closest city and other cities (rejected rows in the original) in the output.

    Chris,

    Just read the Best practices, thanks, though I'm not sure how I can make it easier for you to help me.

    Please don't hesitate to tell me...

    I'm afraid your second query has a problem as well.

    Now I get results where all postcodeID's are used for all products 😉

    I'll now try and post a script that will create test tables and their content to make it easier to understand and to test.

    I really appreciate your help, Chris

    Chris

  • I'm afraid your second query has a problem as well.

    Now I get results where all postcodeID's are used for all products

    That's the intention, sometimes a quick eyeball is all you need.

    If you look at the two rightmost columns, one of them is always poulated, the other is only populated when that row has the minimum distance from city to reference city.

    To restrict this query so that it provides the results you want, change the join to the derived table from a LEFT JOIN to an INNER JOIN like this:

    DECLARE @Boston_prdID (numeric)

    SET @Boston_prdID = 555

    SELECT p.*, c.*, dbo.getDistanceFromID(@Boston_prdID, c.latitude, c.longitude) AS [DistanceToBoston], d.DistanceToBoston AS [MinDistanceToBoston]

    FROM PRODUCTS p

    INNER JOIN OTM o ON o.prdID = p.prdID

    INNER JOIN CITIES c ON c.postcodeID = o.postcodeID

    INNER JOIN (SELECT pd.prdID, MIN(dbo.getDistanceFromID(@Boston_prdID, cd.latitude, cd.longitude)) AS [DistanceToBoston]

    FROM PRODUCTS pd

    INNER JOIN OTM od ON od.prdID = pd.prdID

    INNER JOIN CITIES cd ON cd.postcodeID = od.postcodeID

    GROUP BY pd.prdID

    ) d ON d.prdID = p.prdID AND d.[DistanceToBoston] = dbo.getDistanceFromID(@Boston_prdID, c.latitude, c.longitude)

    ORDER BY p.prdID, d.DistanceToBoston

    Some sample data would be just grand!

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (12/5/2008)


    Some sample data would be just grand!

    here you go 🙂

    --creation of the getDistanceFromID scalar function

    CREATE FUNCTION [dbo].[getDistanceFromID]

    (

    @localID numeric, @localX float, @localY float

    )

    RETURNS float

    AS

    BEGIN

    DECLARE @i float

    DECLARE @lX float, @lY float

    SELECT @lX = lambertX, @lY = lambertY FROM CITIES WHERE postcodeID = @localID

    SET @i = SQRT(((@lX - @localX)*(@lX - @localX)) + ((@lY - @localY)*(@lY - @localY)))

    SET @i = ROUND(@i, 1)

    RETURN @i

    END

    --creation of the PRODUCTS TABLE

    DROP TABLE PRODUCTS

    CREATE TABLE PRODUCTS(

    [ID] [numeric](18, 0) IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,

    [prdID] [numeric](18, 0) NOT NULL,

    [prdName] [varchar](50) NOT NULL

    )

    --creation of the CITIES TABLE

    DROP TABLE CITIES

    CREATE TABLE CITIES(

    [ID] [numeric](18, 0) IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,

    [postcodeID] [numeric](18, 0) NOT NULL,

    [postcode] [numeric](18, 0) NOT NULL,

    [city] [varchar](50) NOT NULL,

    [latitude] [numeric](18, 0) NOT NULL,

    [longitude] [numeric](18, 0) NOT NULL

    )

    --creation of the OTM TABLE

    DROP TABLE OTM

    CREATE TABLE OTM(

    [prdID] [numeric](18, 0) NOT NULL,

    [postcodeID] [numeric](18, 0) NOT NULL

    )

    --populate PRODUCTS TABLE

    INSERT INTO PRODUCTS VALUES(1, 'airplane')

    INSERT INTO PRODUCTS VALUES(2, 'speedboat')

    INSERT INTO PRODUCTS VALUES(3, 'bicycle')

    --populate CITIES TABLE

    INSERT INTO CITIES VALUES(1, 1040, 'Madrid', 159, 183)

    INSERT INTO CITIES VALUES(2, 1800, 'Rome', 209, 159)

    INSERT INTO CITIES VALUES(3, 4500, 'Madrid', 208, 133)

    INSERT INTO CITIES VALUES(4, 6270, 'Firenze', 233, 88)

    INSERT INTO CITIES VALUES(5, 9070, 'Brussels', 126, 181)

    --populate OTM TABLE

    INSERT INTO OTM VALUES(1, 2)

    INSERT INTO OTM VALUES(1, 5)

    INSERT INTO OTM VALUES(2, 1)

    INSERT INTO OTM VALUES(3, 1)

    INSERT INTO OTM VALUES(3, 3)

    INSERT INTO OTM VALUES(3, 5)

  • SELECT @lX = lambertX, @lY = lambertY FROM CITIES WHERE postcodeID = @localID

    The definition fior the cities table doesn't include lambertX and lambertY

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply