Min Max Quartile

  • Garadin (12/9/2008)


    Garadin (12/9/2008)


    Chris,

    Although it does seem like it might be more efficient due to the derived tables, the execution plans are virtually identical (once I add the aliases to your outer ProductID's). I actually had it this way and then went the other route. Check out this post from last week, particularly Jeff's post towards the bottom evaluating the 3 methods of doing this. (Left Join/NULL, Exists and IN)

    http://www.sqlservercentral.com/Forums/Topic611416-338-1.aspx

    Which would have probably been true if I didn't have outer references in my IN's. :crazy: After testing, Chris is absolutely right, try his out, with my tests on my Dev box, his version completes in 34 seconds for 1.4M rows.

    Seth, I've just worked on this again from scratch and came up with your original code which is correct, mine is broken.

    DECLARE @Products TABLE(

    ProductIdent INT IDENTITY(1,1),

    ProductID VARCHAR(10),

    UnitPrice money)

    INSERT INTO @Products(ProductID, UnitPrice)

    SELECT 'P1',0 UNION ALL

    SELECT 'P1',5 UNION ALL --

    SELECT 'P1',10 UNION ALL --

    SELECT 'P1',20 UNION ALL

    SELECT 'P2',80 UNION ALL

    SELECT 'P2',90 UNION ALL --

    SELECT 'P2',100 UNION ALL --

    SELECT 'P2',200

    SELECT p.*

    FROM @Products p

    WHERE ProductIdent NOT IN (SELECT TOP 25 PERCENT ProductIdent FROM @Products WHERE ProductID = p.ProductID ORDER BY UnitPrice)

    AND ProductIdent NOT IN (SELECT TOP 25 PERCENT ProductIdent FROM @Products WHERE ProductID = p.ProductID ORDER BY UnitPrice DESC)Problem is, as you say, it's the first and last 25% for each productID, which means of course that these quartiles have to be calculated for each productID. On a table of a million or more rows it's gonna hurt.

    I'm wondering if this can be done with "quirky update".

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I've done this and got the right answer. It basically takes the last record from each query. So the max of price ascending is my min price and the min of price descending is my max. :hehe: :w00t: What do you guys think?

    DECLARE @Products TABLE(

    ProductID VARCHAR(10),

    MinOrMaxchar(1),

    UnitPrice float)

    INSERT INTO @Products(ProductID, MinOrMax, UnitPrice)

    SELECT TOP 25 PERCENT ProductID, 'X', UnitPrice

    FROM scheme.stockm s join SalesProducts P on s.warehouse='WG' and s.product=P.ProductID where ProductID='214700'

    ORDER BY UnitPrice asc

    INSERT INTO @Products(ProductID, MinOrMax, UnitPrice)

    SELECT TOP 25 PERCENT ProductID, 'I', UnitPrice

    FROM scheme.stockm s join SalesProducts P on s.warehouse='WG' and s.product=P.ProductID where ProductID='214700'

    ORDER BY UnitPrice desc

    select max (UnitPrice) from @Products where MinOrMax = 'X'

    select min (UnitPrice) from @Products where MinOrMax = 'I'

    select ProductID,

    MINSP=max (case when (MinOrMax = 'X') then UnitPrice end),

    MAXSP=min (case when (MinOrMax = 'I') then UnitPrice end)

    from @Products

    group by ProductID

  • Only one problem there Kelvin - you've restricted it to a single ProductID.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Excellent call Chris. The below shows an example of doing exactly that. The times are from my highly overloaded and underpowered dev box.

    [font="Courier New"]-------------------- SETUP --------------------------------

    CREATE TABLE #Product(

    ProductIdent       INT IDENTITY(1,1),

    ProductID          INT,

    UnitPrice          money,

    ProductSequence        INT)

    DECLARE @L INT

    SET @L = 1

    WHILE @L <=70 -- Takes about 17 Seconds

    BEGIN

    INSERT INTO #Product(ProductID, UnitPrice)

    SELECT RIGHT(CHECKSUM(NEWID()),3), ABS(CHECKSUM(NEWID())/100000) *.01

    FROM Tally -- 20K row tally table

    PRINT @L

    SET @L = @L + 1

    END -- Total of 1.4M Records

    CREATE  CLUSTERED  INDEX [CX_Product] --Takes About 40 Seconds

    ON #Product(ProductID, UnitPrice) ON [PRIMARY] -- This is an important part of the solution and is necessary

    --------------------- SOLUTION ----------------------------

    DECLARE @ProductSequence INT,

           @PrevProductID INT,

           @ProductIDent INT

          

    UPDATE #Product -- Takes about 29 Seconds

    SET @ProductSequence = ProductSequence = CASE WHEN ProductID = @PrevProductID THEN @ProductSequence + 1

                                               ELSE 1

                                               END,

       @PrevProductID = ProductID,

       @ProductIdent = ProductIdent

    FROM #Product WITH (INDEX(0))

    -- Grab the count for each Product

    SELECT ProductID, COUNT(ProductIdent) ProductCount

    INTO #ProdCount

    FROM #Product

    GROUP BY ProductID

    -- Final Query

    SELECT P.ProductID, MIN(UnitPrice) MinPrice, MAX(UnitPrice) MaxPrice

    FROM #Product P

       INNER JOIN #ProdCount PC ON P.ProductID = PC.ProductID

    WHERE ProductSequence BETWEEN PC.ProductCount/4 AND (PC.ProductCount - (PC.ProductCount/4))

    GROUP BY P.ProductID

    [/font]

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Guys, many many thanks for your help and guidance. I had realised I had only used one product ... I was showing you what I needed to do individually for EVERY product 😉

    Please see below my final solution. Again many many thanks.

    DECLARE @ProductCode varchar(20)

    DECLARE ProdList CURSOR FOR

    select distinct ProductID from SalesProducts

    DECLARE @Products TABLE(

    ProductID VARCHAR(10),

    MinOrMaxchar(1),

    UnitPrice float)

    drop table SalesProductsMinMax

    CREATE TABLE [dbo].[SalesProductsMinMax] (

    ProductID char(20),

    MinUnitPrice float,

    MaxUnitPrice float

    )

    OPEN ProdList;

    FETCH ProdList into @ProductCode;

    WHILE @@FETCH_STATUS = 0 BEGIN

    INSERT INTO @Products(ProductID, MinOrMax, UnitPrice)

    SELECT TOP 25 PERCENT ProductID, 'X', UnitPrice

    FROM scheme.stockm s join SalesProducts P on s.warehouse='WG' and s.product=P.ProductID where ProductID=@ProductCode

    ORDER BY UnitPrice asc

    INSERT INTO @Products(ProductID, MinOrMax, UnitPrice)

    SELECT TOP 25 PERCENT ProductID, 'I', UnitPrice

    FROM scheme.stockm s join SalesProducts P on s.warehouse='WG' and s.product=P.ProductID where ProductID=@ProductCode

    ORDER BY UnitPrice desc

    FETCH ProdList into @ProductCode;

    END;

    CLOSE ProdList;

    DEALLOCATE ProdList;

    insert into SalesProductsMinMax ( ProductID, MinUnitPrice, MaxUnitPrice)

    select ProductID,

    MINSP=max (case when (MinOrMax = 'X') then UnitPrice end),

    MAXSP=min (case when (MinOrMax = 'I') then UnitPrice end)

    from @Products

    group by ProductID

  • What is the performance of that cursor like? On my system, it ran for 7 minutes with the processor pegged at 100% before I stopped it (coming from my indexed temp table, and not using your joins, so mine should be faster)

    My final solution, which takes advantage of the lightning fast running totals method takes a little over 30 seconds to run.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Took about 1 minute to run. On SQL 2k 12gig memory 8gig assigned to SQL, 8 processors. ProductID was index on SalesProducts. I had 944 unique products.

    Now that I have worked out how to top and tail prices I need to now do the same to daily sales figures for products over the year so I can work out the minimum and safety level stock levels for our MRP. I have extracted the total sold grouped by day of the week, week number and month number. I may also have to split the year so I can have variable levels based on the season.

    Very interesting stuff what !!! 😎

    Thanks again

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply