• EdgeOfKnowWare (2/22/2016)


    Alan, I agree with you, if you're just wanting one ranking. If we expand upon the Original Posters issue (which is what I was facing) and say that you want to different groupings, how would you handle that?

    So, let's take the example and add a column.

    --NTILE (and others, presumably) gotcha: NULL inputs are included as belonging to the bottom tile,

    --displacing other values to higher tiles.

    IF OBJECT_ID('tempdb.dbo.#t', 'U') IS NOT NULL DROP TABLE #t;

    CREATE TABLE #t

    (

    ID INT IDENTITY(1,1) PRIMARY KEY,

    val INT NULL,

    val2 INT NULL

    );

    --INSERT 10 values: 3 NULL, 7 integers

    INSERT INTO #t SELECT NULL,1;

    INSERT INTO #t SELECT NULL,2;

    INSERT INTO #t SELECT NULL,3;

    INSERT INTO #t SELECT 5,4;

    INSERT INTO #t SELECT 7,5;

    INSERT INTO #t SELECT 8,6;

    INSERT INTO #t SELECT 9,NULL;

    INSERT INTO #t SELECT 9,NULL;

    INSERT INTO #t SELECT 11,NULL;

    INSERT INTO #t SELECT 12,NULL;

    --Compute quartiles.

    --With 7 non-NULL values, this should have 2-2-2-1 rows per quartile and the NULLs ignored (to my thinking).

    --Instead, we get back tiles based on 10 rows of input, with 3-3-2-2 rows per quartile.

    --The lowest real values of 5 and 7 are in Q2, not Q1, b/c the 3 NULLs are all the rows in Q1:

    Suppose you want the results of this:

    IDvalQval2Q2

    1NULLNULL11

    2NULLNULL21

    3NULLNULL32

    45142

    57153

    68264

    792NULLNULL

    893NULLNULL

    9113NULLNULL

    10124NULLNULL

    How would you write the query then? That's the direction I was going.

    For something like that I would do this:

    WITH

    Q1 AS

    (

    SELECT

    ID,

    val,

    Q = NULL,

    val2

    FROM #t

    WHERE val IS NULL

    UNION ALL

    SELECT

    ID,

    val,

    Q = NTILE(4) OVER (ORDER BY val),

    val2

    FROM #t

    WHERE val IS NOT NULL

    ),

    Q2 AS

    (

    SELECT ID, val2, Q2 = NTILE(4) OVER (ORDER BY val2)

    FROM Q1

    WHERE val2 IS NOT NULL

    )

    SELECT

    Q1.ID,

    val,

    Q,

    Q1.val2,

    Q2

    FROM Q1

    LEFT JOIN Q2 ON Q1.ID = Q2.ID;

    If you went that route you would really want this index:

    CREATE NONCLUSTERED INDEX nc_t_val1_temp ON #t(val, val2) INCLUDE (ID);

    and would be able to further improve performance using this index:

    CREATE NONCLUSTERED INDEX nc_t_val2_temp ON #t(val2) INCLUDE (ID, val);

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001