Columnstore Query Reverting to Row Mode with CROSS JOIN

  • I have two inline selects against a table with a nonclustered columnstore on SQL 2014 (12.0.2000). Both execute in batch mode and when I inner-join the two, they continue to execute in batch mode. When I cross join them, one executes in row mode. Any ideas on how to avoid this? Below is some SQL to simulate the issue.

    Thanks,

    Mitch

    -- The purpose of this script is to demonstrate that

    -- two queries against a columnstore index that each execute in batch mode

    -- will continue to execute in batch mode when inner joined.

    -- However, one of the queries will execute in row mode when cross-joined.

    -- Create function to return 0 to n rows

    IF OBJECT_ID('dbo.IntCount') IS NOT NULL

    DROP FUNCTION dbo.IntCount;

    GO

    CREATE FUNCTION dbo.IntCount(@n AS BIGINT) RETURNS TABLE AS RETURN

    WITH

    L0 AS(SELECT 1 AS c UNION ALL SELECT 1),

    L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

    L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

    L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

    L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

    L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)

    SELECT TOP (@n) n FROM Nums ORDER BY n;

    GO

    -- Create Employee table

    IF OBJECT_ID('dbo.Employee') IS NOT NULL

    DROP TABLE dbo.Employee;

    GO

    CREATE TABLE dbo.Employee

    (

    EmployeeID INT,

    GroupID INT

    )

    -- Populate with 1000 employees in 250 groups of 4

    INSERT INTO

    dbo.Employee

    SELECT

    n AS EmployeeID,

    CAST((n - 1) / 4 AS INT) + 1 GroupID

    FROM

    dbo.IntCount(1000)

    CREATE CLUSTERED INDEX IX_Employee_CL ON dbo.Employee (EmployeeID)

    CREATE NONCLUSTERED INDEX IX_Employee_GroupID ON dbo.Employee (GroupID)

    -- Create EmployeeReview table

    IF OBJECT_ID('dbo.EmployeeReview') IS NOT NULL

    DROP TABLE dbo.EmployeeReview;

    GO

    CREATE TABLE dbo.EmployeeReview

    (

    ReviewDate DATE,

    EmployeeID INT,

    IsPositive INT

    )

    -- Populate with 1000000 employee reviews between Jan 1, 2015 and Jan 31, 2015

    -- A review can either be IsPositive = 0 or IsPositive = 1

    INSERT INTO

    dbo.EmployeeReview

    SELECT

    DATEADD(DAY, RAND(CHECKSUM(NEWID())) * 31, CAST('2015-01-01' AS DATE)) AS ReviewDate,

    (n - 1) % 1000 + 1 AS EmployeeID,

    ROUND(RAND(CHECKSUM(NEWID())) + RAND(CHECKSUM(NEWID())) / 3, 0) AS IsPositive

    FROM

    dbo.IntCount(1000000)

    -- Create Columnstore index on all EmployeeReview columns

    CREATE NONCLUSTERED COLUMNSTORE INDEX IX_EmployeeReview_CS ON dbo.EmployeeReview

    (

    ReviewDate,

    EmployeeID,

    IsPositive

    )

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    GO

    -- Rank Employees for each day by % of Positive reviews against a population of Groups

    -- This query's execution plan uses Batch Execution Mode for both subselects

    --

    -- SQL Server Execution Times:

    -- CPU time = 500 ms, elapsed time = 62 ms.

    SELECT

    Employees.ReviewDate,

    Employees.EmployeeID,

    Employees.[% of Positive],

    SUM(CASE WHEN Employees.[% of Positive] < Groups.[% of Positive] THEN 1 ELSE 0 END) + 1 [Rank]

    FROM

    (

    SELECT

    ReviewDate,

    EmployeeReview.EmployeeID,

    CAST(SUM(IsPositive) AS DECIMAL(19,5)) / CAST(SUM(1) AS DECIMAL(19,5)) AS [% of Positive]

    FROM

    dbo.EmployeeReview JOIN

    dbo.Employee ON EmployeeReview.EmployeeID = Employee.EmployeeID

    WHERE

    GroupID = 1

    GROUP BY

    ReviewDate,

    EmployeeReview.EmployeeID

    ) Employees

    JOIN

    (

    SELECT

    ReviewDate,

    GroupID,

    CAST(SUM(IsPositive) AS DECIMAL(19,5)) / CAST(SUM(1) AS DECIMAL(19,5)) AS [% of Positive]

    FROM

    dbo.EmployeeReview JOIN

    dbo.Employee ON EmployeeReview.EmployeeID = Employee.EmployeeID

    GROUP BY

    ReviewDate,

    GroupID

    ) Groups ON Employees.ReviewDate = Groups.ReviewDate

    GROUP BY

    Employees.ReviewDate,

    Employees.EmployeeID,

    Employees.[% of Positive]

    -- Rank Employees across the entire month by % of Positive reviews against a population of Groups

    -- This query's execution plan uses

    -- Batch Execution Mode for the first subselect BUT

    -- **Row** Execution Mode for the second subselect

    --

    -- SQL Server Execution Times:

    -- CPU time = 1329 ms, elapsed time = 466 ms.

    SELECT

    Employees.EmployeeID,

    Employees.[% of Positive],

    SUM(CASE WHEN Employees.[% of Positive] < Groups.[% of Positive] THEN 1 ELSE 0 END) + 1 [Rank]

    FROM

    (

    SELECT

    EmployeeReview.EmployeeID,

    CAST(SUM(IsPositive) AS DECIMAL(19,5)) / CAST(SUM(1) AS DECIMAL(19,5)) AS [% of Positive]

    FROM

    dbo.EmployeeReview JOIN

    dbo.Employee ON EmployeeReview.EmployeeID = Employee.EmployeeID

    WHERE

    GroupID = 1

    GROUP BY

    EmployeeReview.EmployeeID

    ) Employees

    CROSS JOIN

    (

    SELECT

    GroupID,

    CAST(SUM(IsPositive) AS DECIMAL(19,5)) / CAST(SUM(1) AS DECIMAL(19,5)) AS [% of Positive]

    FROM

    dbo.EmployeeReview JOIN

    dbo.Employee ON EmployeeReview.EmployeeID = Employee.EmployeeID

    GROUP BY

    GroupID

    ) Groups

    GROUP BY

    Employees.EmployeeID,

    Employees.[% of Positive]

    SET STATISTICS IO OFF

    SET STATISTICS TIME OFF

    GO

  • Check the execution plan. It really comes down to have it's being resolved. If you read the FAQ from Microsoft it outlines places where you can revert to row mode processing, all of which will be in the execution plan. Also, check the estimated cost of the plan. If it's below your Cost Threshold for Parallelism you're going to get a row mode execution.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The removal of the join predicate when using the cross join creates an inner loop join, which is not support by batch mode processing. How would I be able to get the cross join to use an inner hash join or use some other plan that support batch mode? An INNER HASH JOIN hint fails even when I create a dummy static value to use as a join predicate.

  • I'm not sure. I don't have a good answer to that.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If you feel confident that this particular query should always leverage the non-clustered columnstore index, then have you tried an index like so?

    WITH (INDEX (IX_EmployeeReview_CS))

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (5/21/2015)


    If you feel confident that this particular query should always leverage the non-clustered columnstore index, then have you tried an index like so?

    WITH (INDEX (IX_EmployeeReview_CS))

    But the issue is if it goes to Batch mode or stays in Row mode. We're trying for batch mode here.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I found a successful, albeit kludgy, workaround. Add a dummy column to EmployeeReview table populated with 1. A join on it behaves like a cross join, but is executed like an inner join. Not an ideal implementation, but it works.

    SELECT

    Employees.EmployeeID,

    Employees.[% of Positive],

    SUM(CASE WHEN Employees.[% of Positive] < Groups.[% of Positive] THEN 1 ELSE 0 END) + 1 [Rank]

    FROM

    (

    SELECT

    JoinKey,

    EmployeeReview.EmployeeID,

    CAST(SUM(IsPositive) AS DECIMAL(19,5)) / CAST(SUM(1) AS DECIMAL(19,5)) AS [% of Positive]

    FROM

    dbo.EmployeeReview JOIN

    dbo.Employee ON EmployeeReview.EmployeeID = Employee.EmployeeID

    WHERE

    GroupID = 1

    GROUP BY

    JoinKey,

    EmployeeReview.EmployeeID

    ) Employees

    JOIN

    (

    SELECT

    JoinKey,

    GroupID,

    CAST(SUM(IsPositive) AS DECIMAL(19,5)) / CAST(SUM(1) AS DECIMAL(19,5)) AS [% of Positive]

    FROM

    dbo.EmployeeReview JOIN

    dbo.Employee ON EmployeeReview.EmployeeID = Employee.EmployeeID

    GROUP BY

    JoinKey,

    GroupID

    ) Groups ON Employees.JoinKey = Groups.JoinKey

    GROUP BY

    Employees.EmployeeID,

    Employees.[% of Positive]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply