SQLServerCentral Article

What Power BI DirectQuery does to your SQL Server (and how to fix it)

The ticket said "Power BI dashboards are slow - please investigate." It had no report names, no screenshots and no timestamps. If you're a DBA and have been for more than five minutes, you'll know exactly what I'm talking about.

So I opened Activity Monitor and found what I expected. Endless ad hoc queries I'd never written, chewing through the Sales fact table with nested subqueries three levels deep. All from Power BI, running in DirectQuery mode. Activity Monitor told me enough to know where to look - after that I switched straight to DMVs and Query Store to get the real picture.

DirectQuery is a perfectly valid architecture choice, but it shifts the entire analytical workload onto your SQL Server, and if nobody warned you it was coming, you're going to have a rough afternoon. I've spent a large part of the last few years tuning SQL Server instances that feed DirectQuery reports, and this article is everything I wish someone had told me before that first ticket landed.

I'll show you what DirectQuery actually does to your server, how to find the worst offending queries, and what you can do from the database side to make it all bearable. I'll also be honest about when DirectQuery is simply the wrong answer and you should steer them away from it.

How DirectQuery hammers your server

When a Power BI report uses Import mode, data sits compressed in Power BI's in-memory VertiPaq engine. Your SQL Server is only involved during scheduled refreshes. DirectQuery is the opposite. Every time someone opens a report, moves a slicer, or clicks a bar in a chart, Power BI translates the request into T-SQL and fires it at your database. There's no in-memory columnar cache like Import mode, and every interaction can generate live queries against your server.

The bit that catches most DBAs off guard is the multiplication effect. Each visual on a report page generates its own query. A dashboard with 12 visuals fires 12 separate queries on page load. When a user adjusts a date slicer, Power BI refreshes every visual on the page - that's 12 more queries. If cross-filtering is on (and it usually is), clicking one bar in a chart can trigger another round for every other visual.

The maths gets scary fast. One executive opens their dashboard: 12 queries. They change the date range: 12 more. They click a region: up to 12 more. That's 36 queries from a single user in under 30 seconds. Now multiply that by 50 users at 9am on a Monday before anyone's even got the coffee order in.
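
The arithmetic above is easy to sanity-check. Here's a throwaway sketch - the visual count, interaction count, and user numbers are the illustrative figures from this article, not Power BI constants:

```python
# Back-of-envelope estimate of DirectQuery query volume.
# All numbers are illustrative assumptions, not Power BI constants.
visuals_per_page = 12
interactions = 3          # page load, slicer change, cross-filter click
users = 50

queries_per_user = visuals_per_page * interactions
total_burst = queries_per_user * users

print(queries_per_user)   # 36 queries from one user session
print(total_burst)        # 1800 queries hitting SQL Server in a Monday-morning burst
```

Even with conservative assumptions, a modest dashboard and a modest audience generate thousands of ad hoc analytical queries in a few minutes.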

Power BI imposes its own constraints on top of this:

  • Query timeout: There are multiple limits depending on where the report runs. The visual query limit is 225 seconds in both shared and Premium capacities, and the DirectQuery service timeout is 4 minutes. Either way, if your server can't answer in time, the visual shows an error.
  • Concurrency: By default, Power BI opens a maximum of 10 concurrent connections per data source. Anything beyond that gets queued.
  • Row limit: 1,000,000 rows for intermediate results. You'll spot this as `TOP (1000001)` in the generated SQL - the extra row is Power BI's overflow detector.

One counterintuitive finding is that increasing the concurrency limit doesn't always help. In testing with 25 card visuals, setting concurrency to 30 actually performed worse than the default of 10. The database couldn't cope with that many simultaneous analytical queries, so individual query times ballooned past the timeout. Sometimes the queue is doing you a favour.

The SQL that Power BI writes

If you've ever captured DirectQuery traffic with Profiler or Extended Events, you'll have noticed the T-SQL looks nothing like what a human would write. Power BI's query generation wraps every table reference in a derived table, adds CAST operations everywhere, and produces queries that are technically correct but wildly verbose.

Here's what you'd write for "total sales by category, filtered to 2024":

sql
SELECT
    p.ProductCategory,
    SUM(s.SalesAmount) AS TotalSales
FROM dbo.Sales s
INNER JOIN dbo.Product p ON s.ProductID = p.ProductID
WHERE s.SalesDate >= '2024-01-01' AND s.SalesDate < '2025-01-01'
GROUP BY p.ProductCategory
ORDER BY TotalSales DESC;

And here's a representative example of what Power BI sends (the exact shape varies with your model, relationships, and measures, but this pattern is typical):

sql
SELECT TOP (1000001)
    [t2].[ProductCategory],
    SUM(CAST([t1].[SalesAmount] AS FLOAT)) AS [a0]
FROM
(
    SELECT [Sales].[SalesAmount], [Sales].[Quantity],
           [Sales].[ProductID], [Sales].[SalesDate]
    FROM [dbo].[Sales] AS [Sales]
) AS [t1]
INNER JOIN
(
    SELECT [Product].[ProductID], [Product].[ProductCategory]
    FROM [dbo].[Product] AS [Product]
) AS [t2]
ON [t1].[ProductID] = [t2].[ProductID]
WHERE YEAR(CAST([t1].[SalesDate] AS DATE)) = 2024
GROUP BY [t2].[ProductCategory]
ORDER BY [a0] DESC

Several things jump out here. First, the unnecessary subqueries wrapping each table (the optimiser can usually flatten these, but the sheer volume of distinct ad hoc queries puts real pressure on your plan cache). Then the CAST operations that weren't needed. And what hurts most: `WHERE YEAR(CAST([t1].[SalesDate] AS DATE)) = 2024`. Wrapping the column in a function makes the predicate non-sargable, so your index on SalesDate often can't help. You frequently end up scanning far more data than you'd expect.
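
One database-side mitigation worth testing is an indexed computed column that mirrors the generated expression. SQL Server can match an indexed computed column to an equivalent expression in a predicate, though the extra CAST in Power BI's SQL may defeat the match. The column and index names below are my own placeholders, so treat this as a sketch to verify against the actual execution plan:

```sql
-- Hypothetical mitigation: persist the expression Power BI generates and
-- index it. SQL Server can sometimes match an indexed computed column to
-- an equivalent expression in a WHERE clause, but the CAST in the
-- generated SQL may prevent the match - check the plan before relying on it.
ALTER TABLE dbo.Sales
    ADD SalesYearComputed AS YEAR(CAST(SalesDate AS DATE)) PERSISTED;

CREATE NONCLUSTERED INDEX IX_Sales_SalesYearComputed
    ON dbo.Sales (SalesYearComputed)
    INCLUDE (ProductID, SalesAmount);
```

If the match doesn't happen, the cleaner fix is the reporting-view approach later in this article, where you control the SQL shape yourself.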

A card visual showing a single number is no better: you still get the subquery wrapping and the CASTs, and if the card aggregates a large table, that's a full scan to produce one value.

Another thing you'll notice is that Power BI often generates LEFT OUTER JOINs, especially when Assume Referential Integrity isn't enabled on the model relationships. It does this to handle the "blank" row in DAX. If you know your foreign keys are clean, tell your BI team to set 'Assume Referential Integrity' to true on the model relationships. This can switch the generated SQL to INNER JOINs, which gives the optimiser more join strategies to work with. It's a free win that needs no database changes.

Finding the problem queries

Before you fix anything, you need to know exactly what Power BI is sending. I use three tools, depending on the urgency.

When the server is on fire right now

This DMV query shows active requests that match the DirectQuery signature - nested subqueries and the `TOP (1000001)` clause:

sql
SELECT
    r.session_id,
    r.total_elapsed_time AS elapsed_ms,
    r.status,
    r.wait_type,
    r.cpu_time AS cpu_ms,
    r.logical_reads,
    SUBSTRING(st.text, (r.statement_start_offset / 2) + 1,
        ((CASE r.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE r.statement_end_offset
        END - r.statement_start_offset) / 2) + 1
    ) AS query_text
FROM sys.dm_exec_requests r
INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE
    s.is_user_process = 1
    AND (
        SUBSTRING(st.text, 1, 1000) LIKE '%FROM (%SELECT%'
        OR SUBSTRING(st.text, 1, 1000) LIKE '%TOP (1000001)%'
    )
ORDER BY r.total_elapsed_time DESC;

Also check `s.program_name`, `s.host_name`, and `s.login_name` to identify Power BI traffic. What shows up varies across Desktop, the gateway, and the Power BI Service, so run a quick `SELECT DISTINCT program_name, host_name, login_name FROM sys.dm_exec_sessions WHERE is_user_process = 1` first to see what your environment actually looks like, then filter based on that.

For historical analysis: Query Store

This is where Query Store really earns its keep. It finds the most expensive DirectQuery-shaped queries over the last seven days:

sql
-- Note: Query Store stores durations in microseconds, so /1000 gives milliseconds
DECLARE @start_time DATETIME2 = DATEADD(DAY, -7, GETDATE());

SELECT TOP 20
    q.query_id,
    SUBSTRING(qt.query_sql_text, 1, 150) AS query_preview,
    SUM(rs.count_executions) AS total_execs,
    CAST(AVG(rs.avg_duration) / 1000.0 AS NUMERIC(10, 2)) AS avg_duration_ms,
    CAST(MAX(rs.max_duration) / 1000.0 AS NUMERIC(10, 2)) AS max_duration_ms,
    SUM(CAST(rs.avg_logical_io_reads * rs.count_executions AS BIGINT))
        AS total_logical_reads,
    COUNT(DISTINCT p.plan_id) AS plan_count
FROM sys.query_store_query_text qt
INNER JOIN sys.query_store_query q
    ON qt.query_text_id = q.query_text_id
INNER JOIN sys.query_store_plan p
    ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs
    ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_runtime_stats_interval rsi
    ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE
    (qt.query_sql_text LIKE '%FROM (%SELECT%'
     OR qt.query_sql_text LIKE '%TOP (1000001)%')
    AND q.is_internal_query = 0
    AND q.object_id = 0
    AND rsi.start_time >= @start_time
GROUP BY q.query_id, qt.query_sql_text
HAVING SUM(rs.count_executions) > 10
ORDER BY SUM(rs.avg_duration * rs.count_executions) DESC;

Note: pay attention to `plan_count`. If a single query has multiple plans, you've got plan instability, which is all too often the root cause of "it was fine yesterday, it's slow today." Query Store lets you force a known-good plan while you investigate, but don't leave forced plans in place forever. They're a bandage, not a fix.
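
Forcing and unforcing a plan is one call each. The IDs below are placeholders you'd take from the Query Store output above:

```sql
-- query_id and plan_id come from the Query Store query above;
-- the values here are placeholders.
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;

-- Once you've fixed the underlying issue, remove the bandage:
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;
```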

For ongoing monitoring: Extended Events

Set up a lightweight session that captures long-running statements. Filter by database and duration to keep overhead down:

sql
-- Adjust the file path to a location SQL Server can write to
CREATE EVENT SESSION [DirectQuery_Monitor] ON SERVER
ADD EVENT sqlserver.rpc_completed (
    ACTION (
        sqlserver.client_app_name,
        sqlserver.database_name,
        sqlserver.sql_text,
        sqlserver.session_id
    )
    WHERE
        sqlserver.database_name = N'YourDatabase'
        AND duration >= 5000000  -- 5 seconds, in microseconds
),
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (
        sqlserver.client_app_name,
        sqlserver.database_name,
        sqlserver.sql_text,
        sqlserver.session_id
    )
    WHERE
        sqlserver.database_name = N'YourDatabase'
        AND duration >= 5000000
)
ADD TARGET package0.event_file (
    SET filename = N'D:\XEvents\DirectQuery_Monitor.xel',
        max_file_size = 100,
        max_rollover_files = 5
)
WITH (MAX_DISPATCH_LATENCY = 30 SECONDS, STARTUP_STATE = OFF);
GO
ALTER EVENT SESSION [DirectQuery_Monitor] ON SERVER STATE = START;

You want both rpc_completed and sql_batch_completed here. DirectQuery traffic usually comes through as RPC calls, but you'll miss things if you only capture one type.
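
To read the captured file back, `sys.fn_xe_file_target_read_file` plus a little XQuery does the job. This assumes the file path from the session definition above:

```sql
-- Read the captured events back from the .xel files. Shredding the XML
-- is the price of admission with Extended Events.
SELECT
    xe.event_data.value('(event/@timestamp)[1]', 'DATETIME2') AS event_time,
    xe.event_data.value('(event/data[@name="duration"]/value)[1]', 'BIGINT')
        / 1000 AS duration_ms,
    xe.event_data.value('(event/action[@name="sql_text"]/value)[1]',
        'NVARCHAR(MAX)') AS sql_text
FROM (
    SELECT CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file(
        N'D:\XEvents\DirectQuery_Monitor*.xel', NULL, NULL, NULL)
) AS xe
ORDER BY duration_ms DESC;
```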

When you review execution plans for these queries, look for: table scans on your fact tables (missing indexes or non-sargable predicates), hash joins where you'd expect nested loops (check statistics and join column indexes), key lookups (add INCLUDE columns to your nonclustered indexes), and sort spills to tempdb (memory grant issues, often from bad cardinality estimates).

Fixing it from the database side

Once you've identified the problem queries, here's what I'd do first.

Columnstore indexes: the obvious first move

DirectQuery workloads are analytical by nature - lots of GROUP BY, SUM, COUNT, and scans across large fact tables. That's exactly what columnstore indexes are built for. In my experience, adding a nonclustered columnstore to a big fact table is the single most impactful change you can make for DirectQuery performance.

sql
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales_Analytics
ON dbo.Sales (SalesDate, ProductID, RegionID, CustomerID,
              SalesAmount, Quantity, Cost);

I use nonclustered columnstore (not clustered) when the table also handles OLTP writes. The primary B-tree indexes stay intact for your transactional workload, while DirectQuery queries get batch mode processing and columnar storage for their scans and aggregations.

The numbers honestly speak for themselves. On the fact table from our original ticket, query execution times dropped from 8-12 seconds to under a second. Logical reads went from 15,000+ to around 500. Not a rounding error - columnstore really is that effective for analytical scan patterns.

If the table also takes heavy writes, test the overhead first. Delta store pressure, tuple mover activity, and index maintenance all cost something. But for tables that are mostly read by reporting, it's a straightforward win.
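
A quick way to gauge that overhead is the rowgroup physical stats DMV - lots of open delta rowgroups or high deleted-row counts are the warning signs:

```sql
-- Check delta store and rowgroup health after the columnstore has been
-- running under your write workload for a while.
SELECT
    state_desc,
    COUNT(*) AS rowgroup_count,
    SUM(total_rows) AS total_rows,
    SUM(deleted_rows) AS deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID(N'dbo.Sales')
GROUP BY state_desc;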

If you're on SQL Server 2019 or later, check your compatibility level is 150 or higher. Batch mode on rowstore (part of Intelligent Query Processing) can accelerate analytical queries even without a columnstore index, but you get the best results with both.
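
Checking and raising the compatibility level is a one-liner each way (swap in your database name):

```sql
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'YourDatabase';

ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 150;
```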

Indexed views for pre-aggregation

When Power BI asks the same aggregation over and over, for example 'total sales by month and category', you can materialise that result with an indexed view. The optimiser serves it from the view instead of scanning the base tables every time.

sql
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL,
    ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO
CREATE VIEW dbo.vw_MonthlySalesByCategory
WITH SCHEMABINDING
AS
SELECT
    YEAR(s.SalesDate) AS SalesYear,
    MONTH(s.SalesDate) AS SalesMonth,
    p.ProductCategory,
    SUM(s.SalesAmount) AS TotalSales,
    COUNT_BIG(*) AS RowCount
FROM dbo.Sales AS s
INNER JOIN dbo.Product AS p ON s.ProductID = p.ProductID
GROUP BY
    YEAR(s.SalesDate),
    MONTH(s.SalesDate),
    p.ProductCategory;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vw_MonthlySalesByCategory
ON dbo.vw_MonthlySalesByCategory (SalesYear, SalesMonth, ProductCategory);
GO

The restrictions are real: SCHEMABINDING is mandatory, no OUTER JOINs, you must include COUNT_BIG(*) when grouping, deterministic functions only. But the payoff is enormously worth it. I've seen queries go from 150,000 logical reads to 200.

One thing to watch out for: automatic view matching only works on Enterprise Edition and Azure SQL Database. On Standard Edition, the optimiser won't route queries to an indexed view unless the query includes the NOEXPAND hint, and Power BI can't inject hints into its generated SQL. So on Standard Edition on-prem, indexed views for DirectQuery are a no-go unless you can get Power BI to query the view directly as a table source in the model.
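
If you do point the model at the view directly, one pattern that works is a thin wrapper view that bakes the hint in. The wrapper name below is my own invention, so adjust it to your conventions:

```sql
-- A wrapper view with NOEXPAND for Standard Edition. Point the Power BI
-- model at this view as a table source; the hint forces the optimiser to
-- use the indexed view even where automatic view matching isn't available.
CREATE VIEW dbo.vw_MonthlySalesByCategory_Report
AS
SELECT SalesYear, SalesMonth, ProductCategory, TotalSales
FROM dbo.vw_MonthlySalesByCategory WITH (NOEXPAND);
```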

Resource Governor: protect your OLTP workload

If DirectQuery and transactional workloads share the same instance, Resource Governor lets you put Power BI in its own lane. One caveat: on SQL Server 2019 and 2022 this requires Enterprise Edition. Azure SQL Database handles resource governance differently through its own service tier limits.

sql
CREATE RESOURCE POOL Pool_PowerBI
WITH (MAX_CPU_PERCENT = 40, MAX_MEMORY_PERCENT = 30);
GO
CREATE WORKLOAD GROUP Group_PowerBI
WITH (MAX_DOP = 4)
USING Pool_PowerBI;
GO
CREATE FUNCTION dbo.ClassifyPowerBI()
RETURNS SYSNAME WITH SCHEMABINDING
AS BEGIN
    DECLARE @group SYSNAME = 'default';

    IF APP_NAME() LIKE N'%Power BI%'
       OR SUSER_SNAME() = N'PowerBIServiceAccount'
        SET @group = 'Group_PowerBI';

    RETURN @group;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.ClassifyPowerBI);
ALTER RESOURCE GOVERNOR RECONFIGURE;

The MAX_DOP = 4 at the workload group level is the key detail here. Parallelism makes individual queries faster but burns worker threads. When 50 users hit slicers and each query goes parallel at DOP 8, you run out of threads fast. Capping it at 4 keeps queries reasonably quick while leaving headroom for concurrency. If you're seeing thread starvation, high parallelism waits (CXPACKET, CXCONSUMER), and runnable queues spiking during report bursts, capping MAXDOP for the Power BI workload group is where I'd start.
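
A quick look at the cumulative wait stats tells you whether parallelism pressure is actually your problem (remember these counters accumulate from the last restart or manual clear, so compare snapshots during a report burst):

```sql
-- Check for parallelism and thread-starvation waits.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'CXPACKET', N'CXCONSUMER', N'THREADPOOL')
ORDER BY wait_time_ms DESC;
```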

Build a reporting layer with views

Here's one that costs nothing and pays off immediately: create views that pre-join your dimensions to your fact tables, and point Power BI at those views instead of the raw tables.

sql
CREATE VIEW dbo.vw_SalesReporting
AS
SELECT
    s.SalesID, s.SalesDate,
    YEAR(s.SalesDate) AS SalesYear,
    MONTH(s.SalesDate) AS SalesMonth,
    c.CustomerName, c.CustomerSegment, c.Region,
    p.ProductName, p.ProductCategory,
    s.Quantity, s.SalesAmount, s.Cost,
    s.SalesAmount - s.Cost AS GrossProfit
FROM dbo.Sales s
INNER JOIN dbo.Customer c ON s.CustomerID = c.CustomerID
INNER JOIN dbo.Product p ON s.ProductID = p.ProductID;

Power BI connects to vw_SalesReporting as a single table source. That means no more Power BI-generated joins between Sales, Customer, and Product on every visual query. The view handles the joins once in a predictable way, your indexes stay in play, and the generated SQL gets simpler. I've seen this alone cut execution times by 30-50% on complex models with many dimension tables.

One caveat: DirectQuery requires query folding, and stored procedures and complex SQL constructs like CTEs don't fold. Views are your only real option for a server-side reporting layer, but they're all you need.

What to tell your BI team

You can tune the database all day, but honestly the biggest gains often come from the Power BI side. Here's what I send to BI developers when I'm working on a DirectQuery problem: five changes that typically cut SQL Server load in half.

  1. Keep visuals under 8 per page. Each visual is a query. Going from 15 to 8 visuals cuts your query volume by nearly half per interaction. This is the single easiest win.
  2. Turn on Apply buttons for slicers. Without them, every intermediate slicer click fires a full round of queries. With Apply buttons, users set their filters and submit once.
  3. Disable cross-filtering between visuals that don't need it. Go to Format > Edit Interactions and set unrelated visuals to "None." This stops one chart click from refreshing everything else on the page.
  4. Use composite models: import the dimensions, DirectQuery the facts. Import your Customer, Product, and Date tables into the Power BI model - they're small and change slowly. Keep only the large fact tables on DirectQuery. Dimension lookups no longer hit the database at all.
  5. Set Assume Referential Integrity on relationships where FKs are clean. Changes LEFT OUTER JOINs to INNER JOINs in the generated SQL. Better join strategies, faster queries, no database changes needed.

If you're on Azure SQL

Three things work differently on Azure SQL Database, and they're all in your favour.

Read replicas

Azure SQL Database supports read scale-out in the Premium, Business Critical, and Hyperscale tiers. When it's available, add ApplicationIntent=ReadOnly to the Power BI connection string and the entire DirectQuery workload routes to a readable secondary replica. Your primary stays clean for writes. This is the single best move for mixed workloads on Azure.

Automatic tuning

Enable FORCE_LAST_GOOD_PLAN:

sql
ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);

When a DirectQuery query regresses after a statistics update or plan change, Azure automatically detects it and forces the previous good plan. It's not a long-term fix, but it stops the bleeding while you investigate properly.
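
You can see what automatic tuning has noticed (and done) in the tuning recommendations DMV:

```sql
-- Review detected regressions and the actions automatic tuning has taken.
SELECT reason, score, state, details
FROM sys.dm_db_tuning_recommendations;
```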

vCore over DTU

DTU bundles compute, memory, and IO into a single number, which makes it hard to tune for analytical workloads. vCore lets you scale CPU and memory independently. DirectQuery needs CPU for parallelism and memory for hash joins and sorts - vCore gives you that control.

When to honestly give up on DirectQuery

I'd be doing you a disservice if I didn't say this. Sometimes DirectQuery really is the wrong tool, and no amount of indexing will fix that.

If your dataset is under 100 GB uncompressed, use Import mode. VertiPaq's compression is remarkable. As an example, a 5 GB SQL table often compresses to 500 MB in Power BI. Querying from RAM will always beat querying over a connection to a relational engine. Import also gives you the full DAX feature set, which DirectQuery restricts.

If you need data freshness better than 30 minutes but the data isn't enormous, try incremental refresh. I've seen teams adopt DirectQuery because they thought Import couldn't refresh fast enough. In practice, incremental refresh (where only recent partitions update) can bring refresh times from hours down to minutes.

If DirectQuery is genuinely necessary - multi-terabyte datasets, regulatory requirements to keep data at source, proper real-time freshness needs - then **composite models** are the middle ground. Import your dimensions, keep facts on DirectQuery, and add aggregation tables for the most common queries. Power BI Premium and Fabric support user-defined aggregations that route high-level queries to a cached table and only hit your SQL Server for drill-down detail.

If your organisation is moving to Microsoft Fabric, Direct Lake mode is worth evaluating. It reads directly from Delta/Parquet files in OneLake and delivers something close to Import performance without the scheduled refresh overhead. It's still maturing, but it's where Microsoft is pointing the roadmap.

The honest conversation with stakeholders usually starts with the question: "What's your actual data freshness requirement?" Most teams discover that "real-time" means "within the hour" once they think it through. If that's the case, Import with scheduled refresh covers it, and you skip the entire DirectQuery tuning exercise.

Getting DirectQuery performance right is a shared problem between the DBA and the BI team, and neither side can fix it alone. The changes that stick are the ones where both sides move: columnstore indexes and Resource Governor on the database, fewer visuals and composite models on the Power BI side.

If you do three things after reading this, make them these. First, add a nonclustered columnstore index to your largest fact table, then set up a Query Store query to check for DirectQuery patterns weekly, and finally send your BI team the five-item list above.

Getting this balance right does require [someone who's comfortable with both SQL Server tuning and Power BI modelling](https://redeagle.tech/services/power-bi-consultant). But the fundamentals are all here. Start with the diagnostics, follow the data, and let Query Store guide your indexing decisions.

The SQL that Power BI generates will never look like the SQL you'd write by hand. But with the right indexes, the right configuration, and a BI team that understands the cost of their design choices, DirectQuery can work well. It just needs a DBA who knows what to expect.

 
