• Sergiy - Monday, March 19, 2018 7:55 PM

    frederico_fonseca - Monday, March 19, 2018 7:07 PM

    When loading the data into Power BI, split the datetime fields into a date portion and a time portion as separate columns.

    The bigint fields should also, where possible, be split, e.g. a number like 123,123,123,122 could be split into 3 columns.

    Never a good idea.
    If a package assumes this is a good practice, the best approach would be not to use that package.

    Can you write a query to select all queries started during off-hours last night?
    And what about a query selecting all events with writes above a certain amount?

    With split columns, both queries would require table scans, which means reading the whole table into memory and applying the filtering there.
    That is exactly the problem with the current state of affairs in the OP's database.
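
    To make that point concrete, here is a rough T-SQL sketch (the table, column and index names are all hypothetical): with an index on an intact datetime column, the off-hours filter is a simple sargable range, whereas the same filter over split date and time columns has to straddle a date boundary.

        -- Hypothetical event log table; names are illustrative only.
        CREATE INDEX IX_EventLog_StartTime ON dbo.EventLog (StartTime);

        -- "All queries started during off-hours last night": one sargable range,
        -- which the index above can satisfy with a seek.
        SELECT EventId, StartTime, Writes
        FROM dbo.EventLog
        WHERE StartTime >= '2018-03-18T22:00:00'
          AND StartTime <  '2018-03-19T06:00:00';

        -- With the datetime split into separate date and time columns, the same
        -- question needs a predicate that straddles a date boundary; at best the
        -- optimizer turns it into several seeks, and in practice it often scans.
        SELECT EventId, StartDate, StartTimeOfDay, Writes
        FROM dbo.EventLog_Split
        WHERE (StartDate = '2018-03-18' AND StartTimeOfDay >= '22:00')
           OR (StartDate = '2018-03-19' AND StartTimeOfDay <  '06:00');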

    I think his point here is that Power BI is going to scan the entire table anyway, and it has been the best practice to split the fields up in order to decrease the cardinality of a given field. Thus, for a field that is a date/time down to the microsecond, the recommendation is to split the date into one field and the time into another, then consider removing some of the precision from the time, maybe reporting to the minute, in order to decrease that cardinality. The more unique values, the slower the performance.
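
    As a minimal T-SQL sketch of that kind of split done while shaping the data for the model (the dbo.EventLog table and its columns are hypothetical; the same transformation can also be done in Power Query):

        -- Turn a microsecond-precision datetime2 into a low-cardinality pair:
        -- a date column plus a time column truncated to the minute
        -- (at most 1,440 distinct values).
        SELECT
            CAST(ev.StartTime AS date) AS StartDate,
            DATEADD(MINUTE,
                    DATEDIFF(MINUTE, CAST(ev.StartTime AS date), ev.StartTime),
                    CAST('00:00' AS time(0))) AS StartTimeToMinute,
            ev.Writes
        FROM dbo.EventLog AS ev;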

    Overall, you can try DirectQuery with Power BI and see how it is essentially querying the database. While it will generate SQL queries that may be able to take advantage of the model, indexes and so forth, it's also not very smart. In most cases I've seen, there are still queries being kicked out that pull in certain fields entirely and then filter them later in Power BI. In my case, it's not too bad because I only use columnstores, but on row-based systems, not so much.

    Some of the best options here are, obviously, to cut out the fields you really don't need, and then to move towards a smaller model where you can optimize the table as much as possible for those full scans. The fewer bytes, the better, essentially. After that, consider rolling up the data or creating views that pre-filter the data for the Power BI reports. Similar to the above, the fewer records, the better too.
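
    For illustration, a minimal sketch of such a roll-up view over the hypothetical dbo.EventLog table used above; the report then sees one row per hour instead of one row per event:

        CREATE VIEW dbo.vw_EventSummaryByHour
        AS
        SELECT
            CAST(ev.StartTime AS date)   AS StartDate,
            DATEPART(HOUR, ev.StartTime) AS StartHour,
            COUNT_BIG(*)                 AS EventCount,
            SUM(ev.Writes)               AS TotalWrites
        FROM dbo.EventLog AS ev
        GROUP BY CAST(ev.StartTime AS date), DATEPART(HOUR, ev.StartTime);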

    For example, while Power BI may still query or import the entire view, the view itself is based on a query that takes advantage of the indexes on the table. This is commonly not thought of because, with SQL Server, a view is inlined: if you create a view defined as SELECT MyField FROM MyTable WHERE MyField = 1 on a table with an index on MyField, querying that view gets the same execution plan as running SELECT MyField FROM MyTable WHERE MyField = 1 directly, so the view seems to add nothing and is often skipped. But with Power BI, the filter may not ever be applied to the view, so you can bake the filter into the view for Power BI to help ensure success.
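
    A minimal sketch of that idea, reusing the MyTable/MyField names from above (the view name and the extra column are made up): the filter lives inside the view, so even if Power BI never pushes a filter of its own, the index on MyField still limits what gets read.

        CREATE VIEW dbo.vw_MyTableForPowerBI
        AS
        SELECT t.MyField, t.SomeOtherField
        FROM dbo.MyTable AS t
        WHERE t.MyField = 1;   -- pre-baked filter; can use the index on MyField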

    In some of the examples I work with, I mostly do an import once a day from a billion-record table through those views, but within a columnstore data warehouse. While the import is not exactly blazing fast, once the data is loaded from those views, which summarize it, it's extremely fast in Power BI, as it's pre-loaded and ready to rock-n-roll.
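
    For the columnstore side, a minimal sketch under the same hypothetical names (and assuming the fact table is a heap, or that its existing clustered index is dropped or rebuilt first):

        -- Store the large fact table as a clustered columnstore index, which
        -- compresses it heavily and favours the scans and aggregations behind
        -- the summarizing views.
        CREATE CLUSTERED COLUMNSTORE INDEX CCI_EventLog
            ON dbo.EventLog;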