Problems displaying this newsletter? View online.
Database Weekly
The Complete Weekly Roundup of SQL Server News
Hand-picked content to sharpen your professional edge
Editorial
 

Representing uncertainty in difficult times

As a data person, I'm appalled at the errors made by many journalists in reporting the spread of the Covid-19 pandemic. I'm aghast at the way that they can publish fantastic nonsense that would have caused apoplexy to any peer reviewer in a scientific journal. However, what worries me the most is how uncertain data, initially provided with caveats, is eventually rendered to the public as certainty. It brings me to a more general point. How should we, as responsible data people, represent uncertainty in data?

Obviously, we need to track the course of any pandemic as accurately as possible. To do this we need whatever figures are available. Once a figure is recorded, however, it takes on an undeserved aura of truth. It could have been a rough estimate, or a best effort in the light of a blaze of contradictory evidence. Most sources of the data for the incidence of covid-19 are collated and published with caveats and warnings for the unwary. They are intended for a scientific audience that understands the constraints and uncertainties. When this data reaches the press, the temptation is to compare them in different regions or countries, and use the results to further almost any political cause; a temptation that is seldom resisted.

The data was usually published as a daily accumulation, from zero, to the total. When figures are uncertain, this makes sense because, if you find, in retrospect, that certain facts weren't known at the time, you merely bump the total. Easy: but if this happens, the intermediate figures are invalidated. Why wouldn't deaths, for example, be certain? The major problem is attributing the cause of death. It might seem obvious, but the great majority of these deaths occur with people having other severe underlying medical conditions, or who are in general frail health. This results in a huge variation in reporting: Belgium, for example, counts all coronavirus deaths outside hospitals in its daily statistics: deaths in care homes account for 53% of the total. Belgium's official toll also includes people suspected of having died of coronavirus, without a confirmed diagnosis. Other countries count only confirmed diagnoses, even though a post-mortem could be delayed by days. Deaths in care homes tend to be under-reported in Europe, and ignored in other countries, sometimes for political or economic reasons. What about the figures for 'confirmed cases'? A minefield according to the EU's European Centre for Disease Prevention and Control.

If reported deaths are so unreliable, why not just track the course of the pandemic via the excess mortality over and above the normal? The problem is accuracy. The normal variance in the death rate is too great to be useful. The scale of deaths due to the Covid-19 pandemic are within the normal variance. Certainty is elusive.

How should one represent uncertainty? We used to have a clever convention where if the data was "sketchy", the graph was in sketch form, indicating the uncertainty. Some graphing packages still allow this. We should, of course, report the variance of normally-distributed data, but data like this has a high level of bias and inherent noise. This needs to be represented, because decisions made based on bad data are likely to be bad decisions.

Phil Factor

Join the debate, and respond to the editorial on the forums

 
The Weekly News
All the headlines and interesting SQL Server information that we've collected over the past week, and sometimes even a few repeats if we think they fit.
Vendors/3rd Party Products

Six reasons to version control your database

For most application developers, it’s unthinkable to work without version control. So why is the database any different? This blog lays out 6 clear benefits to applying version control best practice to your database alongside your application.

SQL Change Automation enables a new flexible approach to DevOps

In the latest version of SQL Change Automation, we enable you to be more flexible in your approach to Database DevOps and combine elements of both state- and migrations- based approaches. Learn more about the new workflow.

Using SQL Clone and SQL Change Automation to switch database branches

Learn how to use SQL Clone together with Git hooks and SQL Change Automation to branch your database in Git as easily as your code. By enabling multiple copies of the same database to exist on a single instance, the integration of these tools allows seamless database branch switching.

Redgate's Community Circle offers free resources and more

Develop your skills, and get support and advice from peers and members of the Redgate Advocate team.

AI/Machine Learning/Cognitive Services

ZeroR: The Simplest Possible Classifier… or: Why High Accuracy can be Misleading

In this post, I create the simplest possible classifier, called ZeroR, to show that even this classifier can achieve surprisingly high values for accuracy (i.e. the ratio of correctly predicted instances)… and why this is not necessarily a good thing.

Administration

Checking for SQL Server Updates with dbatools

What if I have a sprawling SQL Server estate and other folks want a report on the current state of patching?

Stop trying to shrink tempdb files

The harm caused by attempting to shrink tempdb files.

Analysis Services / BI on the MS Stack

Implementing a Data Quality Monitoring Framework

This article from Imran Quadri Syed walks the through steps involved to implement a successful Data Quality Monitoring framework that would validate incoming data against predefined data quality rules and capture data quality metrics.

Azure DevOps

Azure DevOps Agent Job Names

How to give Agent jobs, such as to build a database, meaningful names in Azure DevOps.

Azure SQL Database

Troubleshooting Performance in Azure SQL Database

Azure SQL Database provides the Performance overview and Query Performance insight pages, but if you want to use tools you’re familiar with, like wait statistics, DMVs and so on, you can; they just need a few tweaks.

Updated Azure SQL Database Tier Options

Azure SQL Database is a feature rich database platform that offers a wide range of options for compute and scale. Tim Radney explains the different tiers.

Azure SQL Managed Instance

Accessing Managed Instance via SSMS

Use SSMS without using a VNET by enabling a public endpoint for your managed instance.

Conferences, Classes, Events, and Webinars

Extending SQL Monitor using Extended Events

Extended Events open up the whole world of special metrics and alerting available to you within SQL Monitor. Join Microsoft MVP Grant Fritchey to discover how to extend what’s possible.

Database DevOps: the benefits for your organization

Redgate's Chris Unwin and Chris Kerswell host Coeo’s CEO Justin Langford and Principal Consultant, Andy Jones to talk database DevOps and the benefits to your organization.

New free Data Platform Courses from MVPs

Redgate bring you new free weekly courses from the Community circle presented by Data Platform MVPs.

Data Visualisation

Alternatives to diagonal axis labels

In this short post, I’ll highlight two common scenarios that lead to diagonal x-axis labels—long category names on bar charts and long date labels on line graphs—and a couple ideas to try instead.

Asked and Answered: Visualizing Ranking Data

Figuring out how you want to analyze and report rank data can be tricky. Will you tally up which choices earned respondent’s #1 rank? Top 3? Will you weight the choices in some way?

Database Design, Theory and Development

Creating Concept Models

In concept modeling you will be defining your business vocabulary. It’s fundamental to your business knowledge blueprint.

ETL/SSIS/Azure Data Factory/Biml

Scheduling SSIS packages to run in Azure

Migrating SSIS Packages in Azure can be a complex task. SSMS 18 comes with new features for helping us to start configuring SSIS packages to run in Azure.

Azure Data Factory trigger tip

When you’re working in ADF, you can schedule the pipeline to run on a schedule and you have the option to create a new trigger or select an existing one, but they have meaningless names, so how do you know which one you want? Daniel Janik fixes this.

HA/DR/Always On/Clustering

Always On Availability Groups in Standard Edition

When first introduced the Availability Group feature was Enterprise only. However SQL Server 2016 introduced a basic version of it to Standard edition, and there have been a few enhancements since then.

Hardware

Storage 101: Modern Storage Technologies

Robert Sheldon discuses emerging trends in storage like virtual SANs, intelligent storage, computational storage and storage-class memory.

Performance Tuning SQL Server

Sometimes that Extra Index is the Right One

Depending on the size of the table, and the query you’re tuning, sometimes it’s reasonable to have a non-clustered index on a table with the same key as the table’s clustered index.

"0 to 60" : Switching to indirect checkpoints

Aaron Bertrand creates an extended events session to track checkpoint duration for databases that may benefit from changing to indirect checkpoints

How Overindexing Can Hurt Transaction Logging [video]

A 'hidden cost' of overindexing, which can impact transaction log throughput.

PowerPivot/PowerQuery/PowerBI

Calling The Power BI Export API From Power Automate, Part 2: Creating A Flow That Exports A Paginated Report To A CSV File

How to use a Power Automate custom connector in a flow.

Case Insensitive Merges In Power Query

Power Query is notorious for being case sensitive. Even its language is case sensitive. Often though you get data from users where they are using different cases for the same data.

COVID-19 Three Day Change Report

Paul Turley provides a few Power BI reports on the spread of Covid-19, showing all countries with the percent of change in non-recovered cases in the past three days.

Build a P&L With Power BI

Power BI formulas to create a "Profit and Loss" statements

Using SQL scalar value functions in PowerQuery

The magic of PowerQuery enables us to reuse scalar value functions within PowerQuery, and Query Folding is supported.

Applying color banding by document number in Power BI

This article describes how to use conditional formatting with a DAX expression to color the rows of a table in Power BI based on the order number instead of using the alternate rows color formatting option.

R Language

Essential list of useful R packages for data scientists

All the most needed packages for data science, statistical usage and every-day usage with R.

T-SQL

How to Get Better Estimates for Modification Queries

When you’re doing DUI operations against tables with millions of rows, you have to be really careful about SQL Server’s estimates.

Values blocks

Values blocks are a really useful little bit of code in SQL Server. Basically, they are a block of defined values that you can use pretty much like any other data set.

How does the CHOOSE command affect performance?

Grant Fritchey runs a few tests to find out...

Are Stored Procedure Parameters always cached?

It seems that SQL Server only caches Parameters when it does something non-trivial with them.

Right-Sizing Row Mode Query Memory Requirements

When the optimizer doesn’t estimate the correct amount of memory for a query, either memory is wasted that could be used for other processes or some operations will spill to disk. Microsoft has added Memory Grant Feedback to help overcome this issue. In this article, Greg Larsen explains what you need to know about this new feature.

If You Have Foreign Keys, Don’t Update Fields That Aren’t Changing.

If you update a row without actually changing its contents, does it still hurt?

Can You Use NVARCHAR As a Universal Parameter? Almost.

“What if we just used NVARCHAR(4000) as the default datatype parameter for any query, regardless of what datatype the table has in it – like numbers or dates?” It actually works pretty well in most situations, believe it or not.

Learner’s Guide to SQL Server Query Tuning

Kendra Little tackles Query Tuning, why we need to do it, who needs to do it, the tools we need and more.

Query Store capture note: All does not mean what you think it means

On using Query Store to track usage of queries over time.

Virtualization and Containers/Kubernetes

Running SQL Server on my Mac with Docker

When I shift working locations I work on my Macbook, and I still need access to a SQL Server instance. Docker is a great solution to allow me to run SQL Server containers on my Macbook without having to set up anything.

 
RSS FeedTwitter
This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact the webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com. Note: This is not the SQLServerCentral.com daily newsletter list, and unsubscribing to this newsletter will not stop you receiving the SQL Server Central daily newsletters. If you want to be removed from that list, you can follow the instructions on the daily newsletter.
©2019 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
webmaster@sqlservercentral.com

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -