Database Weekly
The Complete Weekly Roundup of SQL Server News by SQLServerCentral.com
Hand-picked content to sharpen your professional edge
Editorial
 

Machine Learning and Data Analysis Bias Challenges

One of the challenges of doing machine learning is getting lots of data. One of the other challenges of machine learning is having a lot of data.

This is a bit of a paradox. A data scientist or machine learning hacker needs a lot of data to build a model that predicts accurately. The challenge then lies in all the features (or columns in a data set) that might influence the decision. Which of these do you choose?

I saw a post about looking over data and analyzing loans for risk. The post looks at how you might use a notebook to perform some analysis and does a good job of walking through the process with Databricks. It then builds, trains, and tests a model in the classic machine learning sense. The data is a public dataset from Lending Club, with actual loan data.

The post doesn't show the complete data set, and I didn't download it. I hope there isn't any PII data in here, but I'm sure that many people who actually analyze loans would have data related to a person's address, gender, and other PII-type data. That makes sense for loan companies that need to create a financial contract with an individual, but it also allows humans to use their bias (intentionally or unintentionally) to affect decisions.

One of the promises of machine learning and AI is that this bias won't be present because the machines don't have feelings about individuals. Except, the data can reflect past feelings. If the data tends to show that people in a low-income neighborhood don't get loans, then the machine might pick this as a feature to look for in future data sets. The same goes for address or any number of other variables. Certainly, the person training the model may account for this and supervise the training, but I worry about this. Far too many people trust computers to do the right thing, but computers are only as good as the data they are given.
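
To make the concern concrete, here is a minimal Python sketch of how one might check whether past approvals already differ by neighborhood before training on them. The loan records are entirely made up, and the field names (`neighborhood`, `approved`) and the helper `approval_rates` are hypothetical. It is an illustration under those assumptions, not the method used in the post mentioned above.

```python
from collections import defaultdict

# Made-up historical decisions; the field names are hypothetical.
past_loans = [
    {"neighborhood": "low_income", "approved": False},
    {"neighborhood": "low_income", "approved": False},
    {"neighborhood": "low_income", "approved": True},
    {"neighborhood": "low_income", "approved": False},
    {"neighborhood": "high_income", "approved": True},
    {"neighborhood": "high_income", "approved": True},
    {"neighborhood": "high_income", "approved": False},
    {"neighborhood": "high_income", "approved": True},
]


def approval_rates(records, group_key):
    """Return the fraction of approved loans for each value of group_key."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        approvals[group] += int(row["approved"])
    return {group: approvals[group] / totals[group] for group in totals}


rates = approval_rates(past_loans, "neighborhood")

# Ratio of the lowest group rate to the highest. Values far below 1.0
# suggest the historical decisions treated the groups very differently.
# (Assumes at least one group has a non-zero approval rate.)
disparity = min(rates.values()) / max(rates.values())

for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} approved")
print(f"disparity ratio: {disparity:.2f}")
```

A model trained on records like these can learn the neighborhood as a predictor simply because the past decisions encoded it, so a check of this kind is worth running before the data goes anywhere near training.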

Garbage in, garbage out is a phrase many of us have used, and it applies here. Often our data sets reflect the frailty and mistakes of past human decisions and may not be the best choices for how to train machines that will assist us in the future. They are useful, but we ought to be aware of the bias that might exist in our data.

I do think machine learning and artificial intelligence are useful in our world. They improve our lives, and they can help reduce the common mistakes that humans make. They also can be flawed, and we ought to be careful about how much we rely on them. Certainly the early deployment of any type of model should be controlled, limited, and carefully monitored, whether the model is driving a car, approving loans, or identifying pictures. All your training and testing will likely be with limited data, so be wary when releasing these items into the wider world.

Steve Jones - SSC Editor

Join the debate, and respond to the editorial on the forums

 
The Weekly News
All the headlines and interesting SQL Server information that we've collected over the past week, and sometimes even a few repeats if we think they fit.
Vendors/3rd Party Products

DBAs at work #1: The IT services provider DBA

Our series of ‘DBAs at work’ blog posts feature conversations with IT leaders and experts about the challenges of managing and monitoring their server estates. Episode 1 features Dennis Heitmann, Database Administrator at Atruvia, which provides IT services for banking clients in Germany.

Tracking SQL Server Configuration Across All Your Servers Using SQL Monitor

The Estate Configuration reports SQL Server configuration across all your servers, so you can quickly investigate ad-hoc or unauthorized changes to any settings that might affect their performance, stability, or security.

A Flyway Teams Callback Script for Auditing SQL Migrations

Demonstrates a cross-database PowerShell callback script for reporting on and auditing Flyway migrations, telling you which scripts were used to create each version, when they were run, who ran them and more.

AI/Machine Learning/Cognitive Services

Deep Reinforcement Learning : Applying Visual Attention in Minecraft

From tsmatz

Attention is one of the most successful deep learning network architectures in state-of-the-art NLP work in AI, and attention can also be applied in visual recognition. In this post, I'll show...

What’s the Real Value of AIOps?

From Past News - RSS Feeds

In its simplest form, AIOps (Artificial Intelligent Operations) is a marriage of AI technology that’s been around for decades. AIOps arrived at its current incarnation from two different directions. ...

Machine Learning in Loan Risk Analysis

From BlueGranite Blog

While Finance may be a complex and multifaceted industry, the core goal of any financial institute is very straightforward: to detect and mitigate risks while maximizing profit. This objective...

Administration of SQL Server

No Running at the Resource Pool

Ok, I get it, scheduling queries can be complicated. See this and this and maybe this and this too but only if you have insomnia. I still thought I kinda understood it. Then I started seeing hundreds of query timeouts on a quiet server, where the queries were waiting on…what?

How the DAC saved the day

I got a call this morning from a coworker. One of the database instances was unreachable with the message that the TempDB log file was full. No processes could login and the only way to get things going again might be to restart either server or instance.

TSQL To Show All Merge Replication Articles

From Steve Stedman

At Stedman Solutions, we do a lot of work with SQL...

No Running at the Resource Pool

From Forrest Shares Stuff

Ok, I get it, scheduling queries can be complicate...

SQL Elevated Configuration: The fail-safe for maintenance

From Simple Talk

Many years ago, in the company I was working for, one junior DBA started a reindex operation in a SQL Server Standard Edition on the most busy day of...

Azure SQL Database

Copy Data tool to import data into Azure SQL Database from web sources

From SQLShack

This article will explore the Copy Data tool for importing data into Azure SQL Database from a web source. Introduction Suppose you need to import data into Azure database...

Getting started with Azure SQL

From SQLShack

This article will help you understand Azure SQL and its deployment options available in the Azure cloud. This article would be helpful to you if you are planning to...

Azure SQL Managed Instance

Azure SQL Managed Instance – Premium Tier

From SQLServerCentral Blogs

In case you are not aware, Microsoft have now deployed a new change to SQL Managed Instances within the tier types. In certain regions (shown later) you can...

Migrate SQL Server on Linux using the Azure SQL Migration extension for Azure Data Studio

From Azure SQL

This blog is authored by Kevin Barlett (Senior Customer Engineer, Customer Success Unit) and reviewed by Mohamed Kabiruddin (Senior Program Manager, Azure SQL). In this article, we'll detail the process...

Azure Synapse (SQL Data Warehouse and Data Lake)

How to query your Delta Lake with Azure Synapse SQL pool

Querying Delta Lake files using T-SQL in Azure Synapse Analytics is now generally available. In this blog post, you will learn how to use serverless SQL pool and T-SQL language to analyze your Delta Lake files from your Azure Synapse workspace.

Ingestion and Processing Layers in Azure Data Lakehouse

From MSSQL Tips

In this article learn about various options for in...

“Serverless” Lessons Learned

I’ve architected and am currently implementing a solution that uses Synapse (my last newsletter has the details, plus the architecture diagram). Synapse Serverless is the Microsoft answer to Amazon Athena but instead of using open-source tools like Presto, it’s built on SQL Server. In this project we extract many tables from 1,500 on-prem SQL Server databases and stage them in ADLS.

Database templates in Azure Synapse Analytics

Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources—at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs.

Career Growth and Certifications

There is a Ton of Free SQL Server Training Coming Up.

From Brent Ozar Unlimited

If you don’t get smarter soon, it’s your own damn fault. Oh sure, I’m running a Black Friday sale, but even if you’re as broke as SSMS’s live query...

Computing in the Cloud (Azure, Google, AWS)

Azure Charts

From 36 Chambers – The Legendary Journeys

Azure Charts is a site that a lot of people should know about but don’t. Alexey Polkovnikov came up with and developed the site, and I highly recommend checking...

Setting up Azure Network Gateway Logging

From DCAC

If you’ve ever set up an Azure Network Gateway for Site to Site or Person to Site VPNing you’ve probably wanted to be able to see logging from the...

Conferences, Classes, Events, and Webinars

Enjoy the PASS Data Community Summit online this year – and prepare for a real-life hybrid experience next year

From Blog – Redgate Software

The PASS Data Community Summit 2021 is with us online this week, November 8 – 12, with over 300 speakers delivering over 300 sessions to, literally, thousands and thousands...

My Top 4 sessions at PASS Data Community Summit 2021

From SQLServer-DBA.Com

Attending PASS Data Community SUMMIT 2021

Notes on SQLSaturday Orlando 2021

From SQLServerCentral Blogs

We held an in-person SQLSaturday here in Orlando last weekend (Oct 30th). We didn’t organize one last year, there was just too much risk and too much uncertainty, so...

Pass Summit 2021: A quick thank you to @Redgate.

From SQLStudies

I’d just like to say a quick thank you to Redgate for hosting the Pass Summit, all of the amazing ... Continue reading

Upcoming Events: DEVintersection

From 36 Chambers – The Legendary Journeys

Key Details. What: SQL Server & Azure SQL Conference at DEVintersection. Where: Las Vegas, NV. When: Sunday, December 5th through Friday, December 10th. Admission is not free. Registration is available from the conference...

Free Ransomware Awareness Class

From Steve Stedman

With so many recent contacts from possible clients who have been hit by ransomware, I have seen too many companies completely lose their entire SQL Server databases, their backups,...

What we learned at PASS Data Community Summit

Join Steve Jones, Kathi Kellenberger and Grant Fritchey as they each reveal their highlights, learnings and key takeaways from their 2021 Summit experience.

DMO/SMO/Powershell

Creating a PowerShell Clock

From The Lonely Administrator

I’ve published a new project to the PowerShell G...

#PowershellBasics: The WHERE clause

From SQLStudies

Sometimes when you are trying to figure something ...

Data Mining / Data Analysis

Effective sample size for Markov Chain Monte Carlo

From Statistical Odds & Ends

In this previous post, we introduced the notion of...

What do we mean by effective sample size?

From Statistical Odds & Ends

Let’s assume that we have some distribution we want to estimate some quantity related to it (e.g. the mean of the distribution). A typical estimation strategy is to draw...

How heavy-tailed is the t distribution?

From Statistical Odds & Ends

It’s well-known that the t distribution has heavie...

How heavy-tailed is the t distribution? (Part 2)

From Statistical Odds & Ends

In this previous post, we explored how heavy-tailed the t distribution is through the question: “What is the probability that the random variable is at least x standard deviations (SDs)...

Data Privacy, Compliance, and GDPR

T-SQL Tuesday #144 – Data Governance

From Deb the DBA

Happy T-SQL Tuesday! I’m really excited for this...

T-SQL Tuesday #144–Data Governance

From SQLServerCentral Blogs

This month’s topic is something I wouldn’t hav...

Deriving Business Value Through Data Governance

From Dataversity

Growing companies often find themselves floating on an “ocean” of underutilized or misused data – data that doesn’t reach the people who would most benefit from it or reaches...

Database Design, Theory and Development

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2

From Database Debunkings

 with David McGoveran  (Title inspired by Richard Feynman) In Part 1 we explained that all database relations are, mathematically, relations, but not all relations are database relations, which are in both...

Views vs. Indexed Views

From Callihan Data

Let’s consider the benefits and drawbacks of SQL views. Querying multiple tables can be easier with views. It’s shorter to write a query against a single view (SELECT *...

DevOps and Continuous Delivery (CI/CD)

Reasons to learn GitHub if using Azure DevOps

From Kevin Chant

Reading Time: 3 minutes In this post I want to cover reasons to learn GitHub if using Azure DevOps. Because I think it is an important topic to...

ETL/SSIS/Azure Data Factory/Biml

How to get the SQL View definition

From SQLServer-DBA.Com

An ETL process requiring SQL Server View definitio...

Transfer Stored Procedures between master databases on SQL Server instances using SSDT 2017

From SQLShack

This is the second article in the series of Migrat...

HA/DR/Always On/Clustering

2021 Ransomware Recovery Tales for IT Professionals

From IT Pro - Microsoft Windows Information, Solutions, Tools

Date: Tuesday, December 07, 2021 Time: 12:00 PM Ea...

Simplify Azure SQL Virtual Machines HA and DR configuration by adopting multi subnet approach

From Azure SQL

SQL Server on Azure Virtual Machines is the best option to migrate your SQL Server workloads maintaining complete SQL Server compatibility and operating system level access. It is ideal...

Hardware

AMD EPYC Milan-X Server Processors

From Glenn Berry

Introduction On November 8th, 2021, AMD held an AM...

MDX/DAX

ConcatenateX in Power BI and DAX: Concatenate Values of a Column

From RADACAD

It happens often in Power BI calculations and reports that you need to concatenate a list of values from a column. You can do this concatenation in Power Query...

CONTAINSSTRING, CONTAINSSTRINGEXACT – DAX Guide

From SQLBI

CONTAINSSTRING: Returns TRUE if one text string contains another text string. CONTAINSSTRING is not case-sensitive, but it is accent-sensitive. https://dax.guide/containsstring/ CONTAINSSTRINGEXACT: Returns TRUE if one text string contains another...

AND, OR, NOT, TRUE, FALSE – DAX Guide

From SQLBI

AND: Checks whether all arguments are TRUE, and returns TRUE if all arguments are TRUE. https://dax.guide/and/ OR: Returns TRUE if any of the arguments are TRUE, and returns FALSE...

Performance Tuning SQL Server

Performance Issues With EXISTS Queries

From Erik Darling Data

Dos Puntos Look, I really like EXISTS and NOT EXI...

PowerPivot/PowerQuery/PowerBI

Power BI Adding Translations to Rename Columns – XMLA, TOM, C#

From Data on Wheels (Steve Hughes)

If you are new to using C# and the Tabular Object Model (TOM), please check out the previous blog post (https://dataonwheels.wordpress.com/2021/10/15/power-bi-meets-programmability-tom-xmla-and-c/) for both an introduction to the topic and...

I cannot connect to Azure Storage to back up my Power BI Premium Per User/Premium

From FourMoo

What I did learn when working through the blog post is that I ran into some errors when trying to re-connect or trying to connect to the Azure Storage...

How to enable the Single Value option in a Power BI slicer

From SQLBI

This video describes how to enable the Single Valu...

Review Performance Analyzer in this Power BI Report from Smart Power BI

From Guy in a Cube

You should be reviewing performance of your Power BI Reports. This report from Smart Power BI helps you to visualize the results of Performance Analyzer. Let's check it out!...

Is Power BI’s “Show Data Point As A Table” Feature A Security Hole?

From Chris Webb's BI Blog

Power BI's "Show data point as a table" feature is not a security hole - you need to use row-level security and object-level security to stop users from accessing...

Product Reviews and Articles

Book Reviews: Fighting Churn with Data

From 36 Chambers – The Legendary Journeys

Fighting Churn with Data by Carl Gold is an intere...

(Livestream Replay) NEW FEATURE for Power BI Side Tools - a DAX Template Generator for Your Reports - with Didier Terrien

From Havens Consulting

Store your favorite DAX expressions into repositor...

Product Upgrades and Releases

Unlock additional value in your Microsoft data with updates to Azure Synapse Link

Azure Synapse Link makes it simple for data teams to unlock greater value in their data by eliminating barriers between Microsoft data stores – including both operational data stored in Azure Cosmos DB and business application data stored in Microsoft Dataverse – and the limitless cloud analytics capabilities available in Azure Synapse.

New – EC2 Instances (G5) with NVIDIA A10G Tensor Core GPUs

From AWS News Blog

Two years ago I told you about the then-new G4 ins...

Early technical preview of JDBC Driver 9.5.0 for SQL Server released

From MS SQL Server Blog

We have released a new early technical preview of the JDBC Driver for SQL Server which contains a few additions and changes. Precompiled binaries are available on GitHub and also on Maven Central. Idle...

Azure SQL News Update: November 2021

From Azure SQL

Today and every Wednesday Data Exposed goes live at 9AM PT on LearnTV. Every 4 weeks, we’ll do a News Update. We’ll include product updates, videos, blogs, etc. as...

sp_WhoIsActive Version 12 Is Out!

From Erik Darling Data

Get’em Daddy You know, you love it, you often wondered if it would ever get a new version! Someone JUST asked me yesterday if this has been updated recently. Thank you...

Announcement: controlling access to Azure SQL at scale with policies in Purview – now in Private Pre

From Azure SQL

Announcement: Purview policy based access control for Azure SQL at scale – Private Preview. I am excited to present a new way to control access to resources by assigning policies in...

Microsoft Defender for Cloud Supports AWS

From Petri IT Knowledgebase

At this past virtual Ignite 2021 conference, Microsoft announced that their Microsoft Defender for Cloud service will provide native support for AWS. Multi-cloud protection has become a priority for...

Action BI Toolkit

From SQLBI

Action BI Toolkit provides tools to bring Power BI projects under source control, enabling professional development workflows and enhanced governance. Available for free and with open source code (MIT...

Azure SQL Migration extension - November 2021 updates

From Azure SQL

A little over two months ago we announced public preview for the Azure SQL Migration extension for Azure Data Studio to ease your database migrations from SQL Server to...

Python

Create Performance Charts in Python for Time Series Data within SQL Server

From MSSQL Tips

Learn how to create performance charts for time series data using SQL Server and Python.

R Language

Should I Move to a Database?

Long ago at a real-life meetup (remember those?), I received a t-shirt which said: “biggeR than R”. I think it was by Microsoft, who develop a special version of R with automatic parallel work. Anyways, I was thinking about bigness (is that a word? it is now!) of your data. Is your data becoming too big?

SQL Server News

SQL Server 2022 and Big Releases

From Curated SQL

Brent Ozar opines on an interesting topic: The que...

SQL Server 2022

From SQLServerCentral Blogs

Yes, SQL Server 2022 is coming and will have many advanced features. https://www.microsoft.com/en-us/sql-server/sql-server-2022 To learn what's new in SQL Server 2022, see: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/what-s-new-in-sql-server-2022/ba-p/2922227 Bob Ward would be...

SQL Server Security and Auditing

Ransomware: A world under threat

Ransomware has threatened many organizations over the past few years. In this article, Robert Sheldon explains the history of ransomware and what needs to be done to protect against it.

Security News and Issues

SolarWinds Vulnerability Exploited in First Stage of Clop Ransomware Attacks

From Dark Reading: Dark Reading News Analysis

The Russian cybercrime group known as TA505 is targeting SolarWinds Serv-U systems that haven't been patched for a remote code execution vulnerability fixed this summer.

ChaosDB: Researchers Share Technical Details of Azure Flaw

From Dark Reading: Dark Reading News Analysis

Wiz researchers who discovered a severe flaw in the Azure Cosmos DB database discussed the full extent of the vulnerability at Black Hat Europe.

Advice for Personal Digital Security

From Schneier on Security

ArsTechnica’s Sean Gallagher has a two–part article on “securing your digital life.” It’s pretty good.

US Defense Contractor Discloses Data Breach

From Dark Reading: Dark Reading News Analysis

Electronic Warfare Associates says an attackers in...

Kaspersky Finds DDoS Attacks in Q3 Grow by 24%, Become More Sophisticated

From Dark Reading: Dark Reading News Analysis

The total number of smart attacks (advanced DDoS attacks that are often targeted) increased by 31% when compared to the same period last year.

Banking Malware Threats Surging as Mobile Banking Increases – Nokia Threat Intelligence Report

From Dark Reading: Dark Reading News Analysis

The Nokia 2021 Threat Intelligence Report announced today shows that banking malware threats are sharply increasing as cyber criminals target the rising popularity of mobile banking on smartphones, with...

T-SQL

SQL TOP statement performance tips

From SQLShack

In this article, we will discuss the performance d...

Telling data to act like other data

From Sherpa of Data

When messing with data, sometimes you want to chan...

Fundamentals of Table Expressions, Part 13 – Inline Table-Valued Functions, Continued

From SQLPerformance

Itzik Ben-Gan concludes his series on table expressions in SQL Server, explaining more internals of inline table-valued functions. The post Fundamentals of Table Expressions, Part 13 – Inline Table-Valued Functions,...

Table Valued Parameters and Dapper in .NET Core

From Born SQL

A customer I’ve been working with for a while now has a monolithic ASP.NET MVC web application which we are porting to .NET Core 3.1 (and then almost immediately...

TRY_CAST And TRY_CONVERT Can Still Throw Errors

From Erik Darling Data

It Was Written I was a bit surprised by this, because I thought the whole point of these new functions was to avoid errors like this. SELECT oops...

Different Ways to Format Currency Output in SQL

From MSSQL Tips

In this article we look at different ways to format currency output in SQL Server and most notably the different options the FORMAT function provides.

Trusting STRING_SPLIT() order in Azure SQL Database

From SQLBlog.org

See the new overloaded STRING_SPLIT() function with its enable_ordinal parameter, now available in an Azure SQL Database near you.

Ordered String Splitting in SQL Server with OPENJSON

From MSSQL Tips

This article covers various ways to split strings in SQL Server along with using OPENJSON and also a performance comparison of different approaches.

Searching multi-lingual data in Azure SQL Databases

From SQLShack

In this article, we will learn how to enable multi-lingual search on data hosted in Azure SQL Databases. Introduction Azure SQL Database is one of the most popular relational...

"I want to do X to all the Ys in database Z" – Part 1

From SQLBlog.org

I show one way to run arbitrary SQL against objects in an arbitrary database - using nested dynamic SQL.

The Right Way To Check For NULLs In SQL Server Queries

From Erik Darling Data

101ers This is still one of the most common problems I see in queries. People are terrified of NULLs. People are afraid to merge on freeways in Los Angeles. What results is...

Ordered string splitting in SQL Server with OPENJSON

From SQLBlog.org

Every time I write a post about splitting strings, I promise myself it's the last one. I need to stop making that promise. In this tip, I show how...

Tools for Dev (SSMS, ADS, VS, etc.)

Parameterized SQL Notebooks in Azure Data Studio

From SQLShack

This article will explore Parameterized SQL notebooks in Azure Data Studio. Introduction SQL Notebook or the Jupyter notebook in the Azure Data Studio has excellent capabilities that include codes...

 
©2019 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.