Problems displaying this newsletter? View online.
SQL Server Central
Featured Contents
Question of the Day
Redgate SQL Monitor
The Voice of the DBA
 

The Scientific Method: a call to action

This editorial was originally published on 25 May 2015. It is being re-published as Steve is out of the office.

The scientific method is a key aspect of modern experimental and observational sciences. In academia, presenting a statement as fact without citation or proof would be enough to get a paper thrown out. Maybe it's time we, in IT, started holding ourselves to the same standards.

At its core, the Scientific Method of investigation is as follows:

  1. Define a hypothesis, a theory as to how something works or behaves.
  2. Design experiments to test whether or not the hypothesis is true
  3. Conduct the experiments, capture data and analyze.
  4. Either revise the hypothesis and repeat the process, or conclude on the basis of the data that the hypothesis is either true or false. Proving a hypothesis false, is an equally valid and important result as proving one true. A negative outcome in no way implies "failure" on the part of the hypothesizer.

So why, when we work in a field that is so conducive to experimentation, do so many people stop after step 1? I've lost count of the number articles, blog posts and forum posts I've seen that stated some hypothesis as if it was absolute truth, without doing any testing at all. Personally, I'd like to see a renewed focus on verifying what we say, either by testing it or by referencing someone who already did the testing. In other words, "cite it or prove it".

Sounds grand in theory, but what does it mean in practice? For example, let's say that I intend to write a blog post based on the hypothesis that "Joins on string columns perform the same as joins on integer columns."

The first step, post hypothesis, is to design the experiment. Our test requires two pairs of tables (parent and child), one pair with join columns using the integer data type, one with a string data type. It's important to fix as many variables as possible. That means the same server, the same database version and no other user load (i.e. an isolated Azure VM or a SQL instance on a desktop or laptop). It also means the data types chosen for each pair of tables should be the same size. Integers are 4 bytes, so the equivalent string would be char(4). Ideally, we'd put the same values in both columns. However, that would limit the test data set to 9999 rows, as '10000' doesn't fit into a char(4). To fix that, we'd change the integer column to a bigint and the string column to a char(8) and stick a few million rows into the child table and put the distinct list of the join column into the parent table.

Step 3 is to run the tests. One test is no good. Run each test multiple times to rule out any unexpected influences on the results. The easiest way to use "GO 25" and record the performance characteristics of each test, using Extended Events to capture duration and CPU (reads will be the same if we got the row sizes the same, and if we didn't that's a flaw in the experiment), and then average the results.

Finally, we have the data that will prove the hypothesis true or false. In this case, the results indicate that the hypothesis is false. Joins on string columns are slower than joins on integers and the blog post will show the tests and our conclusions.

Yes, the Scientific Method is a lot more work than making unsubstantiated claims and hitting publish, but it's also a lot more likely to produce a post that's useful, and less likely to attract a pile of comments refuting it.

So, who's in?

Gail Shaw (Guest Editor).

Gail Shaw

Join the debate, and respond to today's editorial on the forums

Redgate SQL Change Automation
 
  Featured Contents

SQL Server Integrates Hadoop and Spark out-of-the box: The Why?

Frank Banin from SQLServerCentral

Introduction Microsoft announced in September 2018 that SQL Server 2019, which is now in preview, will have a Big Data Cluster deployment option. This is a Big-Data-capable SQL Server with elastic scale and extended Artificial Intelligence (AI) capabilities, mostly as a result of deep integration of Hadoop and Spark out-of-the-box.  The new SQL Server Big […]

PowerShell and Secure Strings

Additional Articles from SimpleTalk

Greg Moore demonstrates how to work with the Get-Credential PowerShell cmdlet and secure strings when authenticating to an SFTP server.

Resolving Cross Database Dependencies in SQL Change Automation using Local Databases or Clones

Additional Articles from Redgate

Kathi Kellenberger explains how to avoid build problems, when working with set of inter-dependent databases, by restoring or provisioning copies of all required databases to the development, build, or test instance.

From the SQL Server Central Blogs - Setting Trigger Order

Kenneth.Fisher from SQLStudies

If you can help it you probably shouldn’t be using triggers. But if you have to use a trigger you ... Continue reading

From the SQL Server Central Blogs - How Do You Make DevOps Succceed?

Grant Fritchey from The Scary DBA

I love going to SQLSaturday events because I’m always asked questions that make me think. I was just at SQLSaturday Indianapolis (a great event, if you weren’t there, you...

 

  Question of the Day

Today's question (by Steve Jones - SSC Editor):

 

Configuring Stretch Database

I decide that I want to take one of the databases on my SQL Server 2017 instance and enable it for the stretch database option. What is the first configuration option that needs to be changed?

Think you know the answer? Click here, and find out if you are right.

 

 

  Yesterday's Question of the Day (by Steve Jones - SSC Editor)

Transient Synonyms

On SQL Server 2017, I run this code in a batch to create a temp table:

CREATE TABLE #songs
(   songid      INT,
    title       VARCHAR(200),
    releasedate DATE,
    genre       VARCHAR(100));

I then do this:

CREATE SYNONYM SongStaging FOR #songs

Now I drop the temp table.

DROP TABLE #songs

My ETL process will recreate the temp table using the code above when it runs. This stored procedure is then called:

CREATE PROCEDURE dbo.LoadSongs
AS
   INSERT Songs
   (
       songid,
       title,
       releasedate,
       genre
   )
   SELECT songid,
       title,
       releasedate,
       genre
	   FROM dbo.StagingSongs
   WHERE title NOT IN (SELECT title FROM Songs)
GO

I then run the ETL process and the stored proc, what happens during this flow?

Answer: New songs are loaded into the songs table.

Explanation: This works just fine. No errors. The synonym resoles to the name, so if you drop and rebuild the temp table with the same name, it works. Ref: Synonyms - https://docs.microsoft.com/en-us/sql/relational-databases/synonyms/synonyms-database-engine?view=sql-server-2017

Discuss this question and answer on the forums

 

 

 

Database Pros Who Need Your Help

Here's a few of the new posts today on the forums. To see more, visit the forums.


SQL Server 2017 - Administration
Azure Instance and Transactional Replication - I have an Azure Instance in and i want to make a subscription on this instance. The only problem I have come across is the transaction log continuing to become full even though I have it set to unlimited. One of the drawback with Azure is you cannot change the recovery model to Simple, so […]
business intelligence development studio(Bids) - Hello, I have a SQL 2017 and I want to use business intelligence development studio(Bids), I dont see the App on the start menu. Integration service are exist. Tanks!
SQL Server Upgrade - I have old physical SQL Server which is less than 2 yrs old, so we are planning to upgrade to SQL 2017/2019. So thinking of using the same old physical SQL Server for the upgrade. Would need to load the new windows version and SQL Server. Do you see any reason that it should not […]
SQL Server 2017 - Development
SSAS - Aggregation issue: “N/A” members be excluded from Excel “Grand Total " - We’ve build a Finance OLAP Cube in SSAS which is mostly used/browsed by Accountants. Due to a “self-service” policy that is followed, users are in the meantime required to make use of Excel (with a data connection to the cube) as their reporting and/or analysis tool. One of the dimensions, the Accounts dimension, contains the […]
SSIS - Adding Columns to existing table and load data - I am working in SSIS (Sql server version 2012) , I have a requirement to add two new columns to a destination table which already has five years of data. How can i add these two new columns and bring data to the destination table which already has 5 years of data. Thanks
Alert me Export table data to .MSG file using Stored Procedures or any plugin ? - Is it possible with a stored procedure to write/export/output the data as .msg outlook file using existing table data?
SQL SEREVR QUERY - I HAVE A TABLE called TBLGENDER LIKE BELOW IN THIS TABLE 2 people getting 8000,2 people getting 5500,2 people getting 3500 AND 4 people getting 5000 ID NAME GENDER SALARY CITY 1 SHIVA MALE 8000 LONDON 2 MAMATHA FEMALE 6000 SYDNEY 3 NIHAL MALE 5000 DELHI 4 ROHAN MALE 3500 DELHI 5 PANDA FEMALE 3500 […]
SQL Server 2016 - Development and T-SQL
Using log4net in a dll which is called via CLR function - Hello I am refactoring older code which is part of an SQL Server CLR Function call. The code has a horrible local logger implemented, I would like to use a standard logger such as log4net. What are the pros and cons of such an approach? Is it feasible? Has anyone done something similar before? Best […]
Issue with max() and aggregate, missunderstanding - Hi all, this is my first post here and I've been looking and trying to understand this for hours the various ways people have tried to get around this issue. My trouble is as follows. I have a database with three tables linked together by 'DocumentNo'. Two main linked tables which output all rows with […]
Help with Query Results and Collation - I should know the answer to these questions, but I'm having trouble with a couple of my queries. First, my goal here is to compare data from a table in a source database with data from a table in another database to see whether there are any differences. Here's a simple example: DECLARE @TableA TABLE […]
Administration - SQL Server 2014
SQL 2014 (or any other versions) Standard - ~Embedded Licensing~ - Anyone have any insights on how this licensing works and what the "Use Rights" are? I can't seem to find much about it. Most I gathered (and I could be wrong) is it is dedicated solely to the application that uses it. No other databases can be on it for any other reasons and no […]
SQL 2012 - General
MEMORY CONSUMPTION SQL 2012 - hi all, as you can see in attachment sql take 122  giga memory. and in report consmtion i see only  94 taken  by sql . ( this report generated from sql) and idea where 30 giga  is hidden :)? and how  i can seem them ? thanks  alot sharon  
SSIS adding new columns to existing table - Hello All, I am working in SSIS (Sql server version 2012) , I have a requirement to add two new columns to a destination table which already has five years of data. How can i add these two new columns and bring data to the destination table which already has 5 years of data. Thanks
How can I convert %5B and %5D to [ and ] in a string - Hi, I have a string nvarchar(255) and I am try to convert the ASCII codes in it and also extract part of the string. Here is an example of a value: urn:ADL:CC_LB_SF_EN_D_C_%5BLB_LS_TCD%5D_%5BLB_LS_CotCF%5D_28_1 This is what I would like to convert it to: urn:ADL:CC_LB_SF_EN_D_C_[LB_LS_TCD]_[LB_LS_CotCF]_28_1 Then I would like to select what is between urn:ADL: and the […]
SQLServerCentral.com Website Issues
A while ago I stopped receiving my daily and weekly newsletters. - A while ago I stopped receiving my daily and weekly newsletters.  I am not sure why.  Today as a test I unchecked the selections, saved the unchecked values and rechecked the selections.  I've done that before and it didn't help.  Can you please let me know why I am no longer receiving these newsletters. It's […]
 

 

RSS FeedTwitter

This email has been sent to {email}. To be removed from this list, please click here. If you have any problems leaving the list, please contact the webmaster@sqlservercentral.com. This newsletter was sent to you because you signed up at SQLServerCentral.com.
©2019 Redgate Software Ltd, Newnham House, Cambridge Business Park, Cambridge, CB4 0WZ, United Kingdom. All rights reserved.
webmaster@sqlservercentral.com

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -