SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

Watch Your DataTypes in Aggregates–#SQLNewBlogger

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

I’ve got a database of NBA statistics with data like this for players. I downloaded a CSV and loaded it into SQL Server.

2017-03-22 10_32_00-SQLQuery1.sql - (local)_SQL2016.NBA (PLATO_Steve (102))_ - Microsoft SQL Server

I decided to play with the data a bit and at one point wanted to see who scored the most points for a team and year. So I ran this query:

SELECT
    year,
    team,
    MAX(pts)
FROM dbo.player_regular_season
WHERE
    year = ‘1972’
    AND team = ‘LAL’
GROUP BY
    year,
    team;

The result was 705. That’s a decent number of points, and if I weren’t careful, this might seem fine. 1972 was a long time ago, and they didn’t score as many points as they do today in games.

In fact, if I were putting this in a summary report with lots of data, it might be the case that someone glancing at this would make a poor decision based on the data.

Why?

Let’s look at the data.

2017-03-22 10_46_16-SQLQuery1.sql - (local)_SQL2016.NBA (PLATO_Steve (102))_ - Microsoft SQL Server

Even a quick glance would let me know this seems funny. There are values of 1575 and 1084 in there, but the MAX() I returned was 705. If I look deeper at the import, I can see why.

2017-03-22 10_47_25-SQLQuery1.sql - (local)_SQL2016.NBA (PLATO_Steve (102))_ - Microsoft SQL Server

Anything stand out there? If you look, pts is a varchar, not a numerical value. In the character world, 705 beats 1575. I really need this query:

2017-03-22 10_48_30-SQLQuery1.sql - (local)_SQL2016.NBA (PLATO_Steve (102))_ - Microsoft SQL Server

Always be aware of the datatypes you work with and manipulate. Knowing a little bit about the meaning and use of the data can help you spot anomalies like this. As much as I like random test data, I’d also be sure you have some real data cases when you have users check your work. It’s easy for them to miss problems like this without good reference cases.

Or use good test data that you’ve setup and unit tests.


Filed under: Blog Tagged: SQLNewBlogger, syndicated, T-SQL

The Voice of the DBA

Steve Jones is the editor of SQLServerCentral.com and visits a wide variety of data related topics in his daily editorial. Steve has spent years working as a DBA and general purpose Windows administrator, primarily working with SQL Server since it was ported from Sybase in 1990. You can follow Steve on Twitter at twitter.com/way0utwest

Comments

Leave a comment on the original post [voiceofthedba.com, opens in a new window]

Loading comments...