Technical Article

The Ultimate Prospective-Index Column(s) Selectivity Analyzer

,

*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
Util_GetSelectivity
By Jesse Roberge - YeshuaAgapao@Yahoo.com
Update - Fixed existance check for drop

Gets single-column and cumulative-column selectivity stats and @Top largest dupe sets for each cumulative column stepping for a set of one or more columns for a table.
WARNING: this proc will completely scan the @SchemaName.@TableName table three times for each column name in @ColumnName.
This requires fn_DelimitToArray. You can search the scripts for 'Delimited String Parsing Functions - Basic Set' to get the function.
fn_DelimitToArray requires a table of numbers called 'Counter'. That can be found by searching the scripts for 'Counter table (table of numbers) setter-upper for SQL Server 2005'
Update 2009-05-01: The version submitted was broken for some reason. This version also references the delimiter functions and counter tables in a centralized 'global' database.

Required Input Parameters:
@ColumnName sysname Comma separated list of columns to analyze
@TableName sysname Table that these columns belong to

Optional Input Parameters:
@SchemaName sysname='dbo' Schema that @TableName belongs to
@Top int=25 Max number of rows to return for each column stepping for the 2nd result set.

Usage:
EXECUTE Util.Util_GetSelectivity @SchemaName='config', @TableName='states', @ColumnName='StateName', @Top=25

Copyright:
Licensed under the L-GPL - a weak copyleft license - you are permitted to use this as a component of a proprietary database and call this from proprietary software.
Copyleft lets you do anything you want except plagarize, conceal the source, proprietarize modifications, or prohibit copying & re-distribution of this script/proc.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.

see <http://www.fsf.org/licensing/licenses/lgpl.html> for the license text.

*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

SET NOCOUNT ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO

IF OBJECT_ID('Util.Util_GetSelectivity', 'P') IS NOT NULL DROP PROCEDURE Util.Util_GetSelectivity
GO

/**
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
Util_GetSelectivity
By Jesse Roberge - YeshuaAgapao@Yahoo.com
Update - Fixed existance check for drop

Gets single-column and cumulative-column selectivity stats and @Top largest dupe sets for each cumulative column stepping for a set of one or more columns for a table.
WARNING: this proc will completely scan the @SchemaName.@TableName table three times for each column name in @ColumnName.
This requires fn_DelimitToArray.  You can search the scripts for 'Delimited String Parsing Functions - Basic Set' to get the function.
fn_DelimitToArray requires a table of numbers called 'Counter'.  That can be found by searching the scripts for 'Counter table (table of numbers) setter-upper for SQL Server 2005'
Update 2009-05-01: The version submitted was broken for some reason.  This version also references the delimiter functions and counter tables in a centralized 'global' database.

Required Input Parameters:
@ColumnName sysnameComma separated list of columns to analyze
@TableName sysnameTable that these columns belong to

Optional Input Parameters:
@SchemaName sysname='dbo'Schema that @TableName belongs to
@Top int=25Max number of rows to return for each column stepping for the 2nd result set.

Usage:
EXECUTE Util.Util_GetSelectivity @SchemaName='config', @TableName='states', @ColumnName='StateName', @Top=25

Copyright:
Licensed under the L-GPL - a weak copyleft license - you are permitted to use this as a component of a proprietary database and call this from proprietary software.
Copyleft lets you do anything you want except plagarize, conceal the source, proprietarize modifications, or prohibit copying & re-distribution of this script/proc.

This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Lesser General Public License as
    published by the Free Software Foundation, either version 3 of the
    License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Lesser General Public License for more details.

    see <http://www.fsf.org/licensing/licenses/lgpl.html> for the license text.

*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
**/
CREATE PROCEDURE Util.Util_GetSelectivity
@SchemaName sysname='dbo',
@TableName sysname,
@ColumnName sysname,
@Top int=25
AS

DECLARE @SQL VarChar(max), @SQL2 VarChar(max);
SELECT @SQL='', @SQL2='';

WITH
ItemCount AS
(
SELECT LEN(@ColumnName)-LEN(REPLACE(@ColumnName, ',', ''))+1 AS ItemCount
),
Delimit AS
(
SELECT Pos, LTRIM(Value) AS Value FROM Global.Global.fn_DelimitToArray(@ColumnName, ',') AS Delimit
),
ColumnNames AS
(
SELECT
(
SELECT
CASE WHEN Counter.PK_CountID>1 AND Counter2.PK_CountID=1 THEN '|' ELSE '' END +
Delimit.Value +
CASE WHEN Counter2.PK_CountID<Counter.PK_CountID THEN ',' ELSE '' END
FROM
ItemCount
JOIN Global.Global.Counter ON Counter.PK_CountID BETWEEN 1 AND ItemCount.ItemCount
JOIN Global.Global.Counter AS Counter2 ON Counter2.PK_CountID BETWEEN 1 AND Counter.PK_CountID
JOIN Delimit ON Counter2.PK_CountID=Delimit.Pos
ORDER BY Counter.PK_CountID, Counter2.PK_CountID
FOR XML PATH ('')
) AS String
),
GroupColumns AS
(
SELECT ItemCount.ItemCount, Delimit.Pos, Delimit.Value AS SingleColumn, ColumnNamesDelimit.Value AS AllColumns
FROM
ItemCount
CROSS JOIN ColumnNames
CROSS APPLY Global.Global.fn_DelimitToArray(Columnnames.String, '|') AS ColumnNamesDelimit
JOIN Delimit ON ColumnNamesDelimit.Pos=Delimit.Pos
)
SELECT
@SQL=@SQL + '
SELECT
' + CONVERT(VarChar(10), GroupColumns.Pos) + ' AS KeysIncluded, @Allcount AS AllRows,
''' + GroupColumns.SingleColumn + ''' AS SingleColumn,
' + CASE WHEN GroupColumns.Pos=1 THEN 'DistinctCountAll.DistinctCount AS DistinctCountOne, CONVERT(decimal(19,6), DistinctCountAll.DistinctCount)/CONVERT(decimal(19,6), @Allcount) AS SelectivityOne' ELSE 'DistinctCountOne.DistinctCount AS DistinctCountOne, CONVERT(decimal(19,6), DistinctCountOne.DistinctCount)/CONVERT(decimal(19,6), @Allcount) AS SelectivityOne' END + ',
''' + GroupColumns.AllColumns + ''' AS AllColumns,
DistinctCountAll.DistinctCount AS DistinctCountAll, CONVERT(decimal(19,6), DistinctCountAll.DistinctCount)/CONVERT(decimal(19,6), @Allcount) AS SelectivityAll
FROM
(SELECT COUNT(*) AS DistinctCount FROM (SELECT DISTINCT ' + GroupColumns.AllColumns + ' FROM ' + @SchemaName + '.' + @TableName + ') AS DistinctCount) AS DistinctCountAll
' + CASE WHEN GroupColumns.Pos=1 THEN '' ELSE 'FULL OUTER JOIN (SELECT COUNT(*) AS DistinctCount FROM (SELECT DISTINCT ' + GroupColumns.SingleColumn + ' FROM ' + @SchemaName + '.' + @TableName + ') AS DistinctCount) AS DistinctCountOne ON 1=1' END + '
UNION ALL',
@SQL2=@SQL2 + '
SELECT KeysIncluded, ' + GroupColumns.AllColumns + ', ' + REPLICATE('NULL AS AllAgg, ', GroupColumns.ItemCount-GroupColumns.Pos) + 'Dupe_Count
FROM
(
SELECT TOP ' + CONVERT(VarChar(10), @Top) + ' ' + CONVERT(VarChar(10), GroupColumns.Pos) + ' AS KeysIncluded, ' + GroupColumns.AllColumns + ', COUNT(*) AS Dupe_Count
FROM ' + @SchemaName + '.' + @TableName + '
GROUP BY ' + GroupColumns.AllColumns + '
HAVING COUNT(*)>1
ORDER BY Dupe_Count DESC
) AS UnionTable
UNION ALL'
FROM (SELECT * FROM GroupColumns ) AS GroupColumns
OPTION (MAXDOP 1, FAST 10, OPTIMIZE FOR (@ColumnName='whatever1, whatever2'))

SELECT @SQL='DECLARE @AllCount int
SELECT @Allcount=COUNT(*) FROM ' + @SchemaName + '.' + @TableName + '
' + LEFT(@SQL, LEN(@SQL)-9) + 'ORDER BY KeysIncluded
', @SQL2=LEFT(@SQL2, LEN(@SQL2)-9) + 'ORDER BY  KeysIncluded DESC, Dupe_Count DESC'

--PRINT @SQL;
--PRINT @SQL2;
EXECUTE (@SQL);
EXECUTE (@SQL2);
GO

--*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

Rate

2.33 (3)

You rated this post out of 5. Change rating

Share

Share

Rate

2.33 (3)

You rated this post out of 5. Change rating