Technical Article

The Ultimate Dupe Finder & Performance Test Parameter Set Researcher

*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
Util_FindHighest
By Jesse Roberge - YeshuaAgapao@Yahoo.com

Finds the largest dupe-sets of one or more columns of a single table or derived table.
Much simpler and with much lower overhead (1 scan total vs. 3 scans per column) than Util_GetSelectivity
if you don't need the detailed selectivity analysis for indexes, and
without the dependencies on delimiter functions and a table of numbers.
Great for finding hoggy parameter combinations for performance testing queries (or parts of queries such as correlated subqueries or derived tables) within stored procedures!

Required Input Parameters:
@TableName nVarChar(max), Name of the table to analyze. To schema-qualify, use 'schema.table'.
Can accept derived table subqueries. Feed something like '(SELECT ... FROM ... JOIN ... WHERE ...) AS Table'
@ColumnName nVarChar(4000), Comma-separated list of columns to analyze. 'column1' or 'column1,column2'

Optional Input Parameters:
@WhereClause nVarChar(4000)='', Analyze only a subset of the data in a table. Omit the 'WHERE'
@Top int=25, Select the @Top largest dupe sets

Usage:
EXECUTE Util.Util_FindHighest 'dbo.Products', 'FK_IndustryID, FK_CategoryID'

EXECUTE Util.Util_FindHighest
@TableName='
(SELECT Industry.IndustryName, Category.CategoryName, Product.AddDate
FROM dbo.Product JOIN dbo.Industry ON Product.FK_IndustryID=Industry.PK_IndustryID JOIN Category ON Product.FK_CategoryID=Category.PK_CategoryID
) AS ConfigBatches
',
@ColumnName='IndustryName, CategoryName',
@WhereClause='AddDate>GetDate()-30'

Copyright:
Licensed under the L-GPL - a weak copyleft license - you are permitted to use this as a component of a proprietary database and call this from proprietary software.
Copyleft lets you do anything you want except plagiarize, conceal the source, or prohibit copying & re-distribution of this script/proc.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.

see <http://www.fsf.org/licensing/licenses/lgpl.html> for the license text.

*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
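
To make the "1 scan total" claim concrete, here is a sketch of the statement the procedure in the script below builds for the first usage example, with @Top left at its default of 25. The table dbo.Products and its columns are only the illustrative names from the usage notes, not objects the script creates.

```sql
-- Statement assembled for:
--   EXECUTE Util.Util_FindHighest 'dbo.Products', 'FK_IndustryID, FK_CategoryID'
-- One GROUP BY pass over the table; the largest duplicate sets come first.
SELECT TOP 25 FK_IndustryID, FK_CategoryID, COUNT(*) AS DupeCount
FROM dbo.Products
GROUP BY FK_IndustryID, FK_CategoryID
ORDER BY DupeCount DESC, FK_IndustryID, FK_CategoryID
```

Each output row is one column-value combination plus the number of rows sharing it, so the top rows are the heaviest parameter combinations to feed into a performance test.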

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
SET ANSI_PADDING ON
GO

IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name='Util') EXECUTE ('CREATE SCHEMA Util')
GO

IF OBJECT_ID('Util.Util_FindHighest', 'P') IS NOT NULL DROP PROCEDURE Util.Util_FindHighest
GO

/*
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
Util_FindHighest
By Jesse Roberge - YeshuaAgapao@Yahoo.com

Finds the largest dupe-sets of one or more columns of a single table or derived table.
Much simpler and with much lower overhead (1 scan total vs. 3 scans per column) than Util_GetSelectivity
if you don't need the detailed selectivity analysis for indexes, and
without the dependencies on delimiter functions and a table of numbers.
Great for finding hoggy parameter combinations for performance testing queries (or parts of queries such as correlated subqueries or derived tables) within stored procedures!

Required Input Parameters:
@TableName nVarChar(max), Name of the table to analyze. To schema-qualify, use 'schema.table'.
Can accept derived table subqueries. Feed something like '(SELECT ... FROM ... JOIN ... WHERE ...) AS Table'
@ColumnName nVarChar(4000), Comma-separated list of columns to analyze. 'column1' or 'column1,column2'

Optional Input Parameters:
@WhereClause nVarChar(4000)='', Analyze only a subset of the data in a table. Omit the 'WHERE'
@Top int=25, Select the @Top largest dupe sets

Usage:
EXECUTE Util.Util_FindHighest 'dbo.Products', 'FK_IndustryID, FK_CategoryID'

EXECUTE Util.Util_FindHighest
@TableName='
(SELECT Industry.IndustryName, Category.CategoryName, Product.AddDate
FROM dbo.Product JOIN dbo.Industry ON Product.FK_IndustryID=Industry.PK_IndustryID JOIN Category ON Product.FK_CategoryID=Category.PK_CategoryID
) AS ConfigBatches
',
@ColumnName='IndustryName, CategoryName',
@WhereClause='AddDate>GetDate()-30'

Copyright:
Licensed under the L-GPL - a weak copyleft license - you are permitted to use this as a component of a proprietary database and call this from proprietary software.
Copyleft lets you do anything you want except plagiarize, conceal the source, or prohibit copying & re-distribution of this script/proc.

This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Lesser General Public License as
    published by the Free Software Foundation, either version 3 of the
    License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Lesser General Public License for more details.

    see <http://www.fsf.org/licensing/licenses/lgpl.html> for the license text.

*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
*/
CREATE PROCEDURE Util.Util_FindHighest
@TableName nVarChar(max),
@ColumnName nVarChar(4000),
@WhereClause nVarChar(4000)='',
@Top int=25
AS

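-- Build a single GROUP BY query: one pass over the source, largest groups first.
-- NOTE: all parameters are concatenated into dynamic SQL unescaped; pass trusted input only.
DECLARE @SQL nVarChar(max)

SET @SQL='SELECT TOP ' + CONVERT(VarChar(10), @Top) + ' ' + @ColumnName + ', COUNT(*) AS DupeCount
FROM ' + @TableName +
CASE WHEN @WhereClause='' THEN '' ELSE ' WHERE ' + @WhereClause END + '
GROUP BY ' + @ColumnName + '
ORDER BY DupeCount DESC, ' + @ColumnName

--PRINT @SQL -- Uncomment to inspect the generated statement
EXECUTE(@SQL)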
GO

--*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
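
Because @WhereClause is appended after @TableName, the filter in a derived-table call is applied outside the derived table and can reference only the columns that table exposes. Below is a sketch of the statement the procedure would build for such a call, whitespace aside. It uses the illustrative Product, Industry and Category tables from the usage notes and assumes the filter is passed as 'AddDate>GetDate()-30'.

```sql
-- The derived table is dropped in verbatim as the FROM source;
-- the WHERE clause sees only IndustryName, CategoryName and AddDate.
SELECT TOP 25 IndustryName, CategoryName, COUNT(*) AS DupeCount
FROM
(SELECT Industry.IndustryName, Category.CategoryName, Product.AddDate
FROM dbo.Product JOIN dbo.Industry ON Product.FK_IndustryID=Industry.PK_IndustryID JOIN Category ON Product.FK_CategoryID=Category.PK_CategoryID
) AS ConfigBatches
WHERE AddDate>GetDate()-30
GROUP BY IndustryName, CategoryName
ORDER BY DupeCount DESC, IndustryName, CategoryName
```

A filter qualified with an inner alias, such as Product.AddDate, would fail to bind at this level. Either expose the column from the derived table, as here, or move the filter inside the subquery.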
