Use Excel to DIFF Your Query Outputs

  • Comments posted to this topic are about the item Use Excel to DIFF Your Query Outputs

  • Nice article...

    Happy New Year to you all...

  • Hi... interesting idea, until now I was using Excel mainly for cross-checking query outputs by pivot tables, while using KDiff[/url] for diffing as it is specialized for the job.

    Happy NewYear from me, too 😉

  • Great tool.

    I'm already started using it and it's a charm.

    I made some changes for mine so that it could work in any workbook regardless of the names of the sheets. It will do the campare on the first 2 sheets in the active workbook.

    I've pasted my code under my personal xls modules so that my macro becomes global.

    Here is the modified code if anyone is interested:

    ' Clear all the indicators that previous comparisons may have set.

    Private Sub ClearMarkers()

    ' Clear Sheet1

    'ThisWorkbook.Sheets.Select

    ActiveWorkbook.Sheets(1).Activate

    ActiveWorkbook.ActiveSheet.Cells.Select

    Selection.Interior.ColorIndex = xlNone

    Selection.Font.ColorIndex = 0

    ActiveWorkbook.ActiveSheet.Cells(1, 1).Select ' undo selection of entire sheet

    ' Clear Sheet2

    ActiveWorkbook.Sheets(2).Activate

    ActiveWorkbook.ActiveSheet.Cells.Select

    Selection.Interior.ColorIndex = xlNone

    Selection.Font.ColorIndex = 0

    ActiveWorkbook.ActiveSheet.Cells(1, 1).Select ' undo selection of entire sheet

    End Sub

    ' Walk through Sheet1 and Sheet2, setting markers wherever differences

    ' in cell contents are found.

    Public Sub Diff_Sheet_1_and_2()

    On Error GoTo ErrHandle

    Call ClearMarkers

    ' Determine range of used cells, using the highest row and column counts

    ' found in either of the two sheets.

    Dim HighRow As Long

    HighRow = ActiveWorkbook.Sheets(1).UsedRange.Rows.Count

    If ActiveWorkbook.Sheets(2).UsedRange.Rows.Count > HighRow Then

    HighRow = ActiveWorkbook.Sheets(2).UsedRange.Rows.Count

    End If

    Dim HighCol As Long

    HighCol = ActiveWorkbook.Sheets(1).UsedRange.Columns.Count

    If ActiveWorkbook.Sheets(2).UsedRange.Columns.Count > HighCol Then

    HighCol = ActiveWorkbook.Sheets(2).UsedRange.Columns.Count

    End If

    ' Walk through the cells of both sheets, comparing and changing colors

    ' if differences are found.

    Dim RowIndex As Long

    Dim ColIndex As Long

    Dim RowFirst As Long

    Dim ColFirst As Long

    For RowIndex = 1 To HighRow

    For ColIndex = 1 To HighCol

    ' Compare formulas, not "text" or other formatting-affected attributes.

    If ActiveWorkbook.Sheets(1).Cells(RowIndex, ColIndex).Formula <> ActiveWorkbook.Sheets(2).Cells(RowIndex, ColIndex).Formula Then

    ' Determine how to indicate the difference on Sheet1.

    If ActiveWorkbook.Sheets(1).Cells(RowIndex, ColIndex).Text = "" Then

    ' If the cell contains no text, highlight the empty cell.

    ActiveWorkbook.Sheets(1).Select

    ActiveWorkbook.Sheets(1).Cells(RowIndex, ColIndex).Select

    Selection.Interior.ColorIndex = 38

    Else

    ' If the cell contains text, color the text.

    ActiveWorkbook.Sheets(1).Cells(RowIndex, ColIndex).Font.Color = &HFF

    End If

    ' Determine how to indicate the difference on Sheet2.

    If ActiveWorkbook.Sheets(2).Cells(RowIndex, ColIndex).Text = "" Then

    ' If the cell contains no text, highlight the empty cell.

    ActiveWorkbook.Sheets(2).Select

    ActiveWorkbook.Sheets(2).Cells(RowIndex, ColIndex).Select

    Selection.Interior.ColorIndex = 38

    Else

    ' If the cell contains text, color the text.

    ActiveWorkbook.Sheets(2).Cells(RowIndex, ColIndex).Font.Color = &HFF

    End If

    ' If this is the first row containing a difference, remember the cell

    ' in which the change occurred so we can return to it later.

    If RowFirst = 0 Then

    RowFirst = RowIndex

    ColFirst = ColIndex

    End If

    End If

    Next

    Next

    ' Either report no differences or focus on the first difference found.

    If RowFirst = 0 Then

    MsgBox "No differences!"

    Else

    If ActiveWorkbook.ActiveSheet.Index = 1 Then

    ActiveWorkbook.Sheets(1).Cells(RowFirst, ColFirst).Activate

    End If

    If ActiveWorkbook.ActiveSheet.Index = 2 Then

    ActiveWorkbook.Sheets(2).Cells(RowFirst, ColFirst).Activate

    End If

    End If

    Exit Sub

    ErrHandle:

    MsgBox Err.Description

    End Sub

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • My usual method for comparing data sets from SQL databases is just simply do an outer join between the two. For procs, that means inserting them into temp tables and then joining the temp tables.

    For example:

    create table #T1 (

    CountryName varchar(100) primary key,

    ISO2 char(2),

    ISO3 char(3),

    ContinentName varchar(100));

    insert into #T1 (CountryName, ISO2, ISO3, ContinentName)

    exec dbo.MyProc1;

    create table #T2 (

    CountryName varchar(100) primary key,

    ISO2 char(2),

    ISO3 char(3),

    ContinentName varchar(100));

    insert into #T2 (CountryName, ISO2, ISO3, ContinentName)

    exec dbo.MyProc2;

    select *

    from #T1 t1

    full outer join #T2 t2

    on t1.CountryName = t2.CountryName

    and t1.ISO2 = t2.ISO2

    and t1.ISO3 = t2.ISO3

    and t1.ContinentName = t2.ContinentName

    where t2.CountryName is null

    or t1.CountryName is null;

    That will give you a full set of discrepancies between the two record sets.

    If you want to change the comparison, change the join. For example, you might want to just know which countries have exact matches, but where the continent is different. So change the join to:

    on t1.CountryName = t2.CountryName

    and t1.ContinentName != t2.ContinentName

    (And get rid of the where clause.)

    You'll immediately have those differences. And so on, whatever comparisons you want to do.

    Make a change to one of the procs, truncate the table its results are in, rerun the insert statement, and rerun your comparisons, till you get matching result sets.

    If the result sets are large enough and the comparisons are complex enough, you can even add indexes to the temp tables to make the queries faster.

    Also, if it matters, this doesn't run into the max rows issues that Excel has (though 2007 vastly increased the number of rows allowed).

    Not saying this method is better, just saying it's what I use.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Another tool to put in your belt is EXCEPT:SELECT * FROM table1

    EXCEPT

    SELECT * FROM table2

    will give you all the records in table1 that do not exactly match a record in table2. It doesn't display them side-by-side, but does tell you which record(s) are different. Reverse it to get the records in table2 that do not exactly match a record in table1.

    Another quick shortcut I have used is:SELECT COUNT(*) FROM table1

    SELECT COUNT(*) FROM table2

    SELECT COUNT(*)

    FROM

    (SELECT * FROM table1

    UNION

    SELECT * FROM table2) Combined

    If all three counts are the same, the tables are identical. Again, if there are differences it doesn't tell you where they are, but it is very easy to see if there are differences.

    Chad

  • Excel is a very versatile tool, as your article demonstrates.

    A couple of mods I'd like to see in your code are:

    1. Show the value from the other worksheet as a comment (saves skipping back and forth between worksheets to see what the difference that's been identified actually is). You can do this by adding these lines:

    Sheet1.Cells(RowIndex, ColIndex).AddComment "Value in worksheet " & Sheet2.Name & " is " & Sheet2.Cells(RowIndex, ColIndex).Formula

    Sheet2.Cells(RowIndex, ColIndex).AddComment "Value in worksheet " & Sheet1.Name & " is " & Sheet1.Cells(RowIndex, ColIndex).Formula

    insert the above code just after:

    ' Compare formulas, not "text" or other formatting-affected attributes.

    If Sheet1.Cells(RowIndex, ColIndex).Formula <> Sheet2.Cells(RowIndex, ColIndex).Formula Then

    NB: If you're running this multiple times on the same data you need to clear the existing comments first.

    2. If a row in one sheet doesn't exist in the other sheet, this has the effect of labelling every subsequent row as being different. It would be useful to have the option to look ahead at the next x number of rows in the dataset and use them to determine whether a row's missing, and if it is then to flag that, rather than considering every subsequent row as changed.

    Re the earlier post suggesting modifying the code to deal with different sheet names, the code in the article does already do this.

    Nick

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply