SQLServerCentral Article

Low Profile Architecture

,

The goal of this article is to explain how to get unique transaction data from a data set containing multiple transactions per client per month. Taking into account the popularity of BI and analytical tools, I want to show that SQL and Excel can be considered a platform for complex reporting and data warehousing.

I plan to use Excel as a reporting tool (Excel 2010 is pretty good for this task) and SQL to get data from the SQL server database. Also, I’ll  use Excel’s drill down capability to see original records behind the counts on a pivot-table. So, I’m building a virtual data warehouse.

My first goal is to count unique transactions per client per month.

Let’s start from my records:

Client ID1, Transaction Code1, Transaction Date1, Comment

Client ID2, Transaction Code2, Transaction Date2, Comment

Client ID2, Transaction Code2, Transaction Date3, Comment

Here, the combination of Client ID + Transaction Code + Transaction Date comprises a primary key.

If Transaction Date2 and Transaction Date3 from my example always belongs to a different months Excel pivot table will do correct counts by months as shown in a pivot-table below:

Month 1

Month 2

Month 3

Transaction Code1

Counts of Client ID

Counts of Client ID

Counts of Client ID

Transaction Code2

Counts of Client ID

Counts of Client ID

Counts of Client ID

The SQL data source will be simple as well:

Select [Client ID], [Transaction Code], [Transaction Date] from DB
where [Transaction Code] in (…)

In the case of [Transaction Date2] and [Transaction Date3] belong to the same month, the pivot-table will calculate Client ID2 for Transaction Code2 in the same month. Because my level of granularity is a month I need to eliminate all records per month with the same Transaction Code and Client ID. To unduplicate my set I used transaction with maximum Transaction Date in a month per Client ID. In my initial solution, grouping by month was done in the pivot-table. I want to have same simple solution I had before – a pivot table with the same number of fields and names on a data source.

My plan was to use a combination of group by and having clauses. Unfortunately group by uses aggregation functions for fields not included in a grouping. In my example, the Comment field would lose its original value if it was placed in an aggregation function like max().

I started finding maximum Transaction Dates per client without using group by and having clauses.

The next SQL statement serves this purpose:

Select a.[Client ID], a.[Transaction Code], a.[Transaction Date] from DB a
where a.[Transaction Code] in (..)
and not exists
(
   Select * from DB b
   where b.[Client Id] = a.[Client ID] and b.[Transaction Code] in (…)
   -- This line filter a maximum transaction date per client
   and b.[Transaction Date] > a.[Transaction Date]
)

This solution looks up to the task and will eliminate records (say with Transaction Date2) from my result set. My further analysis showed that I missed a point on monthly granularity. If I have the same transactions (same Transaction Code) per client every month my script will return only the latest one. To preserve transactions on a monthly basis I added a comparison on month and (because reporting time frame can cross a year) a year comparison.

Select a.[Client ID], a.[Transaction Code], a.[Transaction Date] from DB a
where a.[Transaction Code] in (..)
and not exists
(
   Select * from DB b
   where b.[Client Id] = a.[Client ID] and b.[Transaction Code] in (…) 
  
   -- This line filter a maximum transaction date
   and b.[Transaction Date] > a.[Transaction Date]
   -- Next two lines group maximum values by year months
   and Year(b.[Transaction Date] = Year(a.[Transaction Date])
   and Month([b.[Transaction Date] = Month(a.[Transaction Date]) 
)

As a benefit of this method you may have any number of fields in the select list, (not just Comment) to present transactions details without restriction connected to using the aggregate functions. Those fields may appear on a drill-down list in Excel to explain details behind dry quantity numbers.

This method may be used as a base for building a simple data warehouse combining power of SQL and Excel which I call “low profile architecture”.

I appreciate any questions and comments to this article.

Rate

1.86 (14)

You rated this post out of 5. Change rating

Share

Share

Rate

1.86 (14)

You rated this post out of 5. Change rating