SQLServerCentral Article

A Stock Price Correlation Matrix


Originally developed by the statistician Karl Pearson, the correlation coefficient, also known as Pearson’s r, is meant to identify the degree of linear dependence between two variables.  In securities analysis, it can be utilized to measure the similarity between two stocks’ price movements over time. The correlation coefficient has a range between -1.0 and 1.0, where a result of 1 exhibit perfect correlation between two variables, a result of -1 exhibits perfect negative correlation, and 0 reveals no correlation. Two similar stocks in the same industry experiencing similar growth will have a higher correlation than stocks in variant industries with different levels of growth.  For more on the correlation coefficient, check out this link.

By building a matrix, we can measure the correlation of a set of securities against all members of that set.

We’ll start by creating and populating a price table.  Please run the attached script (prices.sql). The table stores the 8 fictitious stocks shown below:

Ticker Description
CAR1 Automotive Company
DRG1 Pharmaceutical Company
GLD1 Gold Mining Company
OIL1 Energy Company
RTL1 Retail Consumer Goods Company
TCH1 Technology Company 1
TCH2 Technology Company 2
TCH3 Technology Company 3

Once the price table is built and populated, we can build a scalar function to calculate the correlation coefficient between two stocks(see below). This function accepts 4 variables: @TickerA and  @TickerB represent the two stocks for which we’re measuring correlation, while @DateStart and @DateEnd reflect the date range over which we’re making the measurement. Two assumptions made when using this function is that there are no null prices for either security throughout the specified range, and the specified range is sufficiently large(30 days or greater). A smaller date range would require a slightly different calculation.

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[fnGetCorrelation]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
  DROP FUNCTION [dbo].[fnGetCorrelation]
CREATE FUNCTION [dbo].[fnGetCorrelation]
                        @TickerA NVARCHAR(10),
                        @TickerB NVARCHAR(10),
                        @DateStart DATETIME,
                        @DateEnd DATETIME
                        RETURNS NUMERIC(22,6)
DECLARE     @MeanA NUMERIC(22,6),
            @MeanB NUMERIC(22,6),
            @Corr NUMERIC(22,6)
--Get Mean Value for Ticker A
SELECT @MeanA = AVG(Price) FROM Prices
      WHERE ticker = @TickerA
      AND Date >= @DateStart
      AND Date <= ISNULL(@DateEnd, GETDATE())
      GROUP BY ticker
--Get Mean Value for Ticker B
SELECT @MeanB = AVG(Price) FROM Prices
      WHERE ticker = @TickerB
      AND Date >= @DateStart
      AND Date <= ISNULL(@DateEnd, GETDATE())
      GROUP BY ticker
--Calculate Correlation Coefficient
  @Corr =(AVG((prcA.Price-@MeanA)*(prcB.Price-@MeanB)))/(STDEVP(prcA.Price)*STDEVP(prcB.Price))--,
      FROM Prices prcA JOIN Prices prcB
      ON prcA.Date = prcB.Date
      WHERE prcA.ticker = @TickerA
      AND prcB.TICKER = @TickerB
      AND prcA.Date >= @dateStart
      AND prcA.Date <= @dateend

Let’s test out the function as follows:

select dbo.fnGetCorrelation('CAR1', 'TCH1','1/1/2011','3/31/2011') AS Corr



The above reveals that the price movements of stocks CAR1 and TCH1, with a correlation of roughly -.34, are very much dissimilar. 

Now that we’ve got a function capable of calculating correlation, let’s build the matrix. We’ll need to create a stored procedure that accepts a table-valued parameter(TVP) to load a set of tickers(check out this link for more info on TVPs) to build the matrix.  But before building the procedure, we’ll first need to create a table type and subseqent TVP:

IF EXISTS (SELECT * FROM sys.types st JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'TickerListType' AND ss.name = N'dbo') 
  DROP TYPE [dbo].[TickerListType]
CREATE TYPE [dbo].[TickerListType] AS TABLE
  ( [ticker] [nvarchar](30) NOT NULL,
DECLARE @Tickers AS TickerListType
/* Add data to the table variable. */ INSERT INTO @Tickers (ticker) VALUES ('CAR1'), ('DRG1'), ('GLD1')
SELECT * FROM @Tickers

We’ll now build the stored procedure which will utilize a CROSS JOIN to fire off the fn_GetCorrelation function against all possible two-security combinations very easily. 

The procedure will then perform a dynamic pivot and a self CROSS JOIN to calculate the correlation for all possible security combinations and lay them out in a matrix(please note that I used Andras Belokosztolszki’s fantastic dynamic pivot method, which is explained here). The dynamic pivot is required in order to accommodate a variable number of input securities as columns in the output:

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[uspGetCorrMatrix]') AND type in (N'P', N'PC'))
  DROP PROCEDURE [dbo].[uspGetCorrMatrix]
CREATE PROC [dbo].[uspGetCorrMatrix] 
  @Tickers TickerListType READONLY, 
  @DateStart NVARCHAR(10), 
  @DateEnd NVARCHAR(10)
        @Columns NVARCHAR(MAX)
SELECT @Columns = COALESCE(@Columns + ',','') + QUOTENAME(TICKER)
    FROM @tickers) AS Columns
 ORDER BY Columns.ticker
--select @Columns
--select * FROM @Tickers
set @SQL = '
SELECT CorrMatrix,
' + @Columns + '
select CorrMatrix = a.TICKER, Ticker2 = b.TICKER,
Corr = dbo.fnGetCorrelation(a.TICKER, b.TICKER, ''' + @DateStart + ''', ''' + @DateEnd + ''')
from @Tick a CROSS JOIN @Tick b
AS SourceTable
FOR Ticker2 IN (' + @Columns +')
) AS PivotTable '
EXEC sp_executesql @SQL, N'@tick TickerListType readonly', @Tickers

Let’s now fire off the procedure by first populating a TVP with a set of tickers:

DECLARE @tick AS TickerListType
EXEC uspGetCorrMatrix @Tick, @DateStart = '01/01/2011', @DateEnd = '03/31/2011';

The output is as follows:

CorrMatrix CAR1 DRG1 GLD1 OIL1 RTL1
CAR1 1 -0.83086 0.491672 -0.77953 0.684447
DRG1 -0.83086 1 -0.52497 0.721084 -0.61818
GLD1 0.491672 -0.52497 1 -0.1153 0.335694
OIL1 -0.77953 0.721084 -0.1153 1 -0.37631
RTL1 0.684447 -0.61818 0.335694 -0.37631 1

Please use caution in utilizing this code against a large number of tickers over a long period as the query cost increases exponentially as we add additional tickers.



4.6 (15)

You rated this post out of 5. Change rating




4.6 (15)

You rated this post out of 5. Change rating