 # A Stock Price Correlation Matrix

,

Originally developed by the statistician Karl Pearson, the correlation coefficient, also known as Pearson’s r, is meant to identify the degree of linear dependence between two variables.  In securities analysis, it can be utilized to measure the similarity between two stocks’ price movements over time. The correlation coefficient has a range between -1.0 and 1.0, where a result of 1 exhibit perfect correlation between two variables, a result of -1 exhibits perfect negative correlation, and 0 reveals no correlation. Two similar stocks in the same industry experiencing similar growth will have a higher correlation than stocks in variant industries with different levels of growth.  For more on the correlation coefficient, check out this link.

By building a matrix, we can measure the correlation of a set of securities against all members of that set.

We’ll start by creating and populating a price table.  Please run the attached script (prices.sql). The table stores the 8 fictitious stocks shown below:

 Ticker Description CAR1 Automotive Company DRG1 Pharmaceutical Company GLD1 Gold Mining Company OIL1 Energy Company RTL1 Retail Consumer Goods Company TCH1 Technology Company 1 TCH2 Technology Company 2 TCH3 Technology Company 3

Once the price table is built and populated, we can build a scalar function to calculate the correlation coefficient between two stocks(see below). This function accepts 4 variables: @TickerA and  @TickerB represent the two stocks for which we’re measuring correlation, while @DateStart and @DateEnd reflect the date range over which we’re making the measurement. Two assumptions made when using this function is that there are no null prices for either security throughout the specified range, and the specified range is sufficiently large(30 days or greater). A smaller date range would require a slightly different calculation.

```IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[fnGetCorrelation]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[fnGetCorrelation]
GO
CREATE FUNCTION [dbo].[fnGetCorrelation]
(
@TickerA NVARCHAR(10),
@TickerB NVARCHAR(10),
@DateStart DATETIME,
@DateEnd DATETIME
)
RETURNS NUMERIC(22,6)
AS
BEGIN
DECLARE     @MeanA NUMERIC(22,6),
@MeanB NUMERIC(22,6),
@Corr NUMERIC(22,6)
--Get Mean Value for Ticker A
SELECT @MeanA = AVG(Price) FROM Prices
WHERE ticker = @TickerA
AND Date >= @DateStart
AND Date <= ISNULL(@DateEnd, GETDATE())
GROUP BY ticker
--Get Mean Value for Ticker B
SELECT @MeanB = AVG(Price) FROM Prices
WHERE ticker = @TickerB
AND Date >= @DateStart
AND Date <= ISNULL(@DateEnd, GETDATE())
GROUP BY ticker
--Calculate Correlation Coefficient
SELECT
@Corr =(AVG((prcA.Price-@MeanA)*(prcB.Price-@MeanB)))/(STDEVP(prcA.Price)*STDEVP(prcB.Price))--,
FROM Prices prcA JOIN Prices prcB
ON prcA.Date = prcB.Date
WHERE prcA.ticker = @TickerA
AND prcB.TICKER = @TickerB
AND prcA.Date >= @dateStart
AND prcA.Date <= @dateend
RETURN(@Corr)
END
GO```

Let’s test out the function as follows:

`select dbo.fnGetCorrelation('CAR1', 'TCH1','1/1/2011','3/31/2011') AS Corr`

Results:

 Corr -0.338718

The above reveals that the price movements of stocks CAR1 and TCH1, with a correlation of roughly -.34, are very much dissimilar.

Now that we’ve got a function capable of calculating correlation, let’s build the matrix. We’ll need to create a stored procedure that accepts a table-valued parameter(TVP) to load a set of tickers(check out this link for more info on TVPs) to build the matrix.  But before building the procedure, we’ll first need to create a table type and subseqent TVP:

```
IF EXISTS (SELECT * FROM sys.types st JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'TickerListType' AND ss.name = N'dbo')
DROP TYPE [dbo].[TickerListType]
GO
CREATE TYPE [dbo].[TickerListType] AS TABLE
( [ticker] [nvarchar](30) NOT NULL,
PRIMARY KEY CLUSTERED ( [ticker] ASC )WITH (IGNORE_DUP_KEY = OFF)
)
GO
DECLARE @Tickers AS TickerListType
/* Add data to the table variable. */ INSERT INTO @Tickers (ticker) VALUES ('CAR1'), ('DRG1'), ('GLD1')
SELECT * FROM @Tickers```

We’ll now build the stored procedure which will utilize a CROSS JOIN to fire off the fn_GetCorrelation function against all possible two-security combinations very easily.

The procedure will then perform a dynamic pivot and a self CROSS JOIN to calculate the correlation for all possible security combinations and lay them out in a matrix(please note that I used Andras Belokosztolszki’s fantastic dynamic pivot method, which is explained here). The dynamic pivot is required in order to accommodate a variable number of input securities as columns in the output:

```IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[uspGetCorrMatrix]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[uspGetCorrMatrix]
GO
CREATE PROC [dbo].[uspGetCorrMatrix]
@Tickers TickerListType READONLY,
@DateStart NVARCHAR(10),
@DateEnd NVARCHAR(10)
AS
DECLARE @Sql NVARCHAR(MAX),
@Columns NVARCHAR(MAX)
SELECT @Columns = COALESCE(@Columns + ',','') + QUOTENAME(TICKER)
FROM
(SELECT DISTINCT ticker
FROM @tickers) AS Columns
ORDER BY Columns.ticker
--select @Columns
--select * FROM @Tickers
set @SQL = '
SELECT CorrMatrix,
' + @Columns + '
FROM
(
select CorrMatrix = a.TICKER, Ticker2 = b.TICKER,
Corr = dbo.fnGetCorrelation(a.TICKER, b.TICKER, ''' + @DateStart + ''', ''' + @DateEnd + ''')
from @Tick a CROSS JOIN @Tick b
)
AS SourceTable
PIVOT
(
SUM(Corr)
FOR Ticker2 IN (' + @Columns +')
) AS PivotTable '
EXEC sp_executesql @SQL, N'@tick TickerListType readonly', @Tickers
GO
```

Let’s now fire off the procedure by first populating a TVP with a set of tickers:

```DECLARE @tick AS TickerListType
INSERT INTO @tick (TICKER)
VALUES
('CAR1'),
('DRG1'),
('GLD1'),
('OIL1'),
('RTL1')
EXEC uspGetCorrMatrix @Tick, @DateStart = '01/01/2011', @DateEnd = '03/31/2011';```

The output is as follows:

 CorrMatrix CAR1 DRG1 GLD1 OIL1 RTL1 CAR1 1 -0.83086 0.491672 -0.77953 0.684447 DRG1 -0.83086 1 -0.52497 0.721084 -0.61818 GLD1 0.491672 -0.52497 1 -0.1153 0.335694 OIL1 -0.77953 0.721084 -0.1153 1 -0.37631 RTL1 0.684447 -0.61818 0.335694 -0.37631 1

Please use caution in utilizing this code against a large number of tickers over a long period as the query cost increases exponentially as we add additional tickers.

4.6 (15)

4.6 (15)