Tyler Tackles a Challenge on Finding Correlated Securities
Today we feature one of our best challenge submissions by Tyler Beason. Tyler is a rock star in finance and he used his knowledge to break down his approach to finding correlated securities:
Recently, I completed a project that was a challenge posted on MindSumo. MindSumo is a site where companies can post challenges with rewards for students to compete to solve. The challenge was to find securities in the DJIA whose prices behaved similarly over a period of roughly 2 years (data set was provided). This problem of finding correlated securities was interesting, so I thought I’d share my solution here.
To be more specific about the challenge, we were asked to create 2 or more indices of securities from the Dow. The program had to output the lists, and there was sort of a subtle hint that there should be some visualization of data as a result as well. Top 5 solutions get $150 = I’m in.
Once I had the prices for each security, I calculated the daily return for each as well. We typically deal with returns in class, and many models use return rather than price, so this may have been strictly out of habit. I also calculated the trailing 30-day standard deviation of the prices. Each of these were calculated and placed on a separate worksheet. Ultimately, these things were a complete waste of time, and I wish I would have known in advance that they would be useless to me later on.
After creating all of those extra sheets, it was time to begin to tackle the real problem – finding correlated securities. Most people would have reached for the standard correlation matrix first, but not I. It was (almost) immediately apparent to me what a rabbit-hole the correlation matrix would be, and how painful it would be to work with it. If you insisted on a minimum correlation between every stock, no stocks would ever be selected. Thus, I thought of a new way to approach the problem. Perhaps rather than looking at the correlation of every individual stock pairing, what if I could look at only 30? But how would I even go about doing that? Well, this is where a bit of luck was involved.
Recall that the transitive property is “A = C and B = C implies A = B”. So, two different quantities being equal to a third different quantity means that everything is equal. In this case, I would just need some independent reference point to compare correlations with. I decided to use the average price of the stocks in the index as this reference point. For each day, I sum up the prices of every security in the index and divide by the total number of securities in the index. After that, I took the correlation of this series with each series of security prices. Unfortunately, correlation doesn’t have a transitive property in theory. However, as all of the correlations approach 1, the transitive property comes back into play because a correlation of 1 implies that they move together exactly. Luckily, stock prices move together quite frequently! There were quite a number of stocks that had correlations with the average security price that topped 0.9. In fact, over half of the stocks were over 0.9. There were ten stocks that had a correlation with the average security price of the index that was over 0.95, which I thought was truly astonishing. Since 0.95 is pretty close to one, I think it is safe to assume that these ten stocks will all be highly correlated with each other. With that discovery, the work for portfolio #1 was done! Regretfully, it took many hours to figure this out the first time.
My moment of joy was quickly overrun with new ideas for my next portfolio (I needed a minimum of two, remember?). At this point in the process, I was lost. Daily return and rolling 30-day variance proved to have very little correlation when compared to prices. Eventually, I had to face the facts that I could not escape the correlation matrix. I would have to tackle it head-first…or maybe I wouldn’t. A sort-of logical next step would be to look at the average correlations among the stocks. To do this, I created a full correlation matrix and each column. I subtracted 1 from each sum to exclude a stock’s correlation with itself. I then divided the sum by N-1 to find the average correlation that stock has with the other stocks in the index. This was not as foolproof as the first method, but it seemed to work well enough; over half of the securities had average correlations of 0.75 or higher. I took the ten stocks with the highest average correlations to be portfolio #2.
It was near this time that tragedy struck. Somehow, Microsoft screwed me over. I don’t know how, but they should know that I have a very particular set of skills, and when I find them…..Long story short, my file got corrupted and I lost hours of work. I have a decent memory, so I was able to recreate everything that had been lost, and I was just thankful that I didn’t lose everything. I will say that I had been saving frequently and that AutoRecover was absolutely zero help this time.
Once I had recovered from the setbacks, I set out to find a third approach to the problem. I stepped back and tried to simplify my thinking, and I’m glad I did. I began to ask myself, “If I was just handed a chart of security prices, how would I identify those that behaved similarly?” That question got my mind out of the world of data-crunching long enough to see the third part of the solution – draw a line! To be more specific, securities that had similar trendlines would be securities that in some way “moved together”. So, I use simple regression to find the line of best fit for each security, and I took the group of 10 that had the most similar slopes. Certainly not a slam dunk like the first method, but it does give some sort of answer to the question.
To polish it off, I created a summary worksheet and also made charts for each index of 10 securities. As a teaser, here is the chart for index 1. It is pretty easy to see that the stocks make significant movements together.
Chart of Index 1