A Walk Through the Forest with Joe Simonian | Portfolio for the Future

What Does Machine Learning Have to do with Fama and French?

By Keith Black, PhD, CFA, CAIA, FDP, Managing Director, CAIA Association

“I went to the woods because I wished to live deliberately, to front only the essential facts of life, and see if I could not learn what it had to teach, and not, when I came to die, discover that I had not lived. I did not wish to live what was not life, living is so dear; nor did I wish to practice resignation, unless it was quite necessary. I wanted to live deep and suck out all the marrow of life, to live so sturdily and Spartan-like as to put to rout all that was not life, to cut a broad swath and shave close, to drive life into a corner, and reduce it to its lowest terms...”—Henry David Thoreau

CAIA Association and FDP Institute sat down with Dr. Joe Simonian to discuss the paper that he wrote with co-authors Wu, Itano, and Narayanan in the first issue of the new Journal of Financial Data Science. This discussion of “A Machine Learning Approach to Risk Factors: A Case Study Using the Fama-French-Carhart Model” is the fourth in a series of discussions with the authors of papers that are required readings for the FDP exam.

Many quants have traditionally relied on linear regression to build factor models. Standard econometric models may be very good at explaining the past but are not always the best forecasters. Most people in finance are familiar with the Fama-French-Carhart model, which shows that stock risks and returns have historically been related not only to the stock market, but also to the size, value, and momentum characteristics of each stock. Such models assume that returns are normally distributed and that their relationship to the factors is linear. Of course, financial markets don’t work that way: relationships can be non-linear, and variables can interact in complicated ways.
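To make the linear starting point concrete, here is a minimal sketch of a four-factor regression in the Fama-French-Carhart style. The data are synthetic, and the function name `fit_four_factor_ols`, the factor volatilities, and the "true" betas are all assumptions made for illustration; nothing here comes from the paper's data set.

```python
import numpy as np

def fit_four_factor_ols(excess_returns, factors):
    """Regress excess returns on factor returns; return (alpha, betas)."""
    # Prepend a column of ones so the first coefficient is the intercept (alpha).
    X = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
    return coef[0], coef[1:]

# Synthetic monthly data (illustrative only, not real market data).
rng = np.random.default_rng(0)
n_months = 600
factor_names = ["market", "size", "value", "momentum"]
factors = rng.normal(0.0, 0.04, size=(n_months, 4))
true_betas = np.array([1.1, 0.3, -0.2, 0.15])  # assumed exposures
noise = rng.normal(0.0, 0.01, size=n_months)
excess_returns = 0.001 + factors @ true_betas + noise

alpha, betas = fit_four_factor_ols(excess_returns, factors)
for name, beta in zip(factor_names, betas):
    print(f"{name:>8} beta: {beta:+.3f}")
print(f"   alpha: {alpha:+.4f}")
```

Each beta is a single average sensitivity over the whole sample, which is exactly the limitation the rest of the discussion addresses.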

Machine learning allows you to analyze problems in nonparametric and less formal ways. Working with alternative investments and multi-asset-class frameworks often takes more than one analytical approach, such as when we want to look at the interaction between different investments and asset classes during times of extreme tail risk. Another advantage is the ability to work with quantitative and qualitative data simultaneously.

Rather than building a machine learning model from scratch and mining all available data to discover hundreds of potentially spurious factors, why not take the well-tested Fama-French-Carhart model as a starting point for our explorations into machine-learning technology? The market, size, value, and momentum exposures estimated for each stock can be used as inputs to a random forest model, which tests specifications across a number of decision trees. Rather than looking at the average beta for stocks across the four factors, random forest and quantile regression models can analyze the behavior of factors at various percentiles in the distribution, thus turning a linear model into a nonlinear model that can more accurately analyze tail behavior.
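The idea of looking at factor behavior at different percentiles, rather than at a single average beta, can be sketched with a small quantile regression. This is only an illustration under stated assumptions: it uses one synthetic factor instead of four, it covers quantile regression only (not the random forest), and the solver (iteratively reweighted least squares on the pinball loss) is a convenient choice for a short example, not the method used in the paper. The function name `quantile_regression` is hypothetical.

```python
import numpy as np

def quantile_regression(x, y, tau, n_iter=100, eps=1e-6):
    """Fit y ~ a + b*x at quantile tau via iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(n_iter):
        resid = y - X @ coef
        # Pinball loss is tau*|r| for r >= 0 and (1 - tau)*|r| for r < 0.
        # Writing it as w * r**2 gives a weighted least-squares problem.
        w = np.where(resid >= 0, tau, 1.0 - tau) / np.maximum(np.abs(resid), eps)
        sw = np.sqrt(w)
        coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return coef

# Synthetic data (illustrative only): noise grows with the factor value,
# so the factor's slope differs across quantiles of the return distribution.
rng = np.random.default_rng(42)
n = 2000
factor = rng.uniform(0.0, 2.0, size=n)
stock = 1.0 * factor + (0.2 + 0.5 * factor) * rng.normal(size=n)

slopes = {tau: quantile_regression(factor, stock, tau)[1] for tau in (0.05, 0.50, 0.95)}
for tau, slope in slopes.items():
    print(f"tau = {tau:.2f}: slope = {slope:+.2f}")
```

In this toy setup the median slope sits near the OLS beta, while the 5th and 95th percentile slopes diverge from it, which is the kind of tail behavior a single average beta hides.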

After attributing the risk of each stock across the four factors, the next step is to build an understandable trading strategy that works out-of-sample. An Association Rule Learning (ARL) machine-learning system takes the distribution of the four factors for each stock and combines it with other market data such as volatility. The ARL system establishes deductive rules, such as if-then statements, based on market data. While it was difficult to find a profitable system using recent data from traditional linear regressions, the combination of two machine learning methods significantly improved trading results out-of-sample.

Recent fund performance shows that quant managers who have made the investment to use data science and machine learning in significant ways have been outperforming quants who are still using the standard linear methods. Given the increasing complexity of markets and the growth in available information, the managers who succeed in the future will be the ones best equipped to use data science and machine learning tools to their advantage. We should question why traditional quants want to stick exclusively to their linear methods in an increasingly non-linear world.
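The if-then rules that an ARL system produces can be illustrated with a small brute-force rule miner. Everything in this sketch is an assumption made for illustration: the observations are invented, the labels such as `momentum_high` and `vol_low` are hypothetical discretizations of factor and volatility readings, and the exhaustive search stands in for whatever algorithm the paper's authors actually used.

```python
from itertools import combinations

# Each observation is a set of discretized conditions plus the outcome
# for that period. The data are made up for illustration.
observations = [
    {"momentum_high", "vol_low", "return_up"},
    {"momentum_high", "vol_low", "return_up"},
    {"momentum_high", "vol_high", "return_down"},
    {"momentum_low", "vol_low", "return_up"},
    {"momentum_low", "vol_high", "return_down"},
    {"momentum_high", "vol_low", "return_up"},
    {"momentum_low", "vol_high", "return_down"},
    {"momentum_low", "vol_low", "return_down"},
]
conditions = {"momentum_high", "momentum_low", "vol_high", "vol_low"}

def mine_rules(data, target, candidates, min_support=0.25,
               min_confidence=0.8, max_len=2):
    """Find rules 'IF antecedent THEN target' that meet both thresholds."""
    n = len(data)
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(sorted(candidates), k):
            ante = set(antecedent)
            n_ante = sum(ante <= obs for obs in data)
            if n_ante == 0:
                continue  # antecedent never occurs, so confidence is undefined
            n_both = sum(ante <= obs and target in obs for obs in data)
            support = n_both / n          # how often the full rule holds
            confidence = n_both / n_ante  # how often THEN follows IF
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return sorted(rules, key=lambda rule: -rule[2])

rules = mine_rules(observations, "return_up", conditions)
for antecedent, support, confidence in rules:
    print(f"IF {' AND '.join(antecedent)} THEN return_up "
          f"(support {support:.2f}, confidence {confidence:.2f})")
```

Support measures how often a rule applies and confidence how often it is right when it does; requiring both is what keeps the resulting strategy small and readable.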

Thanks, Joe! Thanks, Henry David Thoreau! There are things that the forest can teach us.

Watch this webinar and get more information on the FDP exams and a calendar of upcoming FDP webinars.

The paper “A Machine Learning Approach to Risk Factors” can be accessed through the readings packet available to FDP candidates or from the Journal of Financial Data Science.