Dmitry Borisenko, an independent scholar and former quantitative analyst at Krauspartner Investment Solutions, posted a densely argued article last year with the deceptively simple title, “Dissecting Momentum: We Need to Go Deeper.”
As the title suggests, Borisenko begins with the momentum strategy as a means of generating alpha. Variables based on past prices do have predictive power, an anomaly that arbitrage has never closed in the years since Jegadeesh and Titman documented it in 1993. That predictive power has become the subject of a vast scholarly literature, which Borisenko reviews.
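For readers unfamiliar with the mechanics, here is a minimal sketch of the standard Jegadeesh-Titman-style sort: rank stocks each month on their return over months t-12 through t-2 (skipping the most recent month), then go long the top decile and short the bottom decile. This is not from Borisenko’s paper, and the data layout (a wide table of monthly returns, one column per stock) is an assumption made for illustration.

```python
import pandas as pd

def momentum_signal(monthly_returns: pd.DataFrame) -> pd.DataFrame:
    """Classic 12-2 momentum: cumulative return over months t-12..t-2.

    `monthly_returns` is a wide DataFrame indexed by month-end date,
    one column per stock, values are simple monthly returns.
    """
    # Compound returns over an 11-month window, then shift by two months
    # so the window ends at t-2, skipping the short-term reversal region.
    cum = (1.0 + monthly_returns).rolling(11).apply(lambda x: x.prod() - 1.0, raw=True)
    return cum.shift(2)

def decile_portfolio(signal_row: pd.Series) -> pd.Series:
    """Equal-weight long the top decile, short the bottom decile for one month."""
    ranks = signal_row.rank(pct=True)
    longs, shorts = ranks >= 0.9, ranks <= 0.1
    weights = pd.Series(0.0, index=signal_row.index)
    weights[longs] = 1.0 / longs.sum()
    weights[shorts] = -1.0 / shorts.sum()
    return weights
```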
Standard momentum trading runs obvious risks, for the simple reason that market trends do reverse, often quickly and dramatically. Much of the recent literature on the subject discusses this point in connection with a form of machine learning, “neural networks.” Borisenko notes in his literature review that “returns on neural network portfolios reported in the literature exhibit pathological behavior eerily similar to that of the standard momentum, especially with value-weighting.” He cites for this point Marcial Messmer’s 2017 paper, “Deep learning and the cross-section of expected returns.”
Messmer, a quant at UBS Asset Management with a PhD from St. Gallen, used 68 firm characteristics (FCs) to predict the cross-section of stock returns in the United States. He found that most of the FCs played only a minor role in the prediction: short-term reversal and 12-month momentum were the main drivers.
“Deep learning” appears in both the Messmer and the Borisenko titles. How is deep learning different from garden-variety machine learning, or from garden-variety neural networks, for that matter? The adjective “deep” refers to the multiple layering of the networks: deep learning permits an unbounded number of layers, each of bounded size, and it is the bound on layer size that keeps implementation and application practical.
Further, although the use of “neural networks” began as a way of allowing biology to inform AI research, the layers in contemporary deep learning work are allowed to deviate a good deal from the early “connectionist” models. Accordingly, machine learning, neural networks, and deep learning form a Venn diagram of three concentric circles: machine learning is the largest, outermost set, neural networks sit inside it, and deep learning is the innermost of the three.
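Purely by way of illustration (this is not from either paper), “depth” amounts to the repeated composition of bounded-size layers, each an affine map followed by a non-linearity; the widths and layer count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def deep_forward(x, n_layers=8, width=32):
    """Pass input x through n_layers layers of fixed (bounded) width.

    Depth (n_layers) is in principle unbounded; each layer stays small.
    Weights are random here -- in practice they are learned from data.
    """
    h, in_dim = x, x.shape[-1]
    for _ in range(n_layers):
        W = rng.normal(scale=in_dim ** -0.5, size=(in_dim, width))
        b = np.zeros(width)
        h = relu(h @ W + b)
        in_dim = width
    return h

features = rng.normal(size=(4, 68))   # e.g. 68 firm characteristics
print(deep_forward(features).shape)   # (4, 32)
```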
A Deep Learning Model
Borisenko develops a deep learning model in some detail, and then contends that investment strategies built on its predictions can “actively exploit the non-linearities and interaction effects, generating high and statistically significant returns with a robust risk profile and their performance virtually uncorrelated with the established risk factors including momentum.”
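Borisenko’s exact architecture is not reproduced in the article, but a representative feed-forward network mapping firm characteristics to an expected-return score might look like the sketch below. The 68-feature input echoes Messmer’s setup; the width and depth are placeholders.

```python
import torch
from torch import nn

class CrossSectionNet(nn.Module):
    """Feed-forward network: firm characteristics -> expected-return score.

    The stacked non-linear layers are what allow a model of this kind to
    pick up the non-linearities and interaction effects Borisenko emphasizes.
    """
    def __init__(self, n_features: int = 68, width: int = 64, n_hidden: int = 4):
        super().__init__()
        layers, in_dim = [], n_features
        for _ in range(n_hidden):
            layers += [nn.Linear(in_dim, width), nn.ReLU()]
            in_dim = width
        layers.append(nn.Linear(in_dim, 1))  # one return score per stock
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

model = CrossSectionNet()
scores = model(torch.randn(500, 68))  # one month's cross-section of 500 stocks
```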
Borisenko observes, too, that his deep learning model is at some risk of overfitting the training data, so he employs dropout and early-stopping techniques. The dropout technique, pioneered by Srivastava et al. (2014), involves the random shutdown of a subset of units during gradient updates, which amounts to training “an ensemble of nested models and averaging their predictions.”
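Neither technique is exotic. A minimal, self-contained sketch of how the two typically appear in a training loop follows; the data is synthetic, and the layer sizes, dropout rate, and patience are placeholder values, not Borisenko’s.

```python
import torch
from torch import nn

# Synthetic stand-ins for firm characteristics and next-month returns.
X_train, y_train = torch.randn(2000, 68), torch.randn(2000)
X_val, y_val = torch.randn(500, 68), torch.randn(500)

model = nn.Sequential(
    nn.Linear(68, 64), nn.ReLU(),
    nn.Dropout(p=0.3),          # randomly zero 30% of units per update
    nn.Linear(64, 64), nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

patience, best_val, bad_epochs = 10, float("inf"), 0
for epoch in range(500):
    model.train()                       # dropout active during training
    optimizer.zero_grad()
    loss = loss_fn(model(X_train).squeeze(-1), y_train)
    loss.backward()
    optimizer.step()

    model.eval()                        # dropout disabled for evaluation
    with torch.no_grad():
        val_loss = loss_fn(model(X_val).squeeze(-1), y_val).item()

    if val_loss < best_val:             # early-stopping bookkeeping
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # no improvement for `patience` epochs
            break
```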
Horizons and Parameters
Along the way, Borisenko homes in on the time horizons that matter for the momentum strategy. “The most salient cross-sectional features predicting positive return,” he writes, “are the market model alpha over horizons from nine months to one year.” In this, his work corroborates that of Hannah Lea Huhn and Hendrik Scholz in their 2018 paper, “Alpha Momentum and Price Momentum.” Six-month and one-year price momentum also remain robust predictors.
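To make these features concrete, here is one illustrative way to compute a trailing market-model alpha and price momentum from monthly return series. This is not the authors’ code; the window lengths and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd

def market_model_alpha(stock: pd.Series, market: pd.Series, window: int = 12) -> pd.Series:
    """Rolling intercept (alpha) from regressing stock returns on market returns.

    The window can be anywhere in the nine-to-twelve-month range Borisenko cites.
    """
    alphas = pd.Series(np.nan, index=stock.index)
    for end in range(window, len(stock) + 1):
        r = stock.iloc[end - window:end]
        m = market.iloc[end - window:end]
        beta, alpha = np.polyfit(m.values, r.values, 1)  # slope, intercept
        alphas.iloc[end - 1] = alpha                     # monthly alpha
    return alphas

def price_momentum(stock: pd.Series, months: int = 12) -> pd.Series:
    """Cumulative return over the trailing `months` months."""
    return (1.0 + stock).rolling(months).apply(np.prod, raw=True) - 1.0

# Usage with synthetic monthly returns.
rng = np.random.default_rng(1)
idx = pd.period_range("2015-01", periods=60, freq="M")
stock = pd.Series(rng.normal(0.01, 0.05, 60), index=idx)
market = pd.Series(rng.normal(0.007, 0.04, 60), index=idx)
print(market_model_alpha(stock, market).tail())
print(price_momentum(stock, months=6).tail())
```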
In the final stage of his multi-faceted discussion, Borisenko makes a case for what he calls “automated hyperparameter optimization techniques” as a component of continued research on machine learning in finance.
A “hyperparameter” is a parameter whose value is set independently of (and so chronologically prior to) the learning process. Hyperparameter optimization, then, is the problem of determining what the hyperparameters shall be. “Aha!” an optimist on the subject of human indispensability might say, “the computers will always need us for that, anyway.” Well, maybe not. Algorithms can be devised that set the hyperparameters for other algorithms, which in turn do the trading; this appears to be the automated hyperparameter optimization for which Borisenko is calling.
The hyperparameters turn out to include the dropout probability, the batch size, the learning rate, the number of hidden layers (that is, the depth of the deep learning), the number of units per layer, and the patience of the early stopping.
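Borisenko’s specific procedure is not reproduced here, but the idea is easy to sketch: define a search space over exactly these hyperparameters and let an outer loop, rather than a human, propose candidate configurations. In the random-search sketch below, `train_and_validate` is a hypothetical stand-in for a function that fits the network with a given configuration and returns its validation loss; the ranges are illustrative.

```python
import random

SEARCH_SPACE = {
    "dropout":    lambda: random.uniform(0.0, 0.5),
    "batch_size": lambda: random.choice([64, 128, 256, 512]),
    "lr":         lambda: 10 ** random.uniform(-4, -2),      # log-uniform learning rate
    "n_hidden":   lambda: random.randint(1, 6),               # network depth
    "width":      lambda: random.choice([32, 64, 128, 256]),  # units per layer
    "patience":   lambda: random.randint(5, 20),              # early-stopping patience
}

def random_search(train_and_validate, n_trials: int = 50):
    """Sample configurations and keep the one with the lowest validation loss."""
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: sample() for name, sample in SEARCH_SPACE.items()}
        loss = train_and_validate(**cfg)   # fit the net, return validation loss
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

In practice, libraries such as Optuna or Hyperopt replace the random sampling with more sophisticated search strategies, but the division of labor is the same: an outer algorithm chooses the hyperparameters, and the inner one learns the weights.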