Bayesian Probability Theory and a Hierarchical Learning Portfolio | Portfolio for the Future

Two scholars working with Bayesian probability theory recently published a fascinating discussion of market timing and portfolio efficiency. They have proposed what they call a “hierarchical ensemble learning portfolio.” Yes, that sounds rather heavy on the jargon. We’ll break it down a bit in what follows.

The authors of the new study are Guanhao Feng, of the College of Business, City University of Hong Kong, and Jingyu He, of the Booth School of Business at the University of Chicago.

Bayesian probability inference is a technique for dealing with incoming data. One begins with a hypothesis (admittedly subjective, perhaps even arbitrary), called the “prior,” then updates that prior as each new datum is received. The updated belief is naturally called the posterior, and it in turn becomes the prior for the next round.
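To make the prior-to-posterior loop concrete, here is a minimal Python sketch. It assumes a normal prior on an asset's mean monthly return and normally distributed noise with a known variance, which is the simplest textbook case. The names NormalBelief and update, and all of the numbers, are illustrative assumptions and are not taken from Feng and He.

```python
from dataclasses import dataclass


@dataclass
class NormalBelief:
    """A belief about an unknown mean, summarized by a normal distribution."""
    mean: float
    variance: float


def update(prior: NormalBelief, datum: float, noise_variance: float) -> NormalBelief:
    """Conjugate normal-normal update for one observation (noise variance assumed known)."""
    # Precisions (inverse variances) add; the posterior mean is a
    # precision-weighted average of the prior mean and the new datum.
    precision = 1.0 / prior.variance + 1.0 / noise_variance
    mean = (prior.mean / prior.variance + datum / noise_variance) / precision
    return NormalBelief(mean=mean, variance=1.0 / precision)


# Start from a vague, admittedly arbitrary prior.
belief = NormalBelief(mean=0.0, variance=1.0)

# Hypothetical monthly returns arriving one at a time.
for monthly_return in [0.02, -0.01, 0.03]:
    # Each posterior becomes the prior for the next round.
    belief = update(belief, monthly_return, noise_variance=0.04)
```

After the three updates the belief is tighter than the starting prior, and it matches what a single batch update on all three data points would give.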

In portfolio analysis this amounts (as Feng and He put it) to “re-estimating and then rebalancing the portfolios on a rolling window basis.”

Ensemble Learning and Hierarchy

This is generally seen as distinct from an ensemble learning approach. In ensemble learning, each of several algorithms is treated as a distinct decision maker, and the ultimate decision (the portfolio) is the consequence of a vote of that virtual committee. Ensemble learning models aimed at prediction are generally said to outperform the models that serve as their separate components.
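A minimal sketch of such a virtual committee follows. It assumes each model proposes a vector of portfolio weights and that the "vote" is a plain equal-weighted average; real ensembles often weight members by past performance instead. The function name and the three proposals are hypothetical.

```python
def ensemble_weights(proposals):
    """Average the portfolio weights proposed by each committee member."""
    n_members = len(proposals)
    n_assets = len(proposals[0])
    # Every member must propose a weight for every asset.
    if any(len(p) != n_assets for p in proposals):
        raise ValueError("all proposals must cover the same assets")
    return [sum(p[i] for p in proposals) / n_members for i in range(n_assets)]


# Three hypothetical models, each proposing weights for two assets.
committee = [
    [0.6, 0.4],
    [0.2, 0.8],
    [0.4, 0.6],
]
portfolio = ensemble_weights(committee)
```

Because each proposal here sums to one, the averaged portfolio does as well.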

So, can (and should) the Bayesian and the ensemble approaches to portfolio construction be combined? These authors propose that they should be. The hierarchical nature of the modeling is the key to how and why they should. Predictive modeling becomes hierarchical when you use several different levels of observation.

By way of a simple example: if I wanted to characterize (or predict) the results/scores of students on a standardized test I could break the students in my population down into several layers of nesting: by state, by town, by school, by teacher.
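The test-score example can be sketched with a single level of nesting (students within schools). This is a simplified illustration of partial pooling that assumes the within-school and between-school variances are known; a full hierarchical model would estimate them too. The function name, the school labels, and the scores are made up.

```python
from statistics import mean


def partial_pool(group_scores, within_var, between_var):
    """Shrink each group's average toward the grand mean of group averages."""
    group_means = {name: mean(scores) for name, scores in group_scores.items()}
    grand_mean = mean(list(group_means.values()))
    pooled = {}
    for name, scores in group_scores.items():
        # More students means more precise data, hence less shrinkage.
        data_precision = len(scores) / within_var
        prior_precision = 1.0 / between_var
        pooled[name] = (
            data_precision * group_means[name] + prior_precision * grand_mean
        ) / (data_precision + prior_precision)
    return pooled


# Hypothetical schools: A has two test-takers, B has only one.
schools = {"A": [80.0, 90.0], "B": [60.0]}
estimates = partial_pool(schools, within_var=100.0, between_var=25.0)
```

Each school's estimate lands between its own raw average and the overall average, and the school with less data leans more heavily on the higher level of the hierarchy.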

A little bit closer to home: if I have a hypothesis about asset values in a portfolio, and that hypothesis is revised by subsequent data, that process of revision itself is subject to parameters. Those parameters can be subject to higher level hypotheses, and those hypotheses can be tested and revised by subsequent data, subject to higher level parameters, and so forth.

The point is this: if we think of ensemble learning as one approach to portfolio modeling, and Bayesian prediction as another, it may not be obvious that the two have much in common. But if we think of them both as hierarchical, they’re ready to be merged. And by now we have worked our way through the jargon in the phrase “a hierarchical ensemble learning portfolio.”

A Rolling 120-Month Window

Without trying to summarize the Feng/He approach in full, I will say that it is a portfolio optimization approach with a rolling 120-month window. It operates on several levels of industry classification at once, because both the finer classifications (fewer industries in each basket) and the coarser classifications (which lump those finer groups together hierarchically) have their value.
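The mechanics of a rolling window are easy to sketch. The snippet below is a generic illustration, not the Feng/He optimizer: it assumes the "estimate" is nothing more than the average return over the trailing window, and the function name is hypothetical. The default of 120 mirrors the window length mentioned above.

```python
from statistics import mean


def rolling_forecasts(returns, window=120):
    """Forecast each month using only the `window` months that precede it."""
    forecasts = []
    for t in range(window, len(returns)):
        history = returns[t - window:t]  # months t-window .. t-1 only
        # Re-estimate on the trailing window, then act in month t.
        forecasts.append((t, mean(history)))
    return forecasts


# Tiny illustration with a 3-month window instead of 120.
toy_returns = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_forecasts = rolling_forecasts(toy_returns, window=3)
```

The slice never includes month t itself, so each forecast is out of sample with respect to the month it is applied to.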

Here is a bit of these authors’ exposition: “[W]e focus on average signal for a predictor. We can simply perform a posterior inference ... for whether a macro predictor is useful in the conditional formulation. We can also infer which characteristics drive time-varying predictor coefficients. Finally, Bayesian shrinkage prior acts like a regularization penalty on the average signal and helps out-of-sample forecasting.”
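The remark that a shrinkage prior "acts like a regularization penalty" can be illustrated in the simplest possible setting. The sketch assumes a single predictor with no intercept, a zero-mean normal prior on its coefficient, and a known noise variance; under those assumptions the posterior mean coincides with a ridge-regression estimate. This is a textbook special case, not the prior Feng and He actually use, and the function name and data are hypothetical.

```python
def shrunk_slope(x, y, noise_var, prior_var):
    """Posterior mean of the slope in y = beta * x + noise, with beta ~ N(0, prior_var)."""
    # The prior contributes a ridge-style penalty of noise_var / prior_var.
    penalty = noise_var / prior_var
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + penalty)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # least squares without a prior would give a slope of 2

tight_prior = shrunk_slope(x, y, noise_var=14.0, prior_var=1.0)
loose_prior = shrunk_slope(x, y, noise_var=14.0, prior_var=1e9)
```

A tight prior pulls the estimate toward zero, while a very loose prior leaves it close to the ordinary least-squares answer.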

All of this may pique a reader’s interest in the bottom-line question: does their system work? Will it help asset managers earn alpha?

Final Thoughts

Feng and He are encouraged by their results, which “provide promise of the continuing progress of Bayesian methods in empirical asset pricing.” Long-term yield, inflation, and stock market variance are all helpful macro predictors. Further, dividend yield, accrual, and gross profit turn out to be useful asset characteristics.

Their model developed a long-short portfolio that achieved a 46% Sharpe ratio. Over the period 1998 to 2017 they achieved cumulative returns of 500%. Their portfolio “outperforms most workhorse benchmarks as well as the passive investing index.”
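For readers who want to see how the two headline statistics are usually computed, here is a small sketch. It assumes the common definitions: the Sharpe ratio as mean excess return divided by the sample standard deviation of excess returns, optionally annualized by the square root of the number of periods per year, and the cumulative return as compounded growth minus one. Whether the paper uses exactly these conventions is not stated here, and the return series are invented.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(excess_returns, periods_per_year=1):
    """Mean excess return over its sample standard deviation, optionally annualized."""
    return mean(excess_returns) / stdev(excess_returns) * sqrt(periods_per_year)


def cumulative_return(returns):
    """Compound a series of period returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical monthly excess returns (return minus the risk-free rate).
monthly_excess = [0.01, 0.03, 0.02, 0.02]
monthly_sharpe = sharpe_ratio(monthly_excess)
annualized_sharpe = sharpe_ratio(monthly_excess, periods_per_year=12)

# Two hypothetical periods of 10% each compound to 21%.
two_period_total = cumulative_return([0.10, 0.10])
```

Under this definition, a cumulative return of 500% means that each dollar invested at the start grew to six dollars by the end.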

These same two authors recently collaborated with Nicholas G. Polson, who like He is affiliated with the Booth School, on an article about “deep learning” for the prediction of asset returns. That work involved the use of long short-term memory (LSTM) networks to capture time series effects.