
Machine Learning, Quant Models, and ESG factors: Who Uses Them and What Data Do They Mine?

April 9, 2020

Keith Black, PhD, CFA, CAIA, FDP, Managing Director of Content Strategy, CAIA Association

CAIA Association and FDP Institute recently had a conversation with Mike Chen and George Mussalli. Dr. Chen is the director of equity investments and Mr. Mussalli is the managing director and CIO of equity investments at PanAgora. They recently published “An Integrated Approach to Quantitative ESG Investing” in the Journal of Portfolio Management and stopped by to share some thoughts with us.

Investments in funds employing ESG considerations at some point in their investment process are growing worldwide, but at a slower pace in the US. US SIF reports nearly $12 trillion in sustainable and responsible investing, mostly from ESG incorporation. ESG incorporation or integration is the application of ESG factors as part of a holistic investment process. While the first iteration of ESG investors focused on excluding stocks that didn’t match investor values, the second iteration of ESG moved beyond this negative screening approach. Integration includes ESG factors as part of an investor’s quantitative modeling process. Along with traditional factors such as size, growth, quality, value, and momentum, ESG factors are deployed in an attempt to improve alpha. While governance issues have long been included in quant models, newly available data allows investors to integrate environmental and social factors into the mix.

One of the more important findings from research on using ESG factors in quant models is that the materiality of factors differs by industry and even by company. Over 40 years, the balance sheets of companies in the S&P 500 have changed dramatically, from 17% intangible assets to 84% intangible assets. Environmental factors are most important for companies with large hard asset infrastructure, such as airlines, utilities, energy and mining companies, and manufacturers. Tech companies will have a greater focus on social factors such as data privacy, while governance factors may be more material for financials. Even within an industry, materiality can differ. For example, Disney and Netflix may both be in the entertainment sector, but environmental factors are much more important for Disney given its cruise line and theme park operations. ESG modeling is in its infancy, only scratching the surface of ideas and questions to be researched.
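To make the idea of industry-specific materiality concrete, here is a minimal sketch of a composite ESG score whose pillar weights depend on the sector. The sector names, the weights, and the function name esg_score are illustrative assumptions, not PanAgora's model or data.

```python
# Hypothetical materiality weights for the E, S, and G pillars by sector.
# These numbers are made up for illustration; a real model would estimate them.
MATERIALITY = {
    "utilities": {"E": 0.6, "S": 0.2, "G": 0.2},
    "technology": {"E": 0.2, "S": 0.6, "G": 0.2},
    "financials": {"E": 0.2, "S": 0.2, "G": 0.6},
}


def esg_score(sector, pillar_scores):
    """Weight standardized E, S, and G scores by the sector's materiality."""
    weights = MATERIALITY[sector]
    return sum(weights[p] * pillar_scores[p] for p in ("E", "S", "G"))


# The same pillar scores produce different composites in different sectors.
scores = {"E": 1.0, "S": -0.5, "G": 0.0}
utility_score = esg_score("utilities", scores)
tech_score = esg_score("technology", scores)
```

In this toy setup, a strong environmental score helps the utility far more than the tech company, which is the intuition behind the Disney and Netflix comparison.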

Some managers can build custom portfolios for each client focused on impact, seeking to both earn alpha and tilt the portfolio toward factors aligned with the mission of the investor. These separate account portfolios are customized to each investor's objectives and the ESG issues the investor wants reflected in the portfolio.

To avoid issues with overfitting and to improve out-of-sample statistical significance, it is important to pose an investment hypothesis before inspecting, or even acquiring, the data. Once you have an investment hypothesis, then go looking for the data. Ideas can come from anywhere, but often start through understanding and replicating academic papers. The key to finding unique ideas is to hire a highly diverse team with both quant skills and fundamental insights. Some have training in computer science and engineering, while others are trained in finance or medicine. The team may be from countries around the world with experience in a variety of industries.

Given this diverse team, PanAgora sees great potential for applying data from other fields in a financial markets context. For example, financial firms aren’t widely using NOAA data on daily ocean temperatures, which may be used to predict air temperatures or the severity of storms.

Chen and Mussalli cite “Do Hedge Funds Profit from Public Information” by Crane, Crotty, and Umar. They are proud that this paper cites PanAgora as the second-largest user of SEC data in the financial markets, behind Renaissance Technologies and ahead of firms such as AQR and BlackRock, in a ranking of the top 30 users of EDGAR since 2003. The paper states that funds accessing SEC data have 1.5% higher annual abnormal returns. Renaissance and BlackRock prefer to access Form 4 for insider transactions, while PanAgora uses 10-K, 10-Q, 8-K, 4, and 13-F filings. Form 13-F lists institutional owners of each stock, while Forms 10-K and 10-Q are the annual and quarterly financial statements, respectively, disclosed by publicly traded companies, and the 8-K current report is an unscheduled release of material information. The adjusted R-squared relating a fund's monthly returns to its download activity in a given month is 0.08.
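For readers less familiar with the statistic, the following is a minimal sketch of how an adjusted R-squared is computed for a one-variable regression. It is a generic illustration, not the paper's actual specification, and the function names are assumptions.

```python
def ols_r_squared(x, y):
    """R-squared of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for using k regressors on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

An adjusted R-squared of 0.08 would mean that download activity explains roughly 8% of the variation in monthly returns after the penalty for model size, a modest share that is still notable for return data.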

PanAgora uses lots of publicly available data such as that disclosed by the SEC or the Carbon Disclosure Project. Of course, there is a heavy dose of natural language processing in PanAgora’s quant lab.

While PanAgora does interact with data vendors and can find some useful information in purchased data sets, many of the most valuable data sets are collected directly by PanAgora. It is important not to violate privacy laws or any regulations when collecting and using data. Once you know how to use NLP, it opens a vast world of unstructured data, such as mining chat rooms. Many factors are very specific to a single stock or a single industry, such as the fact that some video games had more than 30 times the download activity in 2020 that they had in 2019.
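As a very simple illustration of turning unstructured text into a signal, the sketch below counts how often a watchlist of terms appears across a set of messages. The function name count_mentions and the sample posts are hypothetical; production NLP would go well beyond keyword counts.

```python
import re
from collections import Counter


def count_mentions(posts, terms):
    """Count case-insensitive occurrences of each watchlist term in the posts."""
    wanted = {t.lower() for t in terms}
    counts = Counter()
    for post in posts:
        # Split each post into lowercase word tokens
        for token in re.findall(r"[a-z']+", post.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


# Made-up messages, for illustration only
sample_posts = ["Late rent again, rent is due", "Bankruptcy filings rise"]
mentions = count_mentions(sample_posts, ["rent", "bankruptcy"])
```

Tracking how such counts change over time, by stock or by industry, is one way a text source could feed a quant model.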

For short-term trading patterns, you can look at traffic from Internet search engines, which are now replete with terms like pandemic, unemployment, and bankruptcy. Most telling currently is that Internet search engine activity regarding late rent and mortgages is even higher in 2020 than in 2008. Are the machines telling us that global sentiment is even darker than predicted by financial markets? Maybe the data knows.
