A recent MIT Sloan research paper looks at the degree to which various environmental, social, and governance (ESG) ratings diverge and why.
The paper is the work of Florian Berg and Roberto Rigobon, both of MIT Sloan, and Julian F. Kolbel, who is affiliated with the University of Zurich’s department of banking and finance.
The authors consider the five most prominent rating agencies in the ESG space: KLD, Sustainalytics, Vigeo-Eiris, Asset4, and RobecoSAM. KLD is the heir of Kinder, Lydenberg, Domini & Co., a pioneer in the field, acquired by RiskMetrics in 2009. RobecoSAM is a Zurich-based concern under the ORIX umbrella. The authors acknowledge that there are other ratings systems and say they are monitoring those for future research.
A Lot of Noise with the Signals
The differences between the five aforementioned systems are considerable. The pairwise correlations between the ratings average 0.61 and fall as low as 0.42. That is remarkably weak agreement: by contrast, the credit ratings of Moody’s and S&P correlate at 0.99.
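To make the correlation figures concrete, here is a minimal sketch of how an average and a minimum pairwise correlation could be computed across several raters. The rater names and scores are invented for illustration and are not taken from the paper; the sketch also assumes plain Pearson correlation, which the article does not specify.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical ESG scores for the same five firms from three made-up raters.
ratings = {
    "rater_a": [72, 55, 63, 40, 81],
    "rater_b": [65, 60, 50, 45, 70],
    "rater_c": [50, 70, 58, 35, 62],
}


def pairwise_correlations(ratings):
    """Correlation for every unordered pair of raters."""
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


pairs = pairwise_correlations(ratings)
average = sum(pairs.values()) / len(pairs)
lowest = min(pairs.values())

for (a, b), value in pairs.items():
    print(f"{a} vs {b}: {value:.2f}")
print(f"average: {average:.2f}, lowest: {lowest:.2f}")
```

With five raters instead of three, the same function would yield ten pairs, and the average and minimum would be read off in the same way.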
Important consequences follow from this divergence, which acts as “noise” within the ESG signals. The authors note that ESG performance “is unlikely to be properly reflected in corporate stock and bond prices” so long as it is a challenge for an investor to figure out which firms are the laggards and which are the outperformers. The noise also frustrates companies that do want to improve their ESG performance, as actions that improve one rating may hurt another and have no impact at all on a third. Finally, the divergence creates challenges for empirical research.
Why do these ratings diverge as much as they do? The authors break the overall divergence down into three components: scope divergence (related to the selection of different sets of categories); measurement divergence (related to different assessments of the same category); and weight divergence (related to the relative importance of categories in the computation of the aggregate ESG score).
Labor-Management Relations Measured
Measurement divergence is the most important. It explains more than half of the whole. To get a sense of what this means, consider a situation in which two rating agencies both consider an issuing company’s relationship with its workers an important attribute. How does one measure this? One agency might measure it by the rate of worker turnover, regarding a firm with longer-lasting employment relationships as scoring higher than another, all other things being equal. But another agency might use the number of labor cases filed against an issuer as the key metric, with fewer filings meaning better relationships and a higher score. These metrics are “likely to lead to different assessments,” the authors say.
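The labor-relations example can be sketched in a few lines. The firm names and figures below are hypothetical, chosen only to show how two reasonable indicators for the same category can rank the same firms in opposite order.

```python
# Hypothetical data: annual turnover rate and number of labor cases filed.
firms = {
    "firm_a": {"turnover_rate": 0.05, "labor_cases": 12},
    "firm_b": {"turnover_rate": 0.20, "labor_cases": 1},
    "firm_c": {"turnover_rate": 0.10, "labor_cases": 5},
}


def rank_by(firms, indicator):
    """Return firm names from best to worst, treating lower values as better."""
    return sorted(firms, key=lambda name: firms[name][indicator])


# One rater looks at turnover, the other at labor cases.
ranking_turnover = rank_by(firms, "turnover_rate")
ranking_cases = rank_by(firms, "labor_cases")

print(ranking_turnover)  # ['firm_a', 'firm_c', 'firm_b']
print(ranking_cases)     # ['firm_b', 'firm_c', 'firm_a']
```

Both raters agree that labor relations matter and might even weight the category identically, yet the firm that looks best on one indicator looks worst on the other.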
The authors also discovered in the course of their research that there is a fourth cause of divergence, which they call the rater effect. The assessment that rating agencies make of individual categories “seems to be influenced by their view of the analyzed company as a whole.”
There also are large differences in the amount of correlation, or conversely in the amount of divergence, from one category to another. Environmental policy has an average correlation level of 0.57. But many of the categories within the social dimension of ESG are much lower. Some are even negative.
A Recommendation
The authors make several recommendations in light of their findings: for researchers in this field, for managers, for the raters, and for investors.
Investors should try imposing their own weighting on the indicators of different rating agencies and deriving their own scores. If they internalize some of the work currently left to the agencies, they can considerably reduce the discrepancy.
However, about half the difference will remain. If the five (or more) agencies give differing weights to labor-management relations, as in the example above, that need not unduly bother an investing institution that can decide on its own weighting. But that still leaves the issue of the proper metric, which is precisely the sort of granular decision likely to stay in the agencies’ hands.
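The re-weighting idea can be illustrated with a toy calculation. Everything below is assumed for the sake of the sketch: the two raters, their category scores, their weights, and the investor's weights are invented, and the size of the remaining gap does not reproduce the proportions reported in the paper.

```python
# Hypothetical category scores that two raters assign to the same firm.
category_scores = {
    "rater_a": {"environment": 80.0, "labor": 50.0, "governance": 60.0},
    "rater_b": {"environment": 70.0, "labor": 60.0, "governance": 60.0},
}

# Hypothetical weights each rater uses to build its aggregate score.
rater_weights = {
    "rater_a": {"environment": 0.6, "labor": 0.2, "governance": 0.2},
    "rater_b": {"environment": 0.2, "labor": 0.6, "governance": 0.2},
}

# Weights the investor chooses to apply to every rater's category scores.
investor_weights = {"environment": 0.4, "labor": 0.3, "governance": 0.3}


def aggregate(scores, weights):
    """Weighted average of category scores, normalized by the weight total."""
    total = sum(weights.values())
    return sum(scores[c] * weights[c] for c in weights) / total


# Gap between the raters when each uses its own weights.
own = {r: aggregate(category_scores[r], rater_weights[r]) for r in category_scores}
own_gap = abs(own["rater_a"] - own["rater_b"])

# Gap when the investor imposes one common weighting on both.
common = {r: aggregate(category_scores[r], investor_weights) for r in category_scores}
common_gap = abs(common["rater_a"] - common["rater_b"])

print(f"gap with raters' own weights: {own_gap:.1f}")
print(f"gap with investor's weights:  {common_gap:.1f}")
```

In this toy case the common weighting removes most of the gap. What is left comes entirely from the raters scoring the same categories differently, which is the part an investor cannot fix by re-weighting.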