What Makes Big Data So … Big?

March 22, 2018

The term “big data” has become a cliché. One has to remind oneself that it is a somewhat ill-fitting label. What is new about the world of data isn’t that there is a lot of it; nor that, on the software end, the processing of data keeps getting easier, so that many data sets in common use today did not exist at all just a few years ago; nor that hardware magicians can squeeze so much of it into tiny objects.

What is new and important is something enabled by but distinct from either the hardware or the software magic. It’s that market needs (in a variety of markets, although of course we’re talking here about asset management and especially the alpha hunting portion thereof) have finally gotten into and seem likely to remain in the driver’s seat.

No longer does the IT desk tell the front office, “This is what we can do,” whereupon the strategists and traders try to figure out how they can make use of what IT can do. Rather, IT can now do anything. That is commonly understood. So the strategists and traders decide what data sets they want and what sorts of analyses will be most helpful.

A Paper from Citi

A new paper by Citi Business Advisory Services throws a lot of light on where Big Data stands.

The paper argues that due to Big Data, “the innovation seen in systematic trading models over the past decade could accelerate” and (a closely related point) the “differences between what used to represent quantitative versus qualitative research” could disappear.

In a historical discussion, Citi reminds us that one of the landmarks in the development of Big Data was Google’s decision, in 2003, to publish the details of its Google File System (GFS) “describing the technology and architecture it developed to handle the massive amounts of data it had to process, store and analyze from retained searches and other applications across its distributed infrastructure.” This led to the development of the Hadoop Distributed File System, an open-source analog to the GFS. (Hadoop was named after designer Douglass Read Cutting’s son’s plush toy.) From there, things got cracking in a Big Way.

Behaviors, opinions and sensory feedback – long treated as qualitative matters, if treated at all – are now subject to numbers crunching. This could create what Citi gently calls “information arbitrage” between firms that are making the best use of this numbers crunching and firms that aren’t. Less gently, those who aren’t could find themselves getting crunched.
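To see what “numbers crunching” on formerly qualitative material can look like, here is a minimal sketch of a lexicon-based sentiment score applied to news-style text. The word lists and the scoring rule are illustrative assumptions for this article, not any vendor’s or fund’s actual methodology, which would be far more sophisticated.

```python
# Illustrative only: a toy lexicon and scoring rule for turning
# qualitative text (opinions, headlines) into a number in [-1, 1].
POSITIVE = {"beat", "growth", "strong", "upgrade", "bullish"}
NEGATIVE = {"miss", "decline", "weak", "downgrade", "bearish"}


def sentiment_score(text: str) -> float:
    """Return +1.0 for all-positive text, -1.0 for all-negative,
    0.0 when no lexicon words appear."""
    words = [w.strip(".,:;!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


headlines = [
    "Strong quarter: earnings beat and bullish guidance",
    "Analyst downgrade after weak sales decline",
]
scores = [sentiment_score(h) for h in headlines]  # one number per headline
```

Once opinions are reduced to numbers like these, they can be aggregated, backtested, and traded on just like any conventional quantitative signal, which is the sense in which the quantitative/qualitative distinction starts to dissolve.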

The early adopters are going about adopting big data in different modes. One group of IM funds is “investing heavily in developing a whole technology stack and hiring data scientists to support investment research.” But another “segment of funds is experimenting with big data by either enlisting big data techniques that extend their existing research capabilities through proofs of concept or by piloting content from third-party providers, utilizing big data technology and new data sets.”

The market is full of data vendors and third-party service providers willing to sell what those early adopters want to buy.

Not all Roses and Plush Toys, Though

The process by which the new data capabilities and principles get internalized by the swifter funds, those that want to be on the winning side of the arb plays, isn’t a painless one. There are “integration and cultural challenges” that have to be overcome. After all, the experts that an aspiring arbitrageur would hire come from “internet firms, gaming companies, the military” and consumer research. The world of asset management will be new to them, so time and effort must be invested before everyone on the developing teams can “work effectively together.”

And sometimes it fails. Sometimes a firm will invoke the new data capabilities in an innovative way and still … get an output without investable insights. As the report says, some funds “are spending long periods of time beta testing data as it often takes time for patterns to emerge and because some of the data being investigated is so new that the nature of the data is changing over time.” Some failures will be part of the process going forward, as an infant learning to walk will fall down now and then.