The controversy surrounding the political consulting firm Cambridge Analytica’s use of personal data harvested from social media accounts without the users’ permission is among the first of what likely will be a long series of public debates about how the use of “big data” can shape our lives. And one of the most obvious battlegrounds where we should expect such fights to play out soon is in the insurance industry.

It has long been the case that insurers’ business models call for parsing and segmenting massive datasets, from credit scores to accident trends to climate models. In recent years, advanced analytics have allowed insurers to make use of terabytes of new data derived from networked home systems, “telematics” recording devices and even social media to construct so-called “big data” models, with the promise of breakthroughs in everything from fraud detection to more effective customer service.

Unsurprisingly, insurers’ biggest use case for big data lies at the very heart of the business of insurance – the ability to assess risk. According to a 2015 survey conducted by Willis Towers Watson, 42 percent of executives from the property and casualty insurance industry said they were already using big data in pricing, underwriting and risk selection, and 77 percent said they expected to do so within two years.

But while such data hold great promise to better forecast claims and more effectively tailor insurance products to consumers’ needs, there is also significant peril in an industry as politically sensitive and heavily regulated as insurance. The more complex predictive modeling grows, and the further it drifts from the sorts of relatively straightforward risk factors that both consumers and regulators can easily understand, the greater the odds of a backlash.

Insurance rates are regulated in nearly every state in accordance with standards that they not be “excessive, inadequate or unfairly discriminatory.” The first two standards stem from a time when most insurance rates were set collectively through industrywide cartels. Though the competitive landscape has changed a lot in the intervening decades, they remain relatively straightforward empirical tests. Rates need to be high enough to ensure long-term solvency, but not so high as to yield unreasonable profits.

The third standard is more recent and significantly fuzzier. What does it mean for insurance rates to be “unfairly discriminatory”? In the relatively recent past, race was a common underwriting variable, especially in life insurance, but to various degrees in health, auto and home insurance as well. Public policy and popular sentiment both have turned decidedly against such practices. But the industry and policymakers alike have been debating subtler questions of fairness for decades, and those questions look likely to get harder to answer, not easier, in an era of big data.

For insurance companies, the harbinger of today’s big questions can be found in the decadeslong fight over the use of consumers’ credit scores in auto insurance. Over the past 40 years, credit scoring has utterly transformed the auto insurance industry by giving insurers a credible means to segment risks much more finely. It has been a factor in everything from the shrinking of state residual market pools to negligible size to the market’s broad shift away from agents and toward direct online underwriting.

It’s also been deeply controversial. On the one hand, it’s never been obvious to the typical consumer what one’s credit history has to do with one’s likelihood of getting in an accident. On the other, the evidence shows pretty clearly that credit scoring is predictive of claims. Similarly, some studies show that credit scores are strongly correlated with race and income, while further studies have shown that credit scores are not actually proxies for either.

The debate continues, sometimes spilling over into related controversies like whether insurers should be able to consider occupation or education level in setting auto insurance rates. While some states bar some of these factors completely, most have arrived at a middle ground that allows their consideration, but only in concert with other factors like driving record, territory and miles driven.

But as controversial as credit scoring has been, it’s important to note some of the key elements that advocates of its use have always had on their side, elements that might be lost in the big data debates to come. Consumers are, by now, accustomed to the concept of credit histories and broadly accept that good credit is a sign of personal responsibility. Regulators generally feel that they have a handle on how credit information is used by insurers and that it’s possible to draw lines between proper and improper use. And all parties generally agree that, even if credit scores do produce disparate impact, that isn’t the insurance companies’ intent. The goal, ultimately, is to assess risk.

None of that is nearly so clear when it comes to the “black box” predictive models that big data could permit. A credit score may not be a proxy for race—or for income, religion or national origin—but there’s plenty of personal information on a typical user’s Facebook page that very well can serve that purpose. Consumers aren’t likely to care whether an algorithm “intends” to discriminate when it finds a correlation between claims and, say, being a fan of Univision. And those are just the data that are easily available from a public profile. Expect a whole different level of outrage if the models pick up things from a user’s direct messages, browsing history or anything else that they would reasonably consider private.

For their part, regulators are completely unprepared to pick apart these predictive models. They lack the staffing and the technical know-how to determine how the models work, which factors should be permitted and what sorts of weightings are reasonable. To be sure, they’ll try to catch up, but it doesn’t take a psychic to see how this is likely to play out. If regulators can’t figure out what’s going on inside the black box, they’ll scrutinize what comes out of it. If the rates and underwriting criteria that predictive models produce are shown to have a disparate impact on protected classes, it’s a safe bet such practices would be presumed “unfairly discriminatory,” and it would fall to the industry to show why they aren’t. That’s more or less been the standard employed under federal lending regulations for a long time.
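To make that kind of output-based scrutiny concrete, here is a minimal sketch, in Python, of the simplest check a reviewer might run on a model’s quotes: compare the average premium charged to one group with the average charged to another. Everything in it is hypothetical, including the function names, the group labels and the made-up premiums. A real disparate-impact analysis would control for legitimate risk factors and would follow whatever legal standard actually applies.

```python
from statistics import mean


def average_premium_by_group(quotes):
    """Return the mean premium for each group label.

    `quotes` is a list of (group_label, premium) pairs. The structure is
    an assumption made for illustration, not an industry data format.
    """
    premiums = {}
    for group, premium in quotes:
        premiums.setdefault(group, []).append(premium)
    return {group: mean(values) for group, values in premiums.items()}


def premium_disparity_ratio(quotes, group, reference_group):
    """Ratio of one group's average premium to a reference group's.

    A ratio well above 1.0 is only a flag that the output differs across
    groups; it says nothing about why the difference arises.
    """
    averages = average_premium_by_group(quotes)
    return averages[group] / averages[reference_group]


# Made-up quotes for two hypothetical groups, "A" and "B".
sample_quotes = [
    ("A", 1000.0), ("A", 1200.0), ("A", 1100.0),
    ("B", 1300.0), ("B", 1500.0), ("B", 1400.0),
]

ratio = premium_disparity_ratio(sample_quotes, "B", "A")
print(f"Group B pays {ratio:.2f}x the average premium of group A")
```

A check like this requires no knowledge of what happens inside the model, which is exactly why it is the likely fallback for a regulator who cannot audit the model itself.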

If credible predictive models can be crafted that don’t produce disparate impacts, so much the better for everyone. But where such impacts do arise, the industry needs to be able to explain why. The advantage of rating factors like driving record and even credit history is that not only are they predictive of claims, but they fairly readily comport with common notions of “fairness.” Plucking a rate from a dense soup of data – which inevitably will include some things over which a consumer can exercise control, some things over which he or she cannot and many things consumers aren’t even aware are considered – isn’t likely to produce the same response.

Insurtech is the hottest topic in the industry today, and there is no question that, from AI to blockchain, a host of technologies are set to transform the industry. The potential really is enormous, and it’s easy to understand how seductive Silicon Valley’s rhetoric of transformation and innovation can be. But even as they explore these exciting new ventures, insurers would be well served by remaining tethered to their history as a conservative industry focused, first and foremost, on the subject of risk. Both in the courts and in the court of public opinion, when the bloodless efficiency of data tangles with deeply ingrained moral intuitions about what’s fair, the data often lose.

Image by Yuri Hoyda
