In defense of the mosaic theory

by Paul Rosenzweig

Nov 29, 2017

issues: Data Policy, Open Government, Technology and Innovation

originally published in Lawfare

As the Supreme Court begins its formal consideration of the Carpenter case, it seems useful to me to finally take up the challenge that my friend, Orin Kerr, has often laid down: he asks why nobody is defending the mosaic theory. So let me do it in this (rather lengthy) post. By way of short introduction, the “mosaic theory” is the idea that large-scale or long-term collections of data reveal details about individuals in ways that are qualitatively different from single instances of observation, and the related idea that, as a consequence, Fourth Amendment law should take account of that fact through a warrant requirement for “big data” collection. Orin thinks the theory is flawed. I agree, but it is better than the alternatives. In this post I hope to explain why.

By way of background: Carpenter, you will recall, involves warrantless government access to historical cell site location data. Cell site location data, as the name implies, is a set of locational information collected by your cell service provider as part of the business of providing you with cell service. As you sit in your office today, with your phone on, the phone is checking in with nearby cell towers to announce its presence. That’s how the system knows where to route calls to you if you are roaming, and it is an essential feature of providing phone service to you. Your service provider (say Verizon) can, and often does, retain those locational records for a period of time. And thus it is feasible to track your movements, on a broad scale, by collecting your history of cell site location information.

As a result, as in the case of Carpenter, the same data can be used in an investigative manner. If, as with Carpenter, we have a series of robberies, law enforcement can “dump” the historical cell site location data from the cell towers nearest the various robbery sites and then cross-check them against each other. If, say, a single phone is geolocated near all of the robberies, that, as we say in the business, is a clue 🙂 and it certainly gives the authorities good reason (and even probable cause) to go and investigate that individual further. S/he is either the victim of a very unfortunate coincidence or, quite possibly, a participant in the string of robberies.
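
In computational terms, that cross-check is just a set intersection. Here is a minimal sketch, assuming each tower dump has already been reduced to a set of device identifiers; the site labels, the phone identifiers, and the `phones_near_all_sites` helper are all hypothetical illustrations, not anything drawn from the record in Carpenter.

```python
from functools import reduce


def phones_near_all_sites(dumps):
    """Return the device IDs that appear in every tower dump.

    `dumps` maps a site label to the collection of device IDs seen near
    that site. This is an illustrative helper, not a real investigative tool.
    """
    if not dumps:
        return set()
    # Intersect the per-site sets pairwise until only the common IDs remain.
    return reduce(set.intersection, (set(ids) for ids in dumps.values()))


# Entirely made-up data for illustration.
tower_dumps = {
    "robbery_1": {"phone_A", "phone_B", "phone_C"},
    "robbery_2": {"phone_B", "phone_D"},
    "robbery_3": {"phone_B", "phone_C", "phone_E"},
}

print(sorted(phones_near_all_sites(tower_dumps)))
```

Note that the intersection says only that a phone was present at every site; every other phone in the dumps drops out of the result.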

That, of course, is exactly what happened to Carpenter. His phone was near a bunch of robberies and, as a result of learning that, the investigation focused on him, leading to his eventual conviction. And so the question now presented is what, if any, predication the government needed to get the “dump” of data. Current law, and the courts below, said that a simple subpoena was sufficient, and that is all the government got in this case. Carpenter and his supporters say that the intrusion on the public is so severe that a warrant should be required. [Note, of course, that the dumps were a means of finding probable cause and antecedent to it. Requiring a probable cause showing as a prelude to cell tower dumps would greatly limit their use.]

Supporters of the government’s view have a simple analysis of the issue — under the long-standing doctrine known as the “third party” doctrine, consumers do not have any privacy interest in information they voluntarily disclose to third parties and those third parties are under no obligation (at least constitutionally) to maintain the privacy of information they receive from consumers. Because (as this argument goes) Carpenter voluntarily gave his geolocation information to his cell service provider, he has no right to say that a warrant is needed before that provider gives it to the government. [Let’s put aside for now one aspect of this case — whether it is really fair to say that Carpenter acted “voluntarily” in giving up his cell location data. For purposes of this case, everyone pretty much assumes that Carpenter can’t really argue to the contrary since, “of course,” he could have not used a phone at all.] Some of Carpenter’s supporters think about revisiting the third-party doctrine — they say it isn’t realistic in a world where it is impossible to function without sharing private information with service providers. Digital reality, they say, ends the validity of the third party doctrine.

Even assuming the applicability of the third party doctrine, some supporters of Carpenter have a different argument. They say that collecting information in small quantities is OK, but that large scale bulk data collections (like the “dumps” at issue in Carpenter) are different and should be subject to a warrant requirement. This is the so-called “mosaic theory” (see … we got here eventually!), named after the idea that a single piece of tile in a mosaic is just a single tile with a single color, that tells you nothing. But if you collect enough tiles, put them in a pattern, and step back, you can see a beautiful Roman mosaic. In much the same way a pointillist painting by Seurat is just a collection of individual dabs of paint — but taken collectively they become a beautiful landscape.

The fundamental idea here is that aggregations of data create information beyond their individual value. 1+1+1 equals 17, not just 3. In the context of geolocation tracking, if I collect information on only a single trip, I know you went from point A to point B — but nothing more, really (unless I have external information to cross reference). But if I collect a month’s worth of trips (or six months) then I can readily discern where your home is; where your office is and so on. I can identify your drug stash house — but also where your Alcoholics Anonymous meeting is held. The mosaic theory (that I am defending) is simply the idea that large scale and long-term aggregations of data are different in kind from single events (or small-scale ones) and that a warrant should be required for larger data collections.
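
To make the aggregation point concrete, here is a toy sketch of how a “home” location might fall out of accumulated records. It assumes location data arrives as (timestamp, cell ID) pairs; the night-hours window, the `min_nights` threshold, the cell names, and the `likely_home` function are my own illustrative assumptions, not a description of any real analytic program.

```python
from datetime import datetime, timedelta


def likely_home(pings, night_start=22, night_end=6, min_nights=7):
    """Guess a 'home' cell: the cell seen on the most distinct nights.

    `pings` is an iterable of (datetime, cell_id) pairs. Returns None when
    no cell has been seen on at least `min_nights` distinct nights.
    All thresholds here are assumptions chosen for illustration.
    """
    nights_by_cell = {}
    for ts, cell in pings:
        if ts.hour >= night_start or ts.hour < night_end:
            # Credit after-midnight pings to the evening they belong to.
            night = (ts - timedelta(hours=night_end)).date()
            nights_by_cell.setdefault(cell, set()).add(night)
    if not nights_by_cell:
        return None
    cell, nights = max(nights_by_cell.items(), key=lambda kv: len(kv[1]))
    return cell if len(nights) >= min_nights else None


# Synthetic data: thirty days of one overnight ping and one midday ping.
start = datetime(2017, 1, 1)
month = []
for d in range(30):
    day = start + timedelta(days=d)
    month.append((day.replace(hour=2), "cell_home"))
    month.append((day.replace(hour=13), "cell_office"))
one_day = month[:2]

print(likely_home(one_day))  # too little data to say anything
print(likely_home(month))    # the overnight pattern emerges
```

On a single day of records the function declines to guess; on a month of the same kind of records the overnight cell stands out. The individual data points are identical in kind, and only the aggregation differs.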

To this, opponents have a number of responses. The most trenchant of them was offered by Judge David Sentelle in United States v. Jones. As he said there:

The reasonable expectation of privacy as to a person’s movements on the highway is * * * zero. The sum of an infinite number of zero-value parts is also zero.

In other words, since the third party doctrine says that there was no Fourth Amendment violation in a single observation (in that context on a highway, in Carpenter’s case in a cell tower), it follows, for Judge Sentelle, that there was none for multiple observations. He rejected, therefore, the view of some of his colleagues that a Fourth Amendment issue was presented “because that whole reveals more than does the sum of its parts.”

As I said, Orin Kerr is not a fan of the mosaic theory. If I can summarize his argument (and I think I am doing so fairly, but it is pretty much impossible to summarize a 40+-page law review article in a few words), the objection to the mosaic theory is basically threefold. First, Orin says that the mosaic theory is doctrinally novel and a change from the “sequential” way in which Fourth Amendment violations are currently assessed. This is, in a more sophisticated way, Judge Sentelle’s 0+0+0 analysis. Second, Orin argues that the mosaic theory would be difficult to implement in practice. How much aggregation is too much? Does the mosaic theory apply outside the digital context? Would the exclusionary rule apply? Etc. Third, Orin deploys his overarching theory of the Fourth Amendment, what he calls “equilibrium adjustment” (broadly, the idea that technological changes should be neutral to the Fourth Amendment and that the interpretation should adjust to maintain the State/citizen balance), and argues that the mosaic theory is the wrong way to readjust the balance.

Anyone (like me) writing about digital surveillance law should be very reluctant to cross swords with Orin, since his depth of scholarship far exceeds theirs. Nonetheless, I think Orin is wrong — at least about the mosaic theory in general. Though, oddly enough, I think he is right about the result that should follow in the Carpenter case — for that very reason. So… here goes — why should courts think kindly of the mosaic theory?

Reason No. 1, and frankly the single most persuasive reason to me, is that the mosaic theory is scientifically accurate. Large data aggregations are how the government tracks potential terrorists who travel internationally and how Google knows what ads to serve you. They are how political campaigns target you for your vote and charities target sympathetic people for money. In short, the mosaic theory is real: with enough data 1+1+1 really does equal 17, or even 170. So the best reason to incorporate it into the law is that, like any good law and economics student from the U. of Chicago, I think that the law should reflect reality. Mosaic analysis is reality. I recognize that this is mostly a normative claim about what the Fourth Amendment should be, rather than a descriptive claim about what it actually is, but this idea of accommodating technological reality is at least a partial description of why the Court reached the result it did in Katz (involving the novel technology of wire taps) and Kyllo (heat detectors).

Second, big data analytics really does reset the balance between the State and its citizens. Back in the dawn of databasing, the Supreme Court was so skeptical of big data that it developed a doctrine known as “practical obscurity”: the idea that collecting large amounts of data was so difficult and resource intensive that, in practice, it was not realistically possible. And hence, data maintained in distributed databases was practically obscure from discovery, and that obscurity protected privacy. [The case is DOJ v. Reporters Committee for Freedom of the Press, a fascinating FOIA case. DOJ had painstakingly assembled a database of criminal histories from public records. The reporters FOIAed the list, reasoning that public records don’t become private when collected in one place and that, DOJ having done all the hard work, the reporters should reap the benefit. The reporters lost 9-0. As the court said: “the privacy interest in maintaining the rap-sheet’s ‘practical obscurity’ is always at its apex.”] Today, the practical obscurity of 1989 is a thing of the past. In much the same way that GPS systems are much, much cheaper than having officers tail a suspect, big data collections are much, much easier than human investigation.

Big data also resets the balance in some ways because of its surreptitious nature. The collections of data are so large and so pervasive that no citizen can realistically be aware of what data about him is being collected. This insight is part of what drove Justice Alito in the Jones case (involving a GPS tracker on a car and tracking for 28 days). As he put it: “Is it possible to imagine a case in which a constable secreted himself somewhere in a coach and remained there for a period of time in order to monitor the movements of the coach’s owner?” Given the obvious answer (“no” :-)) Justice Alito realized that there was a qualitative difference in long-term surreptitious monitoring that could not be replicated in the nondigital world.

So both in its transformative disruption of obscurity and in its potential for surreptitiousness, data collection for a mosaic seems to me to clearly disturb the current equilibrium between State and citizen.

I think that opponents of the mosaic theory, like Orin, acknowledge this. In his law review article, Orin agrees (or at least seems so to me) that big data analytics does reset the balance. But he suggests that the right way to reset it is to pick some step along the way, and make THAT the point at which the Fourth Amendment applies. Hewing to what he calls a “sequential” approach, which is the traditional method of analysis, he suggests an “all or nothing” rule — either no cell site data are protected or, perhaps, our expectations of privacy have changed so much that all of it should be protected.

While that approach has the virtue of doctrinal clarity, the problem with it is that it’s bad science. I can (and do) readily admit that finding that scientific limit is challenging, but it just seems to me that requiring fidelity to the sequential approach does not, in the end, do justice to the text of the Amendment (and its history) in this new context. That text focuses on the reasonableness of police practices, and that language does not, it seems to me, demand that we ignore the fact that big data surveillance (call it dataveillance if you want to coin a neologism) is fundamentally something new precisely because it does NOT rely on discrete individual instances of observation but quite literally creates new knowledge from data aggregation. [Some of the data scientists call their field “Knowledge Discovery” for precisely this reason.]

Third, I support the adoption of the mosaic theory because, as a practical matter, if it isn’t adopted then the likely alternative result is the rejection of the third party doctrine altogether. If (as seems plausible) the Court is concerned about the result in Carpenter and it chooses not to decide the case on a mosaic theory (long-term data collection v. short-term) the only alternative ground is to accept Orin’s suggestion and go for all or nothing. In other words, the Court would have to decide that, at least in this context, we DO have a reasonable expectation of privacy in data held by third parties and thus, as a result, the cloak of the warrant requirement is extended to data we originated but no longer control. But once we go down that road doctrinally, I don’t see where the analysis ends — there is nothing magical about data held by third party cellphone service providers that gives us more privacy expectations than, say, data held by third party banks or email providers or any other form of digital data. And yet third-party digital data (bank records, toll call records, email transaction data, travel records, etc.) is at the heart of almost all modern investigative techniques. Elimination of the third party doctrine wholesale for digital evidence would be a massive sea-change with untold consequences on investigative possibilities (and a significant disturbance of the equilibrium in favor of the individual). I can’t say for sure, but my instinct is that this cure might well be worse than the disease.

So much for my positive case for the mosaic theory — it’s reality and it’s better than the alternative. What about the complaint that the mosaic theory is difficult (if not impossible) to implement?

Color me skeptical. I certainly would not minimize the fact that some line drawing will be necessary and some uncertainty will arise (especially in the transition period when rules are ill-defined), but I do not think these problems are either insurmountable or incapable of resolution. Orin’s article posits three buckets of problems that would need to be addressed: duration and scale; which surveillance methods count; and the grouping problem (i.e., the aggregation of aggregated data). Let me address each in turn and at least outline what I think the right answers would be:

Duration and scale: Some ask “how long is too long?” and “how much is too much?” in terms of data collection to cross the big data/mosaic threshold. They suggest that the indeterminacy of the answer to this question is a fatal flaw in the mosaic theory. But the law is no stranger to bright line rules, and it is not a stranger either to lines of distinction that, at some level, are arbitrary. So the necessity for line-drawing isn’t the problem; it’s the question of exactly where to draw the line that troubles critics. But that concern seems to me wrong: the science of predictive analytics is reasonably well established. We know, for example, that a single day’s worth of data on location has little predictive value. But give one enough data (currently on the order of two to three months’ worth) and analytic programs can identify a home or an office with a high degree of accuracy. [In the Jones case noted earlier, 28 days’ worth of geolocation data identified the stash house.] I think the law would be quite comfortable with a rule that said the warrant requirement turns on technological capability, and that requirement might change as technology changes. For now, I think one day is too short (which is an area where I think Carpenter is clearly wrong in his briefing) and three months is at the upper edge of the line drawing.
What surveillance counts? Some contend that we don’t know what sorts of collections this new mosaic rule would apply to. Orin, for example, wonders about drones. Or police cameras. This one seems to me easy, and I don’t understand why it is an issue: the rule would apply to any surveillance method that is capable of being used for long-term surveillance that accumulates data of sufficient volume to meet the duration and scale threshold, but of course if and only if the method is actually used for that long-term surveillance. A drone tracking location for one trip wouldn’t trigger the mosaic theory limitations. Floating a drone over me for three months, or dumping camera images of me for three months of tracking, would.
Grouping: What about mixing data? Some drone imaging, some cameras, some cell towers, all mixed together to build the picture of me. Would that effort require a warrant (even if each of the individual collections might be short enough in duration to not require one)? I confess this question is harder, in part because it is difficult at this juncture to really consider how feasible this sort of data fusion really is. Part of me wants to punt on it, but in the spirit of a fair response, I would say … yes, if the intent is to create a large-data aggregation, we should look at that irrespective of the data source. It is the collection, analysis and correlation that is legally relevant to me.

Those are, of course, very brief answers and I am prepared to defend them at greater length, but for now I hope they indicate that the existence of questions of implementation should not, at least in my view, be enough to throw the mosaic theory out the window willy-nilly.

What then of Mr. Carpenter? If we adopt the mosaic theory that I am advocating, I think he loses anyway. Historical cell site dumps from near prior robberies are not, at least not that I can see, designed to create a mosaic of his behavior. From this we cannot, for example, learn where his home is, or his office. Rather, the dumps answer one question about Mr. Carpenter (and one question only): Was he in close proximity to all the robberies? The same question might as readily be answered by cameras inside each of the stores that were victimized. And asking and answering that question doesn’t implicate my conception of what the mosaic theory is intended to protect against: building a mosaic of an individual that isn’t possible through other types of short-term observation.

Carpenter’s problem, such as it is, is that he is really trying to say that the volume of the data dumps is offensive, not because it implicates him, but because it also reveals geolocation data about all of the other (thousands?) of innocents whose location is also now known. But there, too, if our focus through the mosaic theory is on affronts to individual privacy through the creation of new behavioral knowledge, the data collection doesn’t implicate that for other individuals either. Other individuals are, by hypothesis, simply bystanders in this and have even less data collected than Carpenter does. By contrast, had the government sought three months of Carpenter’s geolocation data (either from his cell site data or the location function on his phone), that *would* implicate the mosaic concerns.

Will Carpenter win? I think so; that’s why the Court took the case. But if he does, it won’t be by using a properly understood theory of mosaic knowledge discovery. In any event, I look forward to Orin telling me what I’ve got wrong here.