Friday, December 9, 2016

Big Data researchers call for IRB review, based on shaky premises

Jacob Metcalf of the Data & Society Research Institute and Kate Crawford of Microsoft Research, MIT Center for Civic Media, and New York University Information Law Institute (I think those are three different things) want to subject Big Data research to IRB review, at least in universities. Their argument rests on shaky premises.

[Jacob Metcalf and Kate Crawford, “Where Are Human Subjects in Big Data Research? The Emerging Ethics Divide,” Big Data & Society 3, no. 1 (January–June 2016): 1–14, doi:10.1177/2053951716650211.]

Assumptions about assumptions

Metcalf and Crawford understand that the current Common Rule does not require IRB review of publicly available datasets. Claiming to be “historicizing extant research ethics norms and regulations” and drawing lessons “from the history and implementation of human-subjects research protections,” they proceed to invent a history of the relevant provisions.

They write,

US research regulations (both the current rules and proposed revisions) exempt projects that make use of already existing, publicly available datasets on the assumption that they pose only minimal risks to the human subjects they document. (1)


The Common Rule assumes that data which is already publicly available cannot cause any further harm to an individual. (3)


The criteria for human-subjects protections depend on an unstated assumption that we argue is fundamentally problematic: that the risk to research subjects depends on what kind of data is obtained and how it is obtained, not what is done with the data after it is obtained. This assumption is based on the idea that data which is public poses no new risks for human subjects, and this claim is threaded throughout the NPRM. While this may have once been a reasonable principle, current data science methods make this a faulty assumption. (8. Italics in original.)

At no point do they cite any evidence that regulators excluded publicly available material from review out of the belief that it bore no risks.

Here’s what the regulators had to say when they released the 1981 regulations, which introduced the present definition of research:

Several commentators felt that the definition is too broad and should be restricted to biomedical research. These commentators felt that the definition should not encompass subjects not at risk, social science research, or historical research; and some preferred voluntary application of the regulations to behavioral research. In contrast, a few commentators suggested that the definition should encompass research which is so specific as not to yield generalizable results. One commentator argued that the definition violated the First Amendment or at least academic freedom in the area of biographic research …

HHS has reinserted the term “private” to modify “information.” This modification is intended to make it clear that the regulations are only applicable to research which involves intervention or interaction with an individual, or identifiable private information. Examples of what the Department means by “private information” are: (1) Information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and (2) information which has been provided for specific purposes by an individual and which the individual can reasonably expect will not be made public. In order to constitute research involving human subjects, private information must be individually identifiable. It is expected that this definition exempts from the regulations nearly all library-based political, literary and historical research, as well as purely observational research in most public contexts, such as behavior on the streets or in crowds.

In addition to the definition of human subjects research (which does not include studies of public information), the 1981 regulations introduced the exemption for “Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.”

HHS explained this decision as well:

HHS is concerned about preservation of the confidentiality of data pertaining to human subjects but feels that other federal, state, and local laws or regulations are sufficient to protect the privacy of individuals and the confidentiality of records in cases where the research uses only existing information. It remains the responsibility of the investigator as well as the institution to ensure that such laws and regulations are observed and that the rights of subjects are protected.

[Department of Health and Human Services, “Final Regulations Amending Basic HHS Policy for the Protection of Human Research Subjects,” Federal Register 46 (26 January 1981): 8336–8392]

In neither case did HHS assume that data research would be harmless. In defining research, regulators responded to researchers' concern about freedom (a word that does not appear in the Metcalf and Crawford article). They explicitly responded to critics who argued that researchers should be able to do library research without getting anyone’s permission. In crafting the exemption, regulators recognized a privacy risk but did not believe that IRBs were the correct solution to that problem.

IRBs are not the solution

And IRBs are not the solution. Metcalf and Crawford imagine wonderful things about IRBs.

Importantly, the ethics regulations targeted by critics, and the codes that informed those regulations, have played no small part in maintaining that trust over time. Insofar as physician–researchers contributed to the formation of those codes and regulations, and the broader research community assented to them (even if begrudgingly), research ethics regulations have built the bedrock of trust that has ultimately enabled research to occur at all. Therefore, even if the research/practice distinction as codified in the Common Rule proves too unwieldy for the methods of data science, we still need regulatory options that build trust between data practitioners and data subjects. (5)

The citation here is to Polonetsky, Tene and Jerome, “Beyond the common rule: Ethical structures for data research in non-academic settings,” Colorado Technology Law Journal 13 (2015). That article speculates about how new institutions might build trust, but it does not claim, much less show, that IRBs serve this function.

Most people haven’t heard of IRBs. IRB review has not calmed public controversy over studies like the Kennedy Krieger lead paint study or SUPPORT. And as Murray Dyck and Gary Allen have noted, “Mandatory multiple reviews of multisite research indicate that IRBs do not trust the merit and integrity of other IRBs.” If IRBs don’t trust each other, why should the public?

“The Common Rule needs to reflect that even anonymous, public data sets can produce harms depending on how they are used,” write Metcalf and Crawford. “The best way to do this in academic settings remains the IRB.” They offer no reasoning behind this claim.

Indeed, they continue,

As for industry, there needs to be a more serious commitment to review and assessment of human data projects. Facebook, for example, responded to the public outcry about the emotional contagion experiment by setting up an internal review process for future experiments. Legal scholar Ryan Calo has argued that a body like the Federal Trade Commission could commission an interdisciplinary report on data ethics, and that those public principles could guide companies as they form small internal committees that review company practices. Polonensky et al. have similarly argued for a two-track ethics review model for use outside of the purview of the Common Rule that would blend internal and external perspectives. Dove et al. recently surveyed how research ethics committees have grappled with data-intensive research with ‘‘bottom-up’’ approaches when more traditional ‘‘top-down’’ approaches have fallen short. Others have also offered promising insights for integrating ethical reasoning into data science research and practice prior to the typical timing of formal ethical review.

Any one of those approaches, especially the last two, sounds better than the current IRB system, which empowers pseudo-experts to arbitrarily block research they do not understand. So why single out university-based data researchers for the misery of a broken system they have so far escaped?
