Monday, January 21, 2008

How IRBs Decide--Badly: A Comment on Laura Stark's "Morality in Science"

Laura Stark's recent essay in Law & Society Review led me to her 2006 Princeton University dissertation, "Morality in Science: How Research Is Evaluated in the Age of Human Subjects Regulation." The heart of the dissertation is her description of the workings of three university IRBs--one in a medical school and two at universities without medical schools--based on recordings of their meetings and her direct observation of the IRBs at work. It makes for fascinating reading, and I applaud Stark for her achievement even as I disagree with her conclusions.

Stark claims to be neutral about IRBs' ability to perform their stated task: protecting the subjects of research. She writes, "My goal is not to judge the 'fairness' and 'effectiveness' of IRBs myself." (7) And she correctly notes that the ethical acceptability of an IRB-approved project is a "social truth," not an empirical one. (244) But her tone is generally sympathetic to the IRBs. For example, she writes that IRBs' "forms of evaluation provide directed, tangible ways for board members to carrying out their review process, given the practical difficulty of applying unmediated, abstract ethics principles," making the IRB members sound like heroes who have achieved a workable system against the odds. (186)

Indeed, in some cases she reports, IRBs seem to be doing some good. For example, a physiologist and a nurse had a fruitful debate about the need for a quick medical screening of subjects in an exercise study (197-200). That's an example of an IRB with multiple experts on a single type of research--something I hope is reasonably common in much biomedical research. But most of Stark's observations are distressing in ways I don't think she appreciates. Here, then, are some of the actions she observed, along with my reasons for finding in them an indictment of the IRB system as presently run.


At all three IRBs she observed, Stark saw members judging proposals based on the proportion of spelling and typographical errors in the proposal. She calls such behavior "housekeeping" and excuses it on the grounds that it "was indispensable for IRBs because the apparent degree of care taken in submitting a tidy proposal served as a proxy for an investigator's ability, allowing board members to make judgments about people's reliability as researchers. In this way, ink and paper serve as character witnesses." (173)

Such behavior, I believe, represents what Sir James George Frazer called the practice of "homeopathic magic." As Frazer put it in The Golden Bough, "the magician infers that he can produce any effect he desires merely by imitating it." In this case, the proposal serves as a magic charm, and a tidy proposal guarantees an ethical research project. That IRBs would resort to such practices is strong evidence that they lack the expertise to judge proposals on their merits.


Stark calls a second kind of evaluation "scientific evaluation," which sounds nice. But her observations confirm the complaints of many researchers that IRB members demand changes in research they do not understand.

For example, when a social scientist sought permission to interview survivors of domestic violence who became community activists, a lay member considered insisting that the researcher also involve women who were not activists, or who remained in abusive relationships. Stark notes that "the investigator explained that she had not intended to do a comparative study and that the idea of constructing a control group and making her hypothesis explicit were antithetical to Grounded Theory," and that a statistician on the IRB rose to the researcher's defense (205). The statistician admirably stated his intention to remain silent when he lacks expertise, but it's clear that not everyone on the IRB possesses such self-restraint. In another case, a board demanded that an investigator justify decisions that, Stark notes, "did not bear directly on protection of human subjects." (179) The investigator became flustered and almost agreed to changes that would have invalidated the findings, until another board member intervened and shut up his colleague, while the rest of the board laughed.

Stark reports such events gently, stating, "criticism of the quality of an investigator's science often came from expert members who did not belong to the same discipline as the [investigator]," leaving it up to the investigator's disciplinary colleagues to defend the proposal. (180) She writes, "license to evaluate the scientific merit of studies extended beyond what IRB members could justify in the name of research subjects. Science evaluation as a human subjects issue was at times self-consciously melded with criticisms made for the sake of better science." (181) But in her own examples, members' suggestions were at least as likely to degrade the science as improve it.


Stark finds that IRB members who aim for "subject advocacy" rely on their imaginations, rather than empirical evidence about the effects of various types of research. She offers this story:

"an IRB chair (at a board where I did not observe) described to me an episode involving a mental health worker who served as her board's community representative. An investigator had proposed a study on homeless people, which the community representative resisted because she felt that the population was too easily exploited. Her resistance to this research, according to the chair, was symptomatic of a broader problem in which the community representative would 'overly identify with patients and overestimate risks, and not really attend to data to balance her perspective.' In this instance, the community representative 'was not willing to hear the scientific data,' which indicated that interviewing homeless people did not harm them and that in fact interviewing them provided useful data that might aid the group. The community representative instead argued that 'the mere act of interviewing them was putting these folks as risk.' Thus, to this IRB chair, 'critical thinking and stepping back, taking a little distance from an issue, and trying to look at it in an objective fashion just wasn't something [the community representative] was willing to do…Since the committee doesn't function by consensus, we just moved ahead, and I'm certain she felt sidelined." (183)

It's nice to hear that this member was overruled, but I still pity the researchers who must present their proposals to her. Overall, Stark paints a particularly grim picture of the role of lay members on IRBs. It seems that they only get the attention of the researchers on the board once they've been so co-opted that their comments are indistinguishable from those of the "scientist" members.


Federal regulations (45 CFR 46.111) require IRBs to determine that "selection of subjects is equitable." While I don't think this is a wise criterion for judging research in the social sciences and the humanities, I can see its importance for medical research. Unfortunately, establishing an equitable selection of subjects is very difficult. In the case Stark observed, the IRB simply encouraged the researcher to lie about his intentions:

"The investigator indicated that it would be difficult in practice to use subjects from the [predominantly minority] location he had just mentioned because of the logistics of the study. Then, Reverend Quinn joined the discussion. Together, they clarified for the investigator what the board was looking for—and why:

'Reverend Quinn: Actually, even by adding the phrase after "efforts will be made to recruit from senior and community centers throughout the state," "including those that serve areas of minority populations." Just something like that, would simply make it clear you are being more proactive that otherwise people would think you were.

'Olin: And that would suggest, too, that we and you are being more vigilant.'" (185)

The language proposed by the IRB would indeed suggest that the researcher would be more vigilant, even when the researcher had little intention of recruiting minority subjects. A better strategy would have been to allow the researcher to honestly state his intentions and proceed with the research as planned, then seek the resources needed to do the difficult work of including minorities.


Most of Stark's stories concern petty IRB interventions. There's some tinkering with a consent form (182), and squabbling over whether a phone call asking parents to participate in a follow-up study constitutes "invasion of privacy." (233) This sort of effort to avoid hurt feelings is far removed from the kinds of permanent harms against which federal regulations are meant to protect.

In one case Stark observed, an IRB intervened more seriously, perhaps killing a study. An investigator wanted to ask parents how they disciplined their children and was reluctant to report suspected child abuse to state authorities lest the prospect of such reporting lead parents to lie. An IRB member who knew a child who had been killed by abuse spoke up, and this led the board to refer the project to the university lawyers. So far so good; regardless of the personal experiences of the board members, an investigator in this position should know and follow the applicable laws about reporting child abuse. But, Stark continues, once the referral was made, "after several months with no decision, the investigator abandoned her research plan and withdrew the project from consideration." (211) She doesn't explain who delayed the project--the IRB or the university counsel--but the upshot is that rather than improve a potentially important research project, the process killed it. Moreover, because technically the proposal was withdrawn rather than rejected, the IRB can continue claiming a low rejection rate.


Stark acknowledges that when offered identical proposals to review, IRBs will respond with wild variation. She asked eighteen IRB chairs how they would react to a proposal to test job discrimination by sending black and white applicants to apply for the same job, then waiting to see who would get a call back. She reports that "the chairs diverged dramatically on both the problems that they identified and the modifications that they requested. Because of their distinctive local precedents, IRB chairs’ had dissonant ideas about what risks the standard protocol entailed, to whom, and with what severity, which guided them towards distinctive decisions about whether consent could be waived, whether investigators could get consent without invalidating their data, and whether debriefing should be mandatory or prohibited as a source of harm in its own right." (231)

Stark defends such variation as evidence of "local precedents" that allow predictability within an institution (239). She compares such decision-making to the "pragmatic, not experimental, tradition that was developed in Anglo-American law and medicine during the late nineteenth century." (240) But the best doctors and lawyers of that century shared their knowledge broadly, and read broadly too. Stark offers no examples of an IRB member calling in an outside expert or doing some independent reading.

Stark argues that "the highest priority for IRBs is consistency--not with other IRBs, but with their own prior decisions." (4) She leaves this finding without further comment, neither praising nor condemning the IRBs. But she takes her relativism too far. An IRB that bases its decisions on spelling errors is consistent, predictable, and irrelevant. An IRB that always requires oral historians to destroy their recordings is consistent, predictable--and wrong. And the highest priority of IRBs should be the protection of participants in research. If they are unwilling to shoulder that responsibility, they should disband.

Stark briefly hints at an awareness of this problem when she notes that "IRBs would make more similar judgments if local boards shared decision-making precedents at a national level. Given that IRBs make decisions based on cases, the challenge for a coordinated review system is not to craft more detailed federal regulations, but to train IRB members with a limited set of nationally-shared cases on which boards can base their decisions, as an alternative to local precedents." (243) That sounds like an interesting proposal, but it would require a massive overhaul of the current regime, from OHRP down to local boards. As it stands, Stark has given us a close look at a broken system.
