Monday, June 30, 2008

The Psychologist Who Would Be Journalist

Back in August 2007, I mentioned the controversy surrounding the book The Man Who Would be Queen (Washington: Joseph Henry Press, 2003) by J. Michael Bailey, Professor of Psychology, Northwestern University. At the time, Professor Alice Domurat Dreger, also of Northwestern, had just posted a draft article on the controversy. Now that article, along with twenty-three commentaries and a reply from Dreger, has appeared in the June 2008 issue of the Archives of Sexual Behavior.

Dreger's article, the commentaries, and Dreger's response focus on big questions about the nature of transsexuality, the definitions of science, power relationships in research, and the ground rules of scholarly debates. Only a handful take up the smaller question of whether--as a matter of law and as a matter of ethics--Bailey should have sought IRB approval prior to writing his book. But that's the question that falls within the scope of this blog.

Bailey's book is based, in part, on his knowledge of the life stories of several transsexual women. In July 2003, four of those women filed formal complaints with Northwestern University's Office of the Vice President for Research. They were Anjelica Kieltyka, a transsexual activist who had introduced Bailey to transsexual women seeking sex reassignment surgery, and three anonymous complainants who had sought Bailey's endorsement of the surgery. Also in July 2003, two prominent scholars, Deirdre McCloskey and Lynn Conway (both transsexuals), filed their own complaint in support of the four women, alleging "misuse of human subjects" among other charges.

In their complaints, the women note two sets of interactions with Bailey. First, Bailey had interviewed the three anonymous women prior to writing letters of support for their requests for sexual reassignment surgery (SRS). According to Dreger, the interviews allowed Bailey to write letters "reporting simply what he observed in terms of a pre-op transsexual woman’s gender identity presentation, her apparent understanding of the surgery, and her likelihood of adjusting well after SRS." (372) It's not clear from any of the documents what questions he asked or what answers he got, and the book itself does not mention the interviews.

The second set of interactions came as a result of Bailey's invitation to Kieltyka and at least two other complainants to act as guest lecturers in his classes on human sexuality. As Dreger notes, these were "heavily attended," and "between 1994 and 2003, a total of several thousand Northwestern University students saw Kieltyka’s annual appearances." (373) The complainants, including McCloskey and Conway, suggest that these lectures were themselves abusive. As McCloskey and Conway put it, "The women [who] have come forward . . . fear that the hundreds of Northwestern undergraduates before whom they were paraded in the on-going freak shows in Professor Bailey's classes might recognize them."

These latter are unusual charges for an IRB complaint. Does anyone familiar with IRB regulations argue that inviting someone to lecture to one's course constitutes human subjects research?

More plausible is the charge that the pre-surgery interviews should have triggered IRB review. While Bailey likely began writing reference letters in 1996, prior to beginning his work on the book in 1997, at least two interviews took place in 1998 and 2000. Chronologically, it's possible that he interviewed these women knowing he might use their stories in his book, without telling them so.

But did he? On page 177, Bailey writes, "most of the homosexual transsexuals I have met, I met through Cher", his pseudonym for Kieltyka. The complainants take this as a reference to them, and as evidence that he used material from the pre-surgery interviews in his book. But there is another explanation of that line. Dreger notes that Kieltyka also "encouraged Bailey to accompany her to the local bars frequented by pre- and post-op transsexual women and drag queens where Kieltyka was familiar with many of the regulars." (372) And on page 181 of his book, Bailey makes it clear that these bars were the sites of his first encounters with at least some of the women on whom he based his writings--women whom he interviewed in IRB-approved, laboratory settings. Thus, the reference on page 177 could well be to Kieltyka's help in recruiting subjects for Bailey's IRB-approved studies. (Dreger, 377) There's no reason to think that Bailey used any information from the pre-surgery interviews in his writings, making it hard to label them as unauthorized research.

Dreger notes another set of interactions, not mentioned in the complaints:


The information about individuals that Bailey gathered for the book from Kieltyka, Juanita, Braverman, and others he obtained haphazardly—without any developed plan of research—from their occasional presentations to his classes, from their joint social outings, and from one-on-one discussions that occurred on an irregular basis. Bailey did conduct a few fill-in-the-blank discussions with Kieltyka, Juanita, and others (Bailey to Dreger, p.e.c., August 22, 2006)—discussions during which, as I show below, they knew he was writing about them in his book, and with which they cooperated. But these fill-in-the-blank discussions can again hardly be called systematic or productive of generalizable knowledge. When I pressed him to consult or perhaps even turn over to me the notes he took from these conversations, Bailey admitted he had no organized notes that he had bothered to keep. Obviously, he never really thought of these discussions as research—systematic work meant to be productive of generalizable knowledge—any more than he ever imagined that the women who seemed eager to tell their stories and have him write about them might later charge him with abuse. Otherwise, he surely would have protected himself and his work by being significantly more organized.


Dreger agrees with Bailey that his work was neither systematic nor generalizable, and therefore not subject to IRB review.

Of the commentators in the journal who take on the human-subjects angle, most recognize the flimsiness of the human-subjects case against Bailey. Brian A. Gladue, of the University of North Texas Health Science Center's Office for the Protection of Human Subjects, writes:

the Northwestern IRB would have determined that Bailey’s book project did not need IRB review, and Bailey was correct, both ethically and by regulations, in not seeking or obtaining IRB review. Simply stated, he did not need it—any more than journalism students need IRB review for class projects, or history faculty need IRB review to ask people questions about growing up in their hometowns, or interviewing war veterans about their experiences, etc. Frankly, IRBs generally are busy enough and do not need the extra business and burden of evaluating minimal risk human interactions that are not in and of themselves scientific research.

He goes on to warn about the continued expansion of IRB jurisdiction.

(Gladue also claims that "it is hugely ironic that social activists and social scientists/life historians would even argue that Bailey should have obtained IRB review for his book. For years, these groups of scholars and academics have chafed under the regulatory burden of IRB reviews." (448) As Dreger notes in her response, two of Bailey's three main antagonists are not social scientists/life historians of any stripe. (507) Gladue has a better case against McCloskey, who I doubt got IRB approval for her memoir--based in part on conversations with other people, and published by a university press.)

Elroi Windsor concurs with Dreger's conclusion that "as an unscientific work that lacked systematic inquiry, [Bailey's book] did not qualify as human subjects research and therefore Bailey did not violate research standards." (495) Likewise, Seth Roberts refers to the McCloskey-Conway effort as "an absurd human-subjects complaint." (485) He elaborates (quoting his correspondence with McCloskey):


Never before in the history of science had the subject of a story told to illustrate a point been thereby considered a research subject. Bailey’s book is not a scientific monograph. It is not a piece of science. It is a trade book about science. When I or anyone else gives a lecture about a scientific subject, and tell a story from everyday life to make the conclusions come alive, do we need informed consent from everyone mentioned in the story? Of course not. No one has ever been required to do this. No one has ever done this. No one has ever even conceived of such a thing.


Marta Meana's commentary faults McCloskey and Conway for trying to use Northwestern's IRB as a makeshift censorship board. She describes the ethical complaints as "completely off-topic and simply an attempt to inflict as much damage as possible." (471) Indeed, Bailey's critics accused him of everything from practicing clinical psychology without a license to "plagiarism and identity theft."

Of all the commentaries, only two argue that Bailey's work should have been subject to IRB review. Richard Green writes,

I take exception to the Dreger article characterization of research as the systematic investigation, including research development, testing, and evaluation, designed to contribute to generalizable knowledge, and only then subject to protection of human subjects. A scholarly study may differ from a scientific one welded to that definition but still impact its subjects. Stoller’s (1973) epic "Splitting: A Case of Female Masculinity" was a 395 page case study of a woman convinced that she had a penis. It was seven years of interview transcripts. It was not generalizable. There was no hypothesis testing. But his subject required (and received) protection. (452)


While I don't know the details of Stoller's work, I do know that the current definition of human subjects research was adopted in 1981, so the treatment of a 1970s project is quite irrelevant to the interpretation of current rules. Green's essay may be an unintentional plea for a return to the pre-1981 regulations.

The other case for inclusion comes from sociologist John H. Gagnon:


Bailey’s usual scientific work has been with subjects in experiments or in surveys and in these studies he has (here I am supposing, I have not asked) submitted his research plans to his IRB on the main campus at Northwestern and provided consent forms to his (and his colleagues’) subjects. His contacts with transgendered persons were (if I may infer), to his mind, more casual and less scientific than his other work. (447)


This passage shows that Gagnon did not read Dreger's article very carefully. Dreger makes clear (377) that Bailey did get IRB approval for his more systematic studies, and that those IRB-approved studies included transsexuals. Maybe Gagnon's claim of un-reviewed "field work" refers to the "fill-in-the-blank discussions" mentioned by Dreger. But that's just a guess.

For good measure, Gagnon argues that Dreger herself should have faced IRB review for the interviews and correspondence she used in writing her article. Dreger did, in fact, consult the Northwestern University Office for the Protection of Research Subjects, which assured her that her work was "not IRB-qualified." (401) Gagnon tries to explain this away by claiming Dreger "was exempted from human subjects review by the IRB at Northwestern University’s medical school despite the fact that she was interviewing people whom I would treat as 'human subjects.' I am not sure how the IRB on the main campus of Northwestern, which is far more familiar with social science research, would have dealt with Dreger’s submission." (447) Again, Gagnon is a sloppy reader. Dreger consulted not with a medical-school IRB, but with Eileen Yates, an official with the university office that oversees both medical and non-medical research.

Gagnon writes that IRBs "are often (perhaps more often than not) excessively intrusive, legalistic, and ignorant of the methods and traditions of the disciplines which they review. However, they are part of the apparatus of managing ethical dilemmas in human science in the current political and economic atmosphere that surrounds the production of knowledge by academic researchers. The decision to define either Bailey’s or Dreger’s works as nonscience may be tactically useful in this case, but in my view, neither choice is the correct one." (447)

I don't know what Gagnon means by the "current political and economic atmosphere." Does he mean that we depend on federal money, so we'd better shut up? Is there a difference between a "political atmosphere" and federal law as enacted by Congress? In a different political and economic atmosphere, would Bailey's actions be ethical?

I do know that his essay makes no distinctions between what is and is not within IRB purview, and he offers no counterexamples of scholarly works that might not require review. As best I can tell, he thinks that any published writing by a scholar who has talked with other people requires IRB review. That's a pretty extreme position.

Significantly, Gagnon makes the case for IRB review only on procedural grounds: "both [Bailey's book] and Dreger’s comment are works which fall into recognizable genres of scientific writing and both are dressed in scientific costume," he writes. "Both employ methods that bring them under the rules and regulations of the appropriate Institutional Review Boards about informing human subjects that they have become 'data.'" (447) He does not claim that IRB review would have prevented or resolved the conflict between Bailey and his critics. Would it?

Certainly, an IRB might have insisted on written consent from some of Bailey's sources, notably Kieltyka and the pseudonymous "Juanita," whose stories each take up several pages in the book. As Dreger notes in her article, both women seem to have been aware that Bailey was writing about them and gave oral consent, but later claimed that they had not known of the book. A paper trail would have been good for all concerned.

But it's clear from the complaints that Bailey's failure to secure written consent was hardly the issue that sparked real anger. The damage they allege is not to the individual participants, but to the transsexual community as a whole. They specifically complain that Bailey’s students may get the wrong idea about transsexuals. This is a harm, but IRBs are not designed to protect communities against this kind of damage. As the National Commission put it in its Institutional Review Board; Report and Recommendations of National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research:


In evaluating risks and benefits to subjects, an IRB should consider only those risks and benefits that may result from the conduct of the research. . . . The possible long-range effects of applying knowledge gained in the research (e.g., the possible effects of the research on public policy affecting a segment of the population) should not be considered as among those research risks falling within the purview of the IRB . . . .


Bailey's critics, including some women whose experiences he drew on for his work, think he wrote a stupid, offensive book that will poison readers' ideas about transsexuals. They may be right. But IRBs cannot protect people from every kind of harm without stifling legitimate research, and universities accept that some of the ideas put forth by their researchers will hurt people.

Sunday, June 29, 2008

Oral Historians Draw Conclusions, Inform Policy, and Generalize Findings

In the lead story of today's New York Times ("Occupation Plan for Iraq Faulted in Army History"), Michael R. Gordon reports on a new 700-page official history of the early occupation of Iraq, produced by the Army’s Combined Arms Center at Fort Leavenworth. As Gordon reports, "the study is based on 200 interviews conducted by military historians and includes long quotations from active or recently retired officers." He notes that "the study is an attempt by the Army to tell the story of one of the most contentious periods in its history to military experts — and to itself." It draws important conclusions with policy implications, finding, for example, that "the military means employed were sufficient to destroy the Saddam regime; they were not sufficient to replace it with the type of nation-state the United States wished to see in its place."

This sounds suspiciously like the kind of project comprising generalizable research as defined by OHRP's Michael Carome in his October 2003 discussion with the UCLA Office for Protection of Research Subjects (as reported by UCLA). In that conversation, Carome noted that


Systematic investigations involving open-ended interviews that are designed to develop or contribute to generalizable knowledge (e.g., designed to draw conclusions, inform policy, or generalize findings) WOULD constitute "research" as defined by HHS regulations at 45 CFR 46.

[Example]: An open ended interview of surviving Gulf War veterans to document their experiences and to draw conclusions about their experiences, inform policy, or generalize findings.


Except for the fact that it's the wrong Gulf War, the Army study nicely fits Carome's example of research requiring review.

Fortunately for federal historians, no one else in the federal government seems to share Carome's view on this matter. I know of no federal agency, executive or legislative, that requires IRB review for oral histories conducted by its employees. As reported on this blog, even OHRP officials did not submit to IRB review when conducting oral history research.

Maybe Dr. Carome will try to discipline the researchers at Fort Leavenworth. Him and what army?

Friday, June 13, 2008

IRB Disciplines and Punishes a Qualitative Researcher

Tara Star Johnson reports her experiences in "Qualitative Research in Question: A Narrative of Disciplinary Power With/in the IRB," Qualitative Inquiry 14 (March 2008): 212-232.

Johnson left teaching high school to pursue a PhD in Language Education at the University of Georgia. As she completed her preparatory work, she found "no qualitative studies investigating the phenomenon of sexual dynamics in the classroom." She decided, for her dissertation work, "to address this void in educational research through in-depth interviewing of teachers who have experienced desire for and/or from students to trace how these attractions happen and open the door for dialogue about embodiment, desire, and sexuality in education." Her professors were encouraging, and her advisor accompanied her to her appointment with the IRB.

After waiting an hour and a half beyond their scheduled appointment, Johnson and her advisor finally met with about twenty members of the IRB. The chair listed several restrictions, which Johnson found disappointing, but "not unreasonable or completely unexpected." Then the fun began.

One member found fault with Johnson's proposal to speak in depth with five high school teachers whom she already knew; wouldn't it be better to conduct an anonymous survey with ninety or so teachers? No, Johnson explained, it would not. "I'm looking for dialogue here, in-depth experiences of a few participants, not a bunch of Likert-scale responses."

Another member fretted, "Let's say, 10 years down the road, someone's having a party. One of your colleagues is there and happens to strike up a conversation with one of your research subjects. Your name comes up, and your subject says, 'Oh, I know her! I was in her dissertation study.' Your colleague would immediately be able to identify her." Well, yes. Participants in research are always free to identify themselves. Unless the IRB requires that they be shot.

Then the board rejected Johnson's plan to obtain an NIH certificate of confidentiality to protect the identities of any teachers who disclosed misdeeds, though such certificates are often trumpeted in IRB circles as the kind of thing a good board will suggest.

And, predictably, Johnson was asked "to come up with a list of counseling referrals in case my participants were traumatized by my research." (If anyone needed trauma counseling, it was the researcher whose work was reviewed before Johnson. She fled the IRB meeting in tears.)

The meeting ended with the IRB chair's informing Johnson that she would send a list of required changes, and then the project would be considered at the next meeting, six weeks later. When the list arrived, Johnson was particularly distressed that she would not be allowed to record or transcribe the interviews that she hoped would be the basis of her dissertation. So Johnson dug in her heels, and kept recording and transcription in her revised proposal.

So here's the punch line: on the next round, the board voted to send the application off for expedited review, and approval itself came the next business day.

To some degree, the change in the board's position from the first round to the second reflects Johnson's abandonment of some of the most interesting parts of her study design. In particular, she had to reword her recruitment flyer to screen out any teachers who had actually had sex with their students. Apparently, there are some subjects that University of Georgia scholars may not study under any circumstances.

But many of the components that the board originally objected to remained in the final proposal. She would still interview a small number of teachers. She would still ask them about sexual feelings. And yes, they might still be allowed to attend parties ten years down the road. Does this make Johnson's research dangerous or not? If no, the board had no business, in its first meeting, bullying her about her plans. If yes, then its granting of expedited review (valid only for research involving "no more than minimal risk") was a violation of 45 CFR 46.110. In short, the IRB's behavior cannot be explained by an effort to protect participants in research while adhering to federal regulations.

What does explain such behavior?

Johnson suggests that "the real issue was not protecting participants so much as protecting the university from potential lawsuits and bad publicity." This is quite plausible; to see what can happen when reporters find out that a scholar is studying teacher-student sex, read Pat Sikes's essay, "At the Eye of the Storm: An Academic('s) Experience of Moral Panic," in the same issue of Qualitative Inquiry. If a university wants to keep researchers away from controversial topics, an IRB is a good tool.

But Johnson offers another explanation as well: quantitative researchers' contempt for qualitative work, like her interviewing. In this perspective, the IRB was less worried about protecting the university than it was interested in "disciplining my department in a Foucauldian sense for allowing its students to do research that was out of line," where "out of line" meant qualitative.

Both explanations recall Stefan Timmermans, "Cui Bono? Institutional Review Board Ethics and Ethnographic Research," Studies in Symbolic Interaction 19 (1995): 153-173. Like Johnson, Timmermans had his project approved, but only after being berated by an IRB in what he termed "a Goffmanian public degradation ceremony." Part of the problem was that the board members seemed to fear that his work would reflect badly on the hospital he was studying. And part was that they despised his ethnographic approach. A board member shouted at him, "The numbers should come through in the paper. This is not systematic. What about statistics! . . . If you write something, we should know HOW MANY PEOPLE said WHAT, there should be NUMBERS in here. There is NO DATA in this paper."

(At that point, Timmermans might have pointed out that if his work wasn't systematic, it wasn't subject to IRB review under the Common Rule. But I can see why he refrained.)

Johnson begs her readers to "read to the end before making any judgments about the people I portray." In her conclusion, she notes that individual members of IRBs face their own constraints. Outnumbered by scholars in other disciplines, the qualitative researchers most sympathetic to Johnson's work may have been unable to defend her too vocally, choosing instead to maneuver her subtly toward approval. That is an imaginative and generous supposition, but it still leaves us with an IRB that abuses its authority.

Tuesday, June 3, 2008

Music Educator Finds IRBs Inconsistent, Restrictive, and Burdensome

Rhoda Bernard kindly alerted me to Linda C. Thornton, "The Role of IRBs in Music Education Research," in Linda K. Thompson and Mark Robin Campbell, eds., Diverse Methodologies in the Study of Music Teaching and Learning (Charlotte, North Carolina: Information Age, 2008), 201-214.

Thornton (along with co-author Martin Bergee) wanted to survey music education majors at the 26 top university programs to ask why they had chosen music education as a profession. She writes, "no personal information regarding race, habits, or preferences was being collected—only descriptive data such as each student's major instrument (saxophone, voice, etc.), age, and anticipated year of graduation." She dutifully submitted her proposal to her local IRB, and then the trouble began.

Thornton's own IRB forbade the researchers from surveying students at their own institutions, then imposed requirements suitable for a survey on sexuality or criminal activity. Most significantly, it required Thornton to seek permission from the IRBs at the 24 universities remaining in her pool.

Nine of the 24 accepted the proposal as approved by Thornton's IRB, including one which noted it had a reciprocity agreement in place. Of the remaining 15, several imposed burdensome requirements, ranging from small changes in the informed consent letter (which then needed to be re-approved by the original IRB) to the requirement that the instructor at the local institution, who was just going to distribute and collect questionnaires, be certified in human subjects research. Application forms ranged from two pages to eight; at least one IRB demanded to know the exact number of music education majors in every school to be surveyed. The result was that the researchers dropped many of the schools they hoped to study, cutting their sample from several thousand to 250.

This sad story touches on two points: inconsistency, and regulatory exemptions.

Since their creation in the 1960s, IRBs have been making decisions based on guesswork, with little attempt at developing a consistent system of best practices for research and for the review of research. As Jay Katz testified in 1973,


The review committees work in isolation from one another, and no mechanisms have been established for disseminating whatever knowledge is gained from their individual experiences. Thus, each committee is condemned to repeat the process of finding its own answers. This is not only an overwhelming, unnecessary and unproductive assignment, but also one which most review committees are neither prepared nor willing to assume.

[U.S. Senate, Quality of Health Care—Human Experimentation, 1973: Hearings before the Subcommittee on Health of the Committee on Labor and Public Welfare, Part 3 (93d Cong., 1st sess., 1973), 1050].


Katz's testimony helped inspire Congress to pass the National Research Act of 1974, requiring broader use of IRBs and establishing the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research to make further recommendations--recommendations that remain the basis of today's system in the United States and elsewhere. But neither the law nor the commission addressed the problem of disseminating knowledge. In 1982, Jerry Goldman and Martin D. Katz tested the system by submitting identical, flawed proposals for medical research to 32 IRBs. They found "substantial inconsistency in the application of ethical, methodological, and informed-consent standards for individual review boards." (Jerry Goldman and Martin D. Katz, "Inconsistency and Institutional Review Boards," Journal of the American Medical Association 248 (1982): 197-202.) Thornton did not set out to replicate the Goldman-Katz test (she wanted to learn about music students, not IRBs), but she did so by accident, and got similar results.

Perhaps no one will be surprised that Thornton's sample showed so much inconsistency. Indeed, even Laura Stark, something of a defender of the present system, encourages us to think of inconsistency as a feature, not a bug. "The local character of board review does not mean that IRB decisions are wrong so much as that they are idiosyncratic," she writes, suggesting that "the application of rules is always an act of interpretation and that sometimes this discretion can have positive, as well as negative, effects." (Laura Stark, "Victims in Our Own Minds? IRBs in Myth and Practice," Law & Society Review 41 (December 2007), 782.) Perhaps, but I suspect the negative effects are far more common. I doubt Stark would defend the treatment Thornton received.

I was more surprised by the one way the 24 IRBs were consistent in their response. Federal regulations offer exemptions for


(2) Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures or observation of public behavior, unless:
(i) information obtained is recorded in such a manner that human subjects can be identified, directly or through identifiers linked to the subjects; and (ii) any disclosure of the human subjects' responses outside the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing, employability, or reputation. (45 CFR 46.101)


Thornton's research clearly fits this exemption.

Universities may apply their own rules on top of federal regulations, and we know from a 1998 study that less than 40 percent of survey research eligible for exemption actually receives it. [James Bell, John Whiton and Sharon Connelly, Evaluation of NIH Implementation of Section 491 of the Public Health Service Act, Mandating a Program of Protection for Research Subjects (Arlington, Virginia: James Bell Associates, 1998), 29.] Still, I would have expected some IRBs to tell Thornton that her research required no review. No such luck. Thornton's own institution requires review of all research involving college students, and all or almost all of the other universities seem to have applied similar, non-federal rules.

Last year, Jerry Menikoff argued that social scientists had exaggerated the dangers of IRBs. He claimed that "most institutions assume that any study which falls within one of the exemption categories would automatically be in compliance with the Belmont Report criteria," and therefore "such studies, in a properly functioning IRB system, should receive relatively rapid and nonburdensome review." ("Where’s the Law? Uncovering The Truth About IRBs and Censorship," Northwestern University Law Review 101 (2007), 794-795.) Maybe they should receive such review, but Thornton's experience suggests they do not. If research universities that excel in music education are any indicator, overly restrictive IRBs are the rule and not (as Menikoff suggests) the exception.

The exemptions in 45 CFR 46 resemble the bill of rights in the old Soviet constitution. They may look good on paper, but don't count on them to protect the freedom of inquiry.