By Gary Evans

The unprecedented level of digital data available across an expanding electronic landscape poses complex challenges for IRBs as they attempt to provide ethical insight and ensure participant privacy.

The Office for Human Research Protections (OHRP) recently held a workshop on this issue, entitled “Privacy and Health Research in a Data-Driven World.”

“We are currently living in a world in which previously unimaginable amounts of data are being generated and used for research purposes,” said Jerry Menikoff, MD, JD, director of OHRP. “There is a tremendous potential to use these vast collections of data to learn more about health behavior of the population to the benefit of all of us as individuals.”

Some of these data are collected in clinical care, but the public also is generating data through health monitoring devices, GPS location systems, social media, and information collected and shared on mobile apps.

“Health-related big data research promises new insights into treatments for a variety of conditions and diseases, in addition to innovative ways to support and maintain good health,” he said. “However, currently, there is no consensus about appropriate ways to collect, store, and share these types of data. What are the appropriate trade-offs?”

To dig into this question, Michael Zimmer, PhD, associate professor in the department of computer science at Marquette University, presented research on what he called “pervasive data.” For the purposes of his ongoing research, Zimmer defined pervasive data as “rich, personal information generated through digital interaction and available for computational analysis.”

In general, Zimmer said, pervasive data research:

  • gathers digital data about people;
  • uses computational methods to assess an individual’s or group’s health, habits, routines, or beliefs;
  • may collect data, frequently without the studied populations’ knowledge.

IRB Challenge

Challenges for IRBs in dealing with pervasive data in research include concerns about privacy, informed consent, and harm to human subjects.

“A lot of what we are seeing happening today in this space of big-data ethics and research is creating confusion,” Zimmer said. “[There are] some gaps and shifts in understanding of fundamental core principles around research ethics.”

Other issues for IRBs include the increasing ease of performing big data research, and the fact that some researchers in this emerging area may lack traditional ethics training, he said.

“We are seeing a lot of scientists and researchers in research communities who don’t have a tradition of dealing with human subjects as defined by the regulations,” he said. “I’m working now with computer scientists and data scientists who don’t have a long history of dealing with IRBs, or understanding there is actually a human being attached to this piece of data they are analyzing.”

While that insight suggests caution, pervasive data often are easy to access and disseminate. “If I ask one of my undergrads to give me 5 million tweets about something, they can have them for me before I get home tonight,” Zimmer said. “It’s a very different kind of research environment than 20 years ago.”

To explore these questions, Zimmer and colleagues have created a research website called “Pervade: Pervasive data ethics for computational research.” “We are trying to understand what IRBs think about pervasive data, and how they actually review these protocols,” he said. “Are they adequately prepared to manage these kinds of research projects?”

A 2017 study of 59 IRBs in the U.S. found that 93% of respondents reported that “online data” raise research ethics issues. However, only 55% said they believed their IRBs were well versed in the technical aspects of online data collection, Zimmer said. Only 57% believed their IRB had the expertise to stay abreast of changes in online technology, he added.1

IRB Survey

Earlier this year, Zimmer and colleagues recruited IRB members via email and social media to assess knowledge and attitudes on pervasive data research. They received 79 valid responses, 80% of which were from a college or university. About half of IRBs responding said they understand the ethical dimensions of pervasive data research.

“We are still in the process right now of going through these results, but we’re already finding interesting things,” he said.

For example, approximately one-third of IRBs reported they were reviewing a study involving pervasive data about once a week. “They had 50 or more protocols they were seeing come through [annually],” he said.

Only 25% of respondents said that their IRB had sufficient technical understanding of what it means to work with pervasive data and review these kinds of protocols, he said.

“When you start introducing pervasive data and these new protocols and data sources, that technical understanding drops,” Zimmer said. “There is a gap there for us to fill to make sure IRBs understand what these protocols mean.”

Most IRBs responding were not conducting any specific training on pervasive data, with only 30% indicating there is education on this for IRB members.

Zimmer and colleagues presented the IRBs with hypothetical research proposals using pervasive data with different variables. The proposals varied by level of consent and by whether the data were considered public (like a Twitter feed) or not public (like a health forum that requires a login), he explained. Other variables included anonymity of participants and whether the researcher was adhering to the terms of service on the platform they used.

IRB members were asked to classify the hypothetical proposals into four categories: non-human subjects research, exempt, expedited, and full review. “In most cases, focus on the publicness of the data was a key indicator,” he said. “If the data were [already] published, they were generally putting that in as non-human subjects research or exempt status.”

Research Scenarios

For example, a researcher is studying a Twitter archive that someone else collected and shared, in order to understand tweets about a political event, he said.

“They are not getting consent because the data was collected by someone else and the data being used is public,” he said. “It is anonymous because they took measures to de-identify. Perhaps not surprisingly, most responses said this is either not human subjects research, or was exempt.”

In another scenario, a researcher wants to look at mental health records of university students and their social media presence. The researcher plans to collect informed consent and identifiable information from participants.

“The data are quasi-public. Maybe social media stuff is public, and we won’t anonymize because the point is to understand students and perhaps have an intervention,” he said. “All of the respondents said this will be expedited or [go] to full review.”

Other hypothetical pervasive data protocols fell somewhere between these two examples, with some IRBs recommending more oversight and review and others willing to exempt them. “This often came down to this question of sensitivity in terms of what the researcher was doing,” he said.

For example, a research scenario examined public Twitter feeds to predict risky drug use behavior. “We are not going to get consent, but it is public,” he said. “We are going to de-identify, and we are following the terms of service, but we now have a bigger spread on how people responded to this.”

Overall, 38% of IRB respondents classified this as non-human subjects research, and 33% said it should be exempt. However, 21% said the protocol should be expedited, and 7% said it should undergo full review. There are not necessarily right or wrong answers; Zimmer used the hypothetical protocols to show where IRBs broadly agree and where their responses diverge.

In another example, a researcher trying to predict election outcomes proposes to aggregate and analyze comments on a newspaper website, a practice that is forbidden by the publication’s terms of service. The IRB results were polarized, with 28% saying it was not human subjects research and 33% saying it should be subjected to full IRB review. The latter result was driven by the proposed violation of the newspaper’s terms of service, Zimmer said.

“There is a lot of variance showing up on some of these complex, interesting, but not outrageously unlikely, research scenarios that use pervasive data,” he said.

Zimmer and colleagues will continue researching pervasive data and meet with stakeholders to develop recommendations. “We have to start thinking about coming up with some of the key principles,” he said. “We are hoping to bring in data from multiple viewpoints — not just IRBs, but actual users whose data are being collected. We are talking to researchers and the companies that run these platforms. Hopefully, we will come up with a toolkit with a set of guidelines to help make this less confusing.”


  1. Vitak J, Proferes N, Shilton K, et al. Ethics regulation in social computing research: Examining the role of institutional review boards. J Empir Res Hum Res Ethics 2017;12:372-382.