By Melinda Young

Researchers and sponsors are adapting quickly to virtual technologies and using big data in studies, forcing IRBs and research protection programs to adapt — particularly when it comes to privacy.

Studies that use big data need to be examined through the lens of ethical review, says Stephen Rosenfeld, MD, MBA, president of Freeport Research Systems in Maine. Rosenfeld is the chair of the Secretary’s Advisory Committee on Human Research Protections (SACHRP).

“In the end, that’s where we find the answers,” Rosenfeld says. “I do believe IRBs have lost their way and have been very much about parsing regulatory language and documenting compliance as being behind the ethical principles.”

The focus should be on the broader context and what it means for the future. “We should re-examine the meaning of identifiable information,” Rosenfeld says. “I don’t think there’s anything in the pipeline to make that happen.”

With databases capable of capturing details about millions — or billions — of people, IRBs and researchers must re-examine privacy issues.

“With a little effort, you could probably identify a good number of people in a database,” says Michele Russell-Einhorn, JD, chief compliance officer and institutional official for Advarra of Columbia, MD. “We may get to a point where there is no longer such a thing as anonymized or de-identified data. From the perspective of the IRB, how do you make sure the people who are participating in research understand how their data will be used by the researcher?”

IRBs might consider adding a disclosure to informed consent documents to explain the potential for de-identified data to be re-identified. This will allow participants to make an informed decision, Russell-Einhorn says.

“IRBs have to make sure that what people are told is accurate,” she stresses. “Technology is moving ahead at such a fast pace that we need to make sure there is adequate education of everybody about how technology is impacting the identifiability or re-identifiability of data and how data can be used.”

Investigators proposing projects that use big data or human biospecimens will need at least a minimal review of their studies. “Most IRBs would not accept a consent for creation of a repository that says you can do anything you want,” Rosenfeld adds.

New Uses Raise Ethical Challenges

The rapid growth of information and possible research uses makes this ethically challenging. “Big data is continuing to increase at an exponential rate,” says James Riddle, MCSE, CIP, CPIA, CRQM, vice president of institutional services with Advarra. “The amount of data we can consume or get is ever increasing. The FDA [Food and Drug Administration], in particular, has indicated they want more real-world data in the decision-making process for drugs. It’s inevitable you will get more data in research and drug development.”

For instance, an increasing number of studies collect data from wearable technologies like Fitbits, which are used to count steps, Riddle says.

Another example is the Oura ring, which uses sensors to measure vital signs and continuously collect and transmit the information, says Megan Doerr, MS, LGC, principal scientist, governance with Sage Bionetworks in Seattle. (See story on mobile technology in this issue.) There is ongoing research into whether the ring’s sensor technology can detect changes in a person’s body before symptoms of COVID-19 infection occur.

“Companies are even developing fabric that has sensor technology woven into them,” Doerr says. “The possibilities are fast and furious.”

One challenge of big data in research is how the data are stored, and how much that changes the usual thinking about protecting privacy under the Health Insurance Portability and Accountability Act (HIPAA).

“Once upon a time, people would go to the World Wide Web, pull data into their personal workspace, and do whatever they wanted,” says Stephanie Malia Fullerton, PhD, professor of bioethics and humanities at the University of Washington School of Medicine. “Big data is getting so big that this is happening less and less now. More often, people are doing things on the cloud. This is especially true in genetics and genomics.”

Researchers no longer download data to individual laptops and computers because the data sets are too big and unmanageable. “If you combine some very clear rules about what is and is not permissible about doing things that lead to identification of people, then you could get rid of a lot of the concerns,” Fullerton explains.

IRBs might ask a lot of questions about big data studies, including “Who is making up the rules?” Fullerton says. (See story on big data and HIPAA/privacy in this issue.)

“What’s interesting is we can remember the pre-cloud years, and there are things about cloud computing that feel risky,” she explains. “Who has the data? How risky is it? It’s still a work in progress.”

Beyond data collected through wearables and mobile technology, there are many studies with large data sets like Medicare claims data and consumer products data. Large social media companies collect data that can be combined and mined to find insights on people’s consumer and economic behavior, Riddle says.

“You can combine information already used in the consumer products world and use those data to look at the real-world impact of people who take particular drugs or classes of drugs, and things of that nature,” Riddle explains. “The world of massive data sets can be combined now to draw insights into particular drugs in the real world.”

For example, a researcher interested in economic and environmental effects on diabetes survivorship might purchase data from consumer product organizations. Although de-identified, these data can be highly specific and can be combined with health data.

“If I am a diabetes researcher, I might be interested in combining data sets from the consumer products world and see if people based on consumer habits might have diabetes,” Riddle says. “I could look at whether I could triangulate their data with a Medicare dataset and look for commonalities of what I see in the consumer products space, the medical space, and draw comparisons there.” The FDA has intimated it wants to see more real-world evidence like that, he adds.

Sooner than anyone would like to acknowledge, there might be a point where there no longer is such a thing as anonymized or de-identified data. For example, with a condition like progeria, which causes premature aging, there is no information about people with the condition that could be considered de-identified because there are only about 30 people in the world with that condition, Russell-Einhorn explains.

As databases grow and become easily cross-referenced with other databases, the same might be true for anyone with any specific health conditions or lifestyle habits.
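To make the cross-referencing concern concrete, here is a minimal sketch of how adding fields to a "de-identified" data set shrinks the group of people who share the same values. The records, the field names (`zip3`, `birth_year`, `condition`), and the helper functions are all hypothetical and invented for illustration; they do not come from any study or source quoted in this article.

```python
from collections import Counter

# Hypothetical, made-up records. Names and other direct identifiers
# have already been removed, so the data set looks de-identified.
records = [
    {"zip3": "207", "birth_year": 1958, "condition": "type 2 diabetes"},
    {"zip3": "207", "birth_year": 1958, "condition": "type 2 diabetes"},
    {"zip3": "207", "birth_year": 1958, "condition": "hypertension"},
    {"zip3": "981", "birth_year": 1990, "condition": "asthma"},
    {"zip3": "981", "birth_year": 2011, "condition": "rare condition"},
]


def group_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys):
    """Return the rows whose combination of `keys` appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


# Each added field makes more records one of a kind in this toy data.
for keys in (["zip3"],
             ["zip3", "birth_year"],
             ["zip3", "birth_year", "condition"]):
    print(keys, "->", len(unique_rows(records, keys)), "unique record(s)")
```

In this toy data, no record stands alone on region only, but two stand alone once birth year is added and three once the condition is added. A record that is one of a kind in one database is a candidate for matching against any other database that carries the same fields alongside a name.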

“That’s the question and conversation we need to have now: Are we there, and what will it take to get there?” Russell-Einhorn asks. “If we’re there, we need to have a conversation with IRBs, investigators, and everybody about what it means to have these big databases so individuals who participate in research have a clear understanding what their data set will be.”

Big data sets also are an issue after people die. “It’s enormously eye-opening to see how much information we give over, how little we’ve thought about it, and how long they persist — long after we’re gone,” Fullerton says. “Given that and the increasingly immortal nature of data, we cannot be blasé about privacy. We cannot.”