Ethics review committees need to improve their oversight capacity for big data research, the authors of a recent paper argued.1 The authors assessed the weaknesses of ethics review committees, some of which are not specific to big data research but could be exacerbated by it, and some of which are specific to big data research.

“First, it is important to understand what big data are. There are many different and disparate types of data,” says Elizabeth A. Buchanan, PhD, director of the Office of Research Support Services at the Marshfield (WI) Clinic Research Institute and coordinator of medical ethics at Marshfield Clinic Health System.

Data from social media, sensors, wearables, consumer records, and medical records are all examples. Researchers analyze such aggregated data with computational methods or tools. “We’re often looking for patterns or trends to understand retrospective and prospective behaviors,” Buchanan explains.

IRBs must consider consent, use of secondary data, privacy implications, communal harms that might involve people other than the participant, and “downstream” harms that might occur long after the study is completed.

“While these are not unique to big data research, these issues are complicated and/or amplified by the scope and scale of data, and by the distance between the researcher and individuals,” Buchanan explains.

It is easy to forget the data are connected to a real person. “A community can be targeted or stereotyped based on data that are collected and combined, and then reused in a context different from their origin,” Buchanan notes.

There also is the consideration of how big data can be biased. “There is a lot of discussion around the concept of algorithmic bias and harm,” Buchanan says.2,3

IRBs always consider risk/benefit analyses for study participants. However, big data research calls for IRBs to think about risk and benefit differently. “We need to think about this in a broader scope and scale. Big data research can have very real consequences beyond the individual level,” Buchanan cautions.

Some IRBs lack the expertise to address these complex concerns. “While it has been common practice for many years to augment a board with an expert reviewer, there is just so much to consider, from legal to ethical to technical perspectives, in regard to big data research,” Buchanan says.

At Marshfield Clinic Research Institute, investigators follow a multistep process, which may include feasibility review, security review, and legal review, even before a project makes it to the IRB. The health system’s chief information security officer serves on the IRB. “That really helps us think through these complex technical issues,” Buchanan says.

The possibility of participants being identified remains a central ethical concern. “We’ve seen over and over the problems when data are mined and matched in ways never intended,” Buchanan laments. Sometimes, researchers use market data or health data where at some point people agreed to terms and conditions stating that data would be sold to third parties. Even if researchers are receiving de-identified data, “reidentification is possible,” Buchanan warns. “We shouldn’t think of de-identification as our panacea in big data research.”

Protecting human subjects from someone using data in unanticipated ways “is one of the biggest concerns and biggest challenges with the rise of big data research,” says Michael Zimmer, PhD, an associate professor at Marquette University’s department of computer science. “It is easy to lose sight that much of our ‘big data’ are actually data about people, and it is often collected without them really knowing it is happening.”

Researchers and IRBs alike often treat comments on Reddit or images on Instagram as inherently public. “Because of this perceived ‘publicness,’ it is easy for IRBs to provide exemptions to protocols,” Zimmer says. Human subjects probably do not fully understand how content they choose to share on a social media platform can be used by researchers. “These are questions that IRBs and researchers need to engage with, rather than just providing a simple exemption,” Zimmer offers.

Users might not fully understand or be comfortable with how their social media data could be used for research.4,5 Informed consent is “largely impossible when dealing with big data,” Zimmer says. “This does not need to completely stop research from happening. But we need to ensure processes are in place to ensure researchers have thought about issues of possible harms.”

Typically, researchers assert the data are publicly available, and IRBs provide an exemption. One obstacle to closer examination of the way “public” data are used, says Zimmer: “IRBs might lack suitable proficiency to understand the technical nuances of how certain platforms operate that researchers might leverage to create their big data sets.” For example, not all IRBs are familiar with how TikTok differs from Instagram. Meanwhile, researchers are finding novel ways to collect data from devices, profiles, and platforms. For IRBs, says Zimmer, “it’s easier to fall back on the publicness of data, rather than digging deeper into the affordances of a particular platform and what user expectations might be.”

IRBs could strengthen their ability to properly assess these protocols by including scholars with expertise in online research methods on their boards.

Much big data research focuses on vulnerable populations. If researchers obtain data from a Reddit forum dedicated to discussing depression, “reporting back to that community so they know how their data was used is essential,” Zimmer says.

Data often are used outside their original context, which is ethically problematic. “If big data are collected and used outside of that context, that must prompt a critical reflection on the ethics of the research — and not just a simple ‘but the data were public’ stance,” Zimmer adds.

REFERENCES

  1. Ferretti A, Ienca M, Sheehan M, et al. Ethics review of big data research: What should stay and what should be reformed? BMC Med Ethics 2021;22:51.
  2. Kaplan RM, Chambers DA, Glasgow RE. Big data and large sample size: A cautionary note on the potential for bias. Clin Transl Sci 2014;7:342-346.
  3. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019;366:447-453.
  4. Gilbert S, Vitak J, Shilton K. Measuring Americans’ comfort with research uses of their social media data. Social Media + Society 2021;7.
  5. Fiesler C, Proferes N. “Participant” perceptions of Twitter research ethics. Social Media + Society 2018;4.