In a disarmingly frank lecture in an ethics training course at the National Institutes of Health (NIH), a leader of the landmark All of Us project shared some of the concerns that come with the immense responsibility of collecting data on 1 million people.

“People talk about the All of Us research program like we already have all of this figured out — we don’t,” said Katherine Blizinsky, PhD, the policy director for the NIH program. “We launched the program and we are beginning to collect data, but we haven’t figured out all of the answers.”

There are plenty of questions, and Blizinsky peppered her talk with those of the rhetorical variety. One issue is how to secure the eroded trust of transgressed communities while acknowledging the risk that data could be reidentified or otherwise compromised to the harm of participants.

“We can tell [researchers] what we expect them to do and not to do, but that is not going to stop people who really want to do bad things,” she said. “So, a good portion of my day I sit around dreaming about all the horrible things people are going to do with our data. Because if I can think of it, somebody else out there is thinking [of it].”

The NIH is seeking research participants across the United States and its territories who represent a broad diversity of racial and socioeconomic backgrounds. The All of Us project will use whole genome sequencing and other cutting-edge tools to create, aggregate, and analyze individual health data for years into the future. Some of the data protections outlined when the program was first announced included the novel measure of hiring hackers to try to breach the system on an ongoing basis. (See IRB Advisor, June 2018.)

Data Use Agreement

In addition, Blizinsky outlined a series of measures and protective layers that will limit access and possibly criminalize breaches of research agreements. To eventually access the massive data trove, researchers will have to sign off on a data use agreement that clarifies what they can and cannot do.

Researchers must agree that they will:

• know and follow all applicable state and federal laws regarding human data access and privacy;

• contact the All of Us Resource Access Board (RAB) within 24 hours if they become aware of any uses or disclosures of All of Us data that endanger the security or privacy of research participants, including any unintended reidentification of participants.

Researchers must agree that they will not:

• use All of Us program data for research that is discriminatory or stigmatizing of individuals, families, or communities;

• attempt to reidentify research participants or their relatives;

• use or disclose the information other than as permitted by the data use agreement;

• make copies of or download individual-level data resources outside of the All of Us research environment without approval from the RAB.

Locked in a Cloud

The accumulated All of Us data will be stored in a cloud enclave environment, which will use increasing layers of protection stratified by the sensitivity of the information.

“There is a public tier anyone can access without signing a data use agreement,” Blizinsky said. “That is all aggregate results, and you can only make certain types of queries. It is also not particularly sensitive data. If you want to use sensitive, individual-level data, you have to go through the application pipeline.”

This includes researchers proving their identity, passing ethics and security training, and signing the data use agreement.
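As Blizinsky describes it, the public tier returns only aggregate results and restricts the kinds of queries users can run. One common way data enclaves enforce that restriction is cell suppression: aggregate counts below a minimum size are withheld, so rare (and therefore potentially identifying) groups never surface. The sketch below illustrates that general technique; the records, field names, and threshold are hypothetical and do not describe the actual All of Us implementation.

```python
# Illustrative cell-suppression guard for an aggregate-only query tier.
# Counts smaller than MIN_CELL_SIZE are suppressed because a very small
# group is easier to link back to specific individuals.
MIN_CELL_SIZE = 20  # hypothetical threshold, not the program's actual value

def aggregate_count(records, **filters):
    """Count records matching the filters; return None (suppressed)
    if the count is small enough to risk identifying individuals."""
    matches = sum(
        1 for r in records
        if all(r.get(k) == v for k, v in filters.items())
    )
    return matches if matches >= MIN_CELL_SIZE else None

if __name__ == "__main__":
    cohort = ([{"state": "OH", "condition": "asthma"}] * 25
              + [{"state": "OH", "condition": "rare_disease"}] * 3)
    print(aggregate_count(cohort, condition="asthma"))        # 25
    print(aggregate_count(cohort, condition="rare_disease"))  # None (suppressed)
```

The design choice is that a suppressed query returns nothing rather than a small number, since even an exact count of two or three can narrow a search for a particular person.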

“When researchers want to sit down and work with the data, they need to write a research proposal, which will be publicly posted on our website,” Blizinsky said.

This transparency allows others to flag a research proposal that they see as potentially stigmatizing or unethical. These and other matters will be reviewed by the All of Us RAB.

The NIH is striving to balance allowing access to the data against the risk of violating the privacy of research participants. A primary threat to the latter is reidentification of the data, which some researchers warn is becoming easier to accomplish.

“As we are increasingly sensitive of how easy it is to reidentify, we are having to reconceptualize our strata of data sensitivity,” she said. “There is a sense that there is a set of infractions that we can neither 100% prevent, nor can we really dissuade people from doing. How can we loose this amazing data set to as many people as possible yet simultaneously trust them to do the right thing? What can we do to curb those bad actors?”
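The ease of reidentification Blizinsky alludes to comes largely from quasi-identifiers: attributes such as ZIP code, birth year, and sex that are individually innocuous but, in combination, can single out a person in a data set. The toy sketch below (with made-up records) shows how to measure that exposure by computing the fraction of records whose quasi-identifier combination is unique.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Return the fraction of records whose combination of
    quasi-identifier values is unique in the data set. Each such
    record is a reidentification candidate: anyone who knows those
    attributes about a person can pick out that person's row."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

if __name__ == "__main__":
    # Hypothetical records for illustration only.
    people = [
        {"zip": "20850", "birth_year": 1972, "sex": "F"},
        {"zip": "20850", "birth_year": 1972, "sex": "F"},
        {"zip": "20852", "birth_year": 1985, "sex": "M"},
        {"zip": "20853", "birth_year": 1990, "sex": "F"},
    ]
    # Two of the four records are unique on (zip, birth_year, sex).
    print(unique_fraction(people, ["zip", "birth_year", "sex"]))  # 0.5
```

Checks like this motivate the stratified sensitivity tiers: the more attributes a data release carries, the larger the unique fraction tends to become, and the harder it is to promise participants they cannot be picked out.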

One measure under discussion is criminalizing violations of the data use agreement, making them not only unethical but illegal.

“We can treat these data use agreements as a binding contract, and a violation can be pursued as a breach of contract,” she said.

Although the scale of the project is unprecedented, it begins with the common challenge of building a research cohort.

“We are trying to build a deep data set on a very diverse group of participants that includes longitudinal data of many modalities,” she said. “We are hoping to do that on a scale that has not to date been approached.”

The NIH is collecting data through electronic health records (EHRs) and taking physical measurements and biospecimens from participants.

The All of Us consent form used for this is “modular,” meaning it is multilayered. “We have a primary consent that goes over the bulk of the research program,” she said. “We have a separate HIPAA authorization that talks about the donation of the EHR data. We have a third consent, which talks about the return of medically actionable genomic results.”

Currently, the quest for diverse participation includes looking at ways to reach groups that speak neither English nor Spanish. Diversity is needed because the history of research “is embarrassingly white, straight, and male,” she said.

“This does us a disservice,” Blizinsky said. “It is not good science when we continue to rely on the extrapolations based on that population. We are talking about groups of people that have been massively maltreated by science in the past. The battle to rebuild trust is a long and difficult one.”

Betrayed by Research

Even as it amasses a wealth of data to protect, the NIH program must confront the disgraceful history of research abuses against minority populations, including African-Americans and Native Americans.

“We are doing this through the lens of ethics,” she said. “We need to think about the implications of our actions: The way we are doing what we are doing, and the people we are trying to enfranchise with this program, and do it in a very conscientious and respectful way. We only get this one chance. There is no right way to do something the wrong way.”

For example, the NIH formed a panel to address concerns of Alaska Native and American Indian participants, including tribal sovereignty and the widespread rejection of DNA tests to establish tribal membership. (See IRB Advisor, November 2018.) Until these and other issues are resolved, the All of Us project has a moratorium on enrolling indigenous people.

“We are hoping to bring them into the program in a way that allows them to shape the way we interact with them,” she said. “To fundamentally change the conversation from ‘We are doing something to them’ to say, ‘They are allowing us, and helping us foster the program in its nascent stages in the tribal communities.’”

These communities distrust the use of genetic data in part because past studies used it in ways not covered by the informed consent, such as reporting levels of alcoholism or other stigmatizing tribal data.

“We are doing genomics as part of the All of Us research program, but this is not [just] a genomic study,” Blizinsky said. “We are collecting other valuable data. We are hoping that this is a resource that will be valuable to more than just people studying genomics.”

The challenge in reaching out to communities exploited by past research is that they have the strongest disincentives to participate, yet stand to benefit greatly from the research. Even if cures or treatments are discovered, they will do little good for underserved communities if they are too expensive for the people who need them.

“How do we as a program ensure that the benefits that are derived from the data we are collecting benefit the people that have given us the data?” she said. “What happens if we collect information that is then used in a way that reinforces institutionalized stigma that affects a certain population?”

A part of the data use agreement that is still being drafted is how to prevent stigmatizing research from the outset and in later communication of results.

“A lot of the stigma occurs in how the information and findings are conveyed to the community — how that is interpreted and added back into common knowledge,” she said.

For example, research on gender and sexuality can become politicized, taken up by one side or the other as evidence for their views.

“We need to do more research in understanding how gender and sexuality affect health,” she said. “There is a need to do that research, but at the same time we don’t want the community to suffer from the political nature of that label.”

These issues will be addressed through the project’s Resource Access Board, which is part of the All of Us Committee on Access, Privacy and Security.

There is an ongoing effort to develop tools to help researchers “not necessarily ensure, but at least encourage them to convey their information in ways that will not raise that kind of response,” Blizinsky said.

Part of the answer is to have members of a potentially stigmatized community come before the access board and give their views on research that affects their population, she said.