A few simple rules ensure data sharing compliance

NIH official explains the ins and outs

The National Institutes of Health (NIH) wants all research supported with NIH funds to be shared with other investigators and made available to the public, but how can this philosophy be reconciled with privacy laws and concerns?

Sharing data is a policy of common good, notes Belinda Seto, PhD, deputy director of NIH’s National Institute of Biomedical Imaging and Bioengineering.

However, confidentiality of data and individuals’ privacy also are of utmost importance to the NIH, and these were important even before the privacy rules of the Health Insurance Portability and Accountability Act (HIPAA) were implemented, she says.

While clinical trials managers and investigators often say that these two goals create a heavy burden on their time, especially when a data set is popular and there are multiple requests to share, it doesn’t have to be unduly burdensome, Seto adds.

For instance, investigators can request money in the original grant application to cover the cost of sharing the data.

"Even if they didn’t remember in the original application to ask for the funds, once the data set turned out to be popular and in demand, they can write to NIH and ask for a supplement to the original grant for the purpose of data sharing, and NIH will consider that," Seto says.

While the de-identifying of data takes time, it is necessary, she notes. "We believe the privacy of individuals who participated in clinical studies must be guarded and respected, and even in sharing data there are ways to protect the confidentiality of data." She offers these insights into how clinical trials managers and investigators might best meet both goals and all privacy regulations:

• Keep data secure. Data should be stored in secured, locked places, and an individual subject’s identity can be protected through various mechanisms, Seto says. They include:

  • statistically protecting identities;
  • randomizing samples with coded names and numbers;
  • eliminating opportunities for deductive identifying.

Basically, this requires a common-sense approach: changing the way data are de-identified according to the sample size, Seto says.
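As a rough illustration of coded names and size-dependent safeguards, the sketch below replaces names with random study codes and drops a fine-grained location field when the sample is small. The field names (`name`, `census_tract`), the function names, and the threshold of 100 are assumptions made for this example, not NIH or HIPAA requirements.

```python
import secrets


def assign_codes(names):
    """Map each subject name to a random study code that cannot be derived from the name."""
    codes = {}
    used = set()
    for name in names:
        code = "S-" + secrets.token_hex(4)
        while code in used:  # regenerate on the rare chance of a collision
            code = "S-" + secrets.token_hex(4)
        used.add(code)
        codes[name] = code
    return codes


def prepare_for_sharing(records, small_sample_threshold=100):
    """Return (shareable rows, key table).

    Names are replaced with codes. The hypothetical 'census_tract' field is
    dropped when the sample is at or below the assumed threshold.
    """
    # Records with the same name are treated as the same subject.
    codes = assign_codes(sorted({r["name"] for r in records}))
    drop_tract = len(records) <= small_sample_threshold
    shared = []
    for r in records:
        row = {k: v for k, v in r.items() if k != "name"}
        row["subject_code"] = codes[r["name"]]
        if drop_tract:
            row.pop("census_tract", None)
        shared.append(row)
    return shared, codes
```

The key table that links names to codes would need to stay in secured storage and should never travel with the shared rows.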

"If you have a small sample of 25-50, or even 100 people, then obviously the safeguard for privacy would be different from guarding a sample size of thousands," Seto says. "For example, if you were doing an epidemiological study, you might not want to give a census tract for a study that is small in sample size."

NIH investigators analyzed data from the adolescent health study that is supported by the National Institute of Child Health and Human Development. The study involves congressionally mandated data collection of sensitive health information from more than 20,000 students. Data collected include sexual behavior, drug use, and other sensitive information that shouldn’t be linked back to the individuals who participated in the study, Seto explains.

However, the NIH study found that if someone knew only five parameters about an individual, such as the person’s zip code, census tract, age bracket, school attended, and one other characteristic, then that person would be able to deduce which student gave which survey answers, she says.

"Most of our investigators are very knowledgeable about not giving names such as ‘John Jones’ and telephone numbers and addresses, but the idea is to raise the level of sensitivity about deductive identification," Seto says.

This is particularly important when data from one clinical trial are shared with other investigators, because some of the parameters they want for other analyses and research could pose privacy problems.
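One way to screen for deductive identification before sharing is to count how many records share each combination of indirect identifiers and flag the combinations that are rare. The sketch below is a k-anonymity-style check, a swapped-in illustration rather than the method the NIH analysis used; the function name, the field names, and the default cutoff `k=5` are assumptions.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=5):
    """Return records whose combination of quasi-identifier values
    is shared by fewer than k records in the data set."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]
```

Any record this function returns could, in principle, be singled out by someone who already knows those few parameters about a participant, so those fields are candidates for removal or coarsening before release.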

"When you share data, you have to be careful about what the recipient of the data needs," Seto says.

HIPAA lists 18 identifiers that can be removed from data. "HIPAA also allows you to look at feasibility study preparation to research," Seto says. "Under HIPAA, you can see some of the identifiers and don’t have to de-identify all elements for a small data set in a feasibility study."

• Limit the data you share with others. For example, with regard to the school student survey, NIH allowed only a subset of data to remain in the public domain, Seto explains.

Data from only 6,500 individuals were released to public access. This subset was representative of participants so as not to skew data in a way that would encourage someone to draw incorrect conclusions, she adds.

"If someone wanted to see more than the 6,500, we would have data use agreements with them, which is a contract and has legally binding authority to make sure they don’t disclose identities," Seto says.

Even with these agreements, researchers would be required to view the data only in a certain location, and they would not be permitted to leave the facility with any information. If they wanted to use the data, the work would have to be done on site, she notes.

Extreme examples of other types of data sets that might require such safeguards would be cases where the population being studied is by nature quite small, such as with certain rare diseases and, perhaps, with a study of a group that is self-limiting, such as billionaires, Seto says.

"You’d be surprised how if you have multiple pieces of data, you can pretty much pinpoint who an individual is," she says.

In these rare disease cases, it would be wise to take out information about the city and treatment center, Seto adds.

• Negotiate data release and ask for a data use agreement. Other than taking off a person’s name and other directly identifying information, it’s difficult to make a policy that would work in all cases with regard to data sharing, so Seto recommends that clinical trials managers and investigators provide the bare minimum amount of data to requesting investigators.

The best way to do this is to ask the investigator who has requested the information what exactly he or she needs for the secondary analysis, and then give no more than that, she advises.

"Of course, the investigator will start with the response of, ‘The more I know, the better,’" Seto says. "But under HIPAA and even prior to HIPAA, NIH would not let you have any more identifiers than you needed."

So if an investigator asks for identifying data that could pose a privacy problem, then talk with the investigator to determine how an analysis might be done without that piece of information, she suggests.

For example, a clinical manager or investigator could say, "You wanted to look at the demographics of the disease, income levels, and education levels, and gender and race/ethnicity," Seto says. "But you do not have to pinpoint a specific community — you can say a rural community or an inner city."
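The negotiation described above can be pictured as building an extract that contains only the agreed fields and replaces a specific community with a broad category. This is a minimal sketch under stated assumptions: the function name, the `community` field, and the category lookup table are hypothetical and would come from the data holder in practice.

```python
def minimal_extract(records, requested_fields, community_categories):
    """Keep only the requested fields; report a broad community type
    (for example 'rural community') instead of the specific community."""
    extract = []
    for record in records:
        # Copy only what the recipient asked for, never the specific community.
        row = {f: record[f] for f in requested_fields if f != "community"}
        if "community" in requested_fields:
            # Unknown communities fall back to a generic label.
            row["community_type"] = community_categories.get(
                record["community"], "other"
            )
        extract.append(row)
    return extract
```

Because the extract is built from an explicit list of fields, anything the recipient did not ask for, including names and other direct identifiers, is left out by default.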

She says that’s why clinical trials managers and investigators might consider requiring data use agreements for all shared data requests.

"I’ve always asked for data use agreements because it’s just so much more protection for both sides — for both the giver and recipient of data," Seto says. "If you didn’t have a data use agreement, then that second person can share with a third and fourth person downstream."

Most data use agreements are written plainly and will stipulate the conditions for sharing data. Some institutions may ask that the agreement either be written by a lawyer or reviewed by one, she notes.

While it’s not necessarily a matter of wanting to control the data, it is important to avoid abuses, which could be damaging to the individuals who gave you the data, Seto says.

"I always stipulate that whatever future uses they have with others, they have to follow the same conditions," Seto says.