De-identification software for researchers and IRBs

Software creates safe researcher playground

Editor’s note: This issue of IRB Advisor continues a series on how IRBs can use software programs to improve their operations. This month we profile De-ID de-identification software.

As IRBs deal with the privacy provisions of HIPAA, they often must decide whether researchers are allowed to waive individual authorization for use of patients’ data.

One way to exempt research from the requirements of HIPAA is for a researcher to show that data has been sufficiently de-identified — stripped of the 18 individual identifiers named in the privacy rule.

Often, IRBs must wade through the researchers’ de-identification plans to ensure that patient privacy is protected. Some IRBs, however, are naming de-identification tools as their own institutional standard, in order to make the process more efficient, says Steven Merahn, MD, chief medical officer of De-ID Data Corp. in Philadelphia, which markets De-ID de-identification software.

"It removes a barrier to clinical research at every institution," Merahn says. "So the researchers now only have to think about the research, they don’t have to worry about building into their research project the de-identification part of it."

He says that when an IRB incorporates his company’s software as a compliance tool, it ensures consistency across projects and gives the IRB confidence that the de-identification is being done properly.

"You know that regardless of the researcher, they’re using the same exact method," he says. "It’s the same tool, a tool you know is developed and supported. It streamlines the compliance process, it enhances the productivity both of the IRB and of the researchers themselves, as well as maintaining a certain level of quality of de-identification."

He says IRBs can find even broader and potentially more exciting uses for the software by deploying it at an institutional level.

Merahn says he’s currently speaking with a potential IRB client who wants to de-identify large numbers of medical files to provide to researchers as a de-identified database with which to do various types of research.

"It allows for a lot more creativity and exploration on the part of the researchers," he says.

Developed at Pittsburgh

De-ID was developed at the University of Pittsburgh about five years ago, Merahn says, to solve the problem of reliability and consistency of de-identification across the institution’s various research projects.

"They said, ‘We’ve got all these people coming in, presenting research projects to the IRB,’" Merahn says.

"They’ve got to certify a de-identification method as part of their IRB application. And yet we’re not looking at the cross-project consistency of these schemes. So the informatics group at Pittsburgh said maybe there’s a way through automation that we can increase the reliability and consistency of de-identification."

Once the software was developed and used at Pittsburgh, and its reliability and validity were studied, it became the de-identification standard for the IRB at the University of Pittsburgh Medical Center, Merahn says.

Merahn says that while he and his partner were working in informatics and research, they came to realize that de-identification was a huge barrier to fuller progress in their fields. So when they came across the University of Pittsburgh’s De-ID software, they saw its value and made a deal with the university to commercialize the software.

Kim C. Coley, PharmD, an associate professor of pharmacy and therapeutics at the University of Pittsburgh School of Pharmacy, uses De-ID frequently in her work. She jokingly calls herself a "guinea pig" for the earlier versions of the software as it was developed at the university.

"When the software first came out, we would notice that something might be stripped that shouldn’t have been stripped — for instance, a dosage of a drug, because they thought it was an address, or something like that," she says. "And we’d tell them, and they’d go back and fix the software."

She says the bugs all were worked out of the software long before it got to the commercialized version of De-ID, and she no longer sees these types of problems.

Her work involves using existing clinical data retrospectively, after having it de-identified to meet HIPAA guidelines.

For example, she might design a study to look at the possible effect of anemia on the likelihood of a patient fall, comparing records in the risk management database of falls with lab records of blood draws.

In order to avoid privacy concerns, Coley uses a so-called honest broker — a person who can collect the necessary files and then de-identify them before the researcher sees them. It is that person who actually runs the software.

Coley says De-ID has been useful not only in working with formatted reports such as pharmacy reports but with full text files as well, searching for the identifiers and stripping them out.

"We couldn’t do what we do here at our center without this software, and still meet the [HIPAA guidelines]," she says. "We frequently submit our research to the IRB as exempt from informed consent. This enables us to meet those criteria."

Merahn describes De-ID as a plug-and-play system that can run as a freestanding program on a laptop, if necessary. The operator can have De-ID call up the application where data files are stored, input the files and automatically de-identify a pool of records. The process creates an entirely separate, de-identified record and leaves the original record unchanged.
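As a rough illustration of that batch workflow, the Python sketch below scrubs a pool of records and returns new, separate records while leaving the originals untouched. It is not De-ID's code. The function names, the record layout and the three regular-expression patterns (Social Security number, phone number, e-mail address) are assumptions chosen for illustration, and real de-identification of all 18 identifiers takes far more than pattern matching.

```python
import re
from typing import Dict, List

# Hypothetical patterns covering three identifier types; illustrative only.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def deidentify_text(text: str) -> str:
    """Replace each matched identifier with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def deidentify_batch(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Build new records with a scrubbed 'note' field.

    The input records are never modified, mirroring the idea of a
    separate de-identified copy alongside the original.
    """
    return [{**rec, "note": deidentify_text(rec["note"])} for rec in records]


originals = [
    {"dept": "pharmacy",
     "note": "Call 412-555-0100 or jdoe@example.org; SSN 123-45-6789."},
]
scrubbed = deidentify_batch(originals)
print(scrubbed[0]["note"])
```

In this sketch the scrubbed note reads "Call [PHONE] or [EMAIL]; SSN [SSN]." while the entry in `originals` still holds the full text.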

Rather than simply eliminating some of the identifiers, the software can be programmed to provide proxies and offsets instead, increasing the files’ research value, Merahn says.

For example, it can use proxies for doctors’ names (Dr. A, Dr. B, Dr. C) or use an age range in place of a specific age. The software’s dictionary can be edited to include common location names and acronyms at the institution.
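The proxy idea can be sketched in a few lines of Python. The code below is an illustration, not De-ID's implementation: the `ProxyMapper` class, the `age_range` helper, the 10-year bucket width and the simple "Dr. Surname" pattern are all assumptions. It gives each distinct doctor name a stable letter proxy, so the same physician stays recognizable as one person across a record, and it replaces an exact age with a range.

```python
import re
import string

# Assumed pattern: "Dr." followed by a capitalized surname.
DOCTOR = re.compile(r"\bDr\.\s+([A-Z][a-z]+)")


class ProxyMapper:
    """Assign each distinct surname a stable proxy: Dr. A, Dr. B, ..."""

    def __init__(self):
        self._proxies = {}

    def proxy_for(self, surname):
        if surname not in self._proxies:
            index = len(self._proxies)
            if index >= len(string.ascii_uppercase):
                raise ValueError("this sketch supports at most 26 names")
            self._proxies[surname] = "Dr. " + string.ascii_uppercase[index]
        return self._proxies[surname]

    def replace_doctors(self, text):
        # A repeated name maps to the same proxy every time it appears.
        return DOCTOR.sub(lambda m: self.proxy_for(m.group(1)), text)


def age_range(age, width=10):
    """Return a bucket such as '60-69' in place of an exact age."""
    if age >= 90:
        # The HIPAA privacy rule groups ages over 89 into one category.
        return "90+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


mapper = ProxyMapper()
note = "Dr. Smith consulted Dr. Jones; Dr. Smith ordered labs."
print(mapper.replace_doctors(note))
print(age_range(67))
```

Keeping the mapping inside one `ProxyMapper` instance is a design choice for the sketch: reusing the instance across a patient's documents keeps proxies consistent, while a fresh instance starts again at Dr. A.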

Researchers’ playground

Last year, the National Cancer Institute licensed De-ID for the de-identification component of some of the applications in its Cancer Biomedical Informatics Grid (CaBIG).

"They’re building a 50-hospital content repository for pathology data and they’re using De-ID software as their HIPAA/patient privacy compliance tool for the entire grid," Merahn says. "Every one of the hospitals that participates has De-ID on site as the de-identification tool that allows the record to leave the hospital and be put into the content repository."

Merahn says he’s excited by De-ID’s potential for use by IRBs to create safe databases of de-identified files as a so-called playground for researchers.

"Some IRBs are saying in order to facilitate research at their organization, they’re going to, en masse, de-identify 10,000 records," he says. "They’re going to put them in a de-identified database, and they’re going to let their researchers have that as a playground.

"Normally, you wouldn’t be able to do that — that would be a massive investment," Merahn says. "It wouldn’t be possible without automated de-identification. That’s the big difference that De-ID can make to a lot of institutions."

He says he’s currently in negotiations with an IRB to acquire De-ID for that purpose.

"The IRB is incredibly excited about it because they don’t have to see every project now; it’s a batch that has been approved so that the IRB can comfortably allow research to take place."

Merahn says the annual cost of the software contract varies based on the size of the institution — anywhere from single projects at an academic medical center to network-level pricing. The cost includes regular updates.

"I can tell you it’s very reasonable for medical centers and researchers," he says. "We are pricing this to make it usable."

For more information on De-ID, visit De-ID Data Corp. website at