Risks of managing data in the cyber cloud age

Cyber disasters are a potential risk

Clinical research sites impacted by hurricanes, flooding, or other disasters a decade ago faced daunting challenges in retrieving files and irreplaceable study data because they either relied on paper or were early in the transition to electronic files.

Now most research sites have invested in electronic files or back-ups, so they can better protect themselves against a large, destructive local event. But how protected are they from a cyber disaster?

Cyber problems could include cloud services that are hacked or suddenly shut down, as well as natural disasters that strike the services' physical locations.

"If you rely on a third party storage device certainly you have the potential loss if that company goes under, and then what becomes of your data," says Elizabeth A. Buchanan, PhD, endowed chair and director of the Center for Applied Ethics at the University of Wisconsin-Stout in Menomonie, WI.

"On the one hand it's a whole lot safer to put data in a cloud and have it backed up somewhere," she explains.

Cloud computing involves a network of remote servers that store and process data. Some in the information technology industry predict there will soon be a time when very few personal computers store their own data and everything will be handled via cloud storage.

"But you have to be very careful of cloud storage because you lose control of data from a disaster perspective."

For instance, the cloud network could be compromised or data could be destroyed, which is why research institutions should make certain their data back-up contractor has its own disaster plan and back-up. Also, research institutions could buy cyber insurance, so if the third party loses data there would be some form of protection, Buchanan suggests.

There are no guarantees, but research institutions can follow some steps to protect their data and subjects in the event of a cloud service disaster, including these:

• Require strict protections: "Be sure the cloud has strict protections in place," Buchanan says.

Third party cloud service sites should have strong back-up systems in the event of a large scale disaster, she says.

"By virtue of data being networked, the potential for sharing it is much more enhanced," she says. "We tell subjects their data will be secure; their data will be safe, and we'll protect your identity, but sometimes that might not be 100% accurate anymore."

Researchers can run into ethical challenges when they use outside contractors for storing data, so they need to ensure these entities are taking appropriate precautions against data loss or release.

"There's the rule of three that archivists have talked about for years: original data, back up of data, and back up of the back up," Buchanan says.

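As a rough illustration of the rule of three, the following Python sketch checks that at least two back-up copies of a data file exist and match the original byte for byte. The function names, the SHA-256 comparison, and the idea of checking copies by file path are assumptions made for this example; they are not drawn from Buchanan's remarks or from any particular institution's procedures.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_are_intact(original, backups, minimum_backups=2):
    """Return True if enough back-ups exist and match the original.

    `minimum_backups=2` reflects the rule of three: the original,
    a back-up, and a back-up of the back-up (hypothetical default).
    """
    original_digest = sha256_of(original)
    matching = [
        backup
        for backup in backups
        if Path(backup).is_file() and sha256_of(backup) == original_digest
    ]
    return len(matching) >= minimum_backups
```

A site might run a check like this on a schedule, with one back-up path on local media and another at the third-party service, so that a missing or altered copy is noticed before it is needed.
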
Computer server farms where enormous amounts of data are stored are increasing nationwide, a trend that suggests offsite data storage will only increase, she notes.

"I think this is the way it's going to be from now on," Buchanan says.

And data stored in cloud services, maintained in server farms, is fine, as long as the highest level of protections is in place.

Researchers need to understand how this works and how it may impact data security. They also should be able to explain this clearly to their research participants, she adds.

• Create data security procedures and rules: Research sites should create data security rules that address privacy and confidentiality, risk, and variances from security requirements, which might require IRB approval.

Harvard University has a model data security plan that outlines key responsibilities and procedures, Buchanan notes.

Available online at http://security.harvard.edu, the plan provides for five tiers of data sensitivity. It also requires investigators to disclose the nature of the confidential data they collect to the IRB for data risk assessment. Investigators also are responsible for preparing and implementing study data security plans and procedures.

IRBs are encouraged to work with information technology staff when assessing the adequacy of researchers' confidentiality provisions.

Harvard's five tiers include a level 3 in which information has individually identifiable data that could be damaging to a person's reputation if disclosed, and a level 5, which describes data that could cause major harm to an individual, including incarceration, psychological damage, and loss of work or insurance.

"Each level has different requirements, and if you reach a level 5, they call this extremely sensitive research information about individually identifiable people," Buchanan says. "And it must be stored and processed only in physically secured rooms and not in an information network outside of that room."

The idea is that some data should not be placed in a network environment, she adds.
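
One way a research site could encode the level 5 restriction described above is a simple check that runs before data are placed anywhere. The Python sketch below is illustrative only: the location label and the function name are hypothetical, and because the requirements for levels 1 through 4 are not spelled out here, the sketch deliberately defers those levels to the institution's written policy rather than guessing.

```python
# Hypothetical label for a physically secured, non-networked room.
SECURED_ROOM = "secured_room"


def storage_decision(level, location):
    """Classify a proposed storage location for a sensitivity level.

    Returns "permitted", "prohibited", or "consult_policy".
    Only the level 5 rule comes from the plan as described in the
    article; everything else is left to the written policy.
    """
    if level not in range(1, 6):
        raise ValueError("sensitivity level must be 1 through 5")
    if level == 5:
        # Level 5 data stay in a physically secured room, off the network.
        return "permitted" if location == SECURED_ROOM else "prohibited"
    # Assumption: lower levels have their own requirements, not modeled here.
    return "consult_policy"
```

For example, `storage_decision(5, "cloud")` returns `"prohibited"`, while `storage_decision(3, "cloud")` returns `"consult_policy"`.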

"Harvard's data security plan is a very useful model in helping people understand the range of data security issues," Buchanan says.

• Educate researchers on definitions, processes: Research institutions should ensure investigators are fully aware of how data are handled and what various data security terms mean, Buchanan recommends.

For instance, the Harvard model data security plan includes a glossary link in which various terms are defined. One example is the term "identity key," which Harvard's glossary defines as a "code used in place of personal identifier(s) in a research data set." This is followed by "identity-mapping file," which is defined as a "data set that can be used to associate identity keys with individuals."
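
To make the two glossary terms concrete, here is a minimal Python sketch that splits a small data set into a coded research data set (carrying identity keys) and a separate identity-mapping file. The function name, the field names, and the use of random hexadecimal tokens as identity keys are assumptions for illustration, not part of Harvard's plan.

```python
import secrets


def pseudonymize(records, identifier_field):
    """Replace a personal identifier with an identity key.

    Returns (coded_records, identity_mapping), where identity_mapping
    associates each identity key with the original identifier.
    """
    identity_mapping = {}  # identity key -> personal identifier
    key_for = {}           # personal identifier -> identity key
    coded_records = []
    for record in records:
        identifier = record[identifier_field]
        if identifier not in key_for:
            key = secrets.token_hex(8)  # random key, hypothetical format
            key_for[identifier] = key
            identity_mapping[key] = identifier
        # Copy every field except the personal identifier itself.
        row = {k: v for k, v in record.items() if k != identifier_field}
        row["identity_key"] = key_for[identifier]
        coded_records.append(row)
    return coded_records, identity_mapping
```

In this arrangement the coded records could be shared or analyzed more widely, while the identity-mapping data would be stored separately under tighter controls, since it is the piece that can re-link keys to individuals.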

Defining words and terms is important since many researchers may be unaware of this language, Buchanan notes.

Once when Buchanan was speaking in a room of research professionals, she mentioned the term "cloud storage," and someone asked her what that meant, she recalls.

"I asked if they use Google email or Drop Box," Buchanan says.

"People are using these tools, and they don't know what they are," she adds. "We share data in different places, and we're not sure what the rules are when we get into clouds."

• Consider geopolitical factors when selecting data storage services: Data concentrated in single large places, called server farms, can be handled more efficiently, securely, and safely during day-to-day operations. But there is always the risk of a major disaster or cyber-attack that could disable the entire server farm.

Research institutions should consider this potential when they select a data management service.

For instance, is the country or region in which the server farm is located vulnerable to cyber terrorism, government espionage, major natural disasters, or interrupted electrical power?

Server farms use enormous amounts of power and need to have back-up plans for continuing climate control even when a disaster has incapacitated the local energy grid.

Another factor involves political risk, Buchanan notes.

"There was a great story from Canada of a company that was looking at servers in the U.S. to store bibliographic research material," she says. "Some Canadian researchers said they were uncomfortable storing their bibliographic material in the U.S. when the Patriot Act had just been passed."

They were afraid the U.S. government could use the act to search through their data, effectively compromising any privacy, she adds.

"It's a major consideration for researchers to find out where their servers are located and whether they have to worry about someone looking at their data," Buchanan says. "So researchers need to be aware of different legal boundaries, as well."