As we leave the analog age and enter the vast expanse of digital “big data,” the potential benefits and risks for social science research are uncharted. The potential for good in aggregated, ever-expanding data sets is unprecedented. However, this bright future casts a shadow, as big data raises the specter of breaches of informed consent and violations of privacy.

The author of a new book on this challenge calls on social science researchers to take ethical responsibility for their studies, reminding them that IRB standards should be viewed as minimum requirements.

“The IRB is a floor, not a ceiling,” says Matthew J. Salganik, PhD, a sociologist and researcher at Princeton (NJ) University. “In many cases, it is the researchers themselves who have the most knowledge about the risks — both how those risks can be mitigated, and how the benefits can be maximized. I think that researchers should not stop once the IRB approval has been granted. They should continue to try to improve the ethical balance in their work.”

As an IRB member at Princeton, Salganik encourages medical ethics boards to welcome participation by social science researchers.

“Researchers have an opportunity and an obligation to think about the ethics of what they are doing, beyond just what the IRB requires,” he says. “I think most researchers do that, particularly in computational social science and big data. The researchers often have more awareness than the IRBs about what is possible and what is a potential risk. It’s unreasonable to expect the IRBs to have more technical expertise than the actual researchers. There has to be a role for the researchers themselves in this process. I think IRBs should encourage that.”

Many of these issues are explored in Salganik’s book, Bit by Bit: Social Research in the Digital Age.1 We asked him whether big data research has the potential to lead to the kind of disastrous, unethical studies that have long been the bane of human research.

“Absolutely — I mean, people are people,” he says. “Those in the past have done unethical things, and they could certainly do them in the present and future. Some of the risks have changed a little, but I certainly think it is possible for those kinds of things to happen. IRBs are important.”

Some digital data risks may arise beyond the reach of IRB oversight, but the accumulated knowledge of research ethics may still be brought to bear on such situations.

“It’s also relevant to separate researchers from governments,” he says. “Some people are unhappy with the way that large companies and governments surveil people and their behavior and build enormous databases. That poses a number of questions that are beyond the scope of the IRB. Maybe some of the ideas that IRBs have developed could help companies and governments with how to use their data responsibly outside of the settings that are covered in the Common Rule.”

Salganik opens his book with a seemingly innocuous example of a cellphone survey in Rwanda, in which researchers called people to ask about their demographic and social characteristics. The study took a digital leap when the researchers combined the survey data with the call records of all customers of the mobile phone provider. By feeding the data into a computer model, they developed a method to predict a person’s income and economic status from their call records, creating a map of wealth distribution across the African nation.

Given the possibilities suggested by this relatively benign example, it is understandable that Salganik devotes a full chapter of his book to ethics and social science research. At first glance, aggregated digital data sets would seem to lend themselves to data sharing and easier reproducibility, a common goal in the clinical research world.

“On the one hand, the capabilities of the digital age definitely make it easier to transmit and store our data and our code, and make it easier for other people to run our code,” he says. “But obviously, some of this data is not sharable for privacy reasons. Some of this data is owned by companies, and some of it is potentially very personally identifiable.”

The technical infrastructure for sharing data is growing, but questions will remain about deidentifying research subjects and ensuring they give informed consent.

“I think there is going to be an increasing role for third-party oversight,” Salganik says. “It helps decrease the chance of a bad outcome. It also helps build confidence in the public and promote best practices. Again, these are really hard, complicated issues. The idea of having [IRBs] that have seen many of these things before and can offer guidance — that is potentially a very helpful thing that a lot of researchers would want.”

To assist in this effort, social science researchers should consider drafting an “ethical appendix” that tracks issues as they arise and reports them in a supplement published with the article. Salganik suggests starting the appendix before the study begins, and he explains in his book that the exercise is designed, in part, to “force yourself to think about how you will explain your work to your peers and the public. If you find yourself uncomfortable while writing your ethical appendix, then your study might not strike the appropriate ethical balance.”

This ethical diary of sorts may help researchers during their own study, and publishing the appendices alongside the research could inform decisions when similar issues arise in subsequent studies.

“The decisions that we have to make about the ethics of using big data are very complicated,” he says. “Right now, there are a lot of ethical discussions happening among researchers that are not written down. Researchers do not necessarily feel there is a venue to include that. For a researcher, writing an ethical appendix can help clarify your own thinking. Also, it can help you clarify your thinking to other researchers who are facing similar challenges.”

IRBs that find this idea intriguing could encourage researchers to write such appendices, but the process may be more effective if researchers do not aim for perfection, he says.

“No one is going to argue that these ethical appendices are some final, perfect handling of a situation,” he says. “It is more like, ‘These are the ethical issues I thought about, here are the steps I took, and here is what I decided to do.’ If other people have a better way of thinking about it and can do it better — great.”

Such open discussion can improve social research design and digital data protection. Salganik envisions social research marked by ongoing assessment and communication, treating ethics as a continuum rather than adopting the more dogmatic, binary view that something either is ethical or is not.

“If we think of ethics oversight as checking boxes to get IRB approvals, we are all missing some important opportunities,” he says. “As researchers, we are missing the opportunity to make our studies safer and more beneficial and more ethically well-balanced. I think as IRBs, we are also missing the opportunity to make suggestions to help improve the study. One way to think about it is that no matter what we’re doing, it could probably be better.”

Opening this ongoing dialogue reflects “intellectual humility, which is appropriate in the face of difficult ethical challenges,” he writes in the book. This is something of a Socratic approach, in which the reward of wisdom is the next question about the unknown.

“It reminds us how there are no easy answers to these issues,” he tells IRB Advisor. “People who want easy answers to the ethics of big data research are going to be disappointed. But that doesn’t mean we can’t make progress. In fact, we can make a lot of progress by building on existing ethical principles, like the Belmont Report.”

1. Salganik MJ. Bit by Bit: Social Research in the Digital Age. Princeton; Oxford: Princeton University Press; 2017.