
Thinking ‘oat’ of the box: Technology to resolve the ‘Goldilocks Data Dilemma’

Marielle Gross
Robert Miller


The problem with porridge

Today, we regularly hear stories of research teams using artificial intelligence to detect and diagnose diseases earlier, with more accuracy and speed than a human could ever have dreamed of. Increasingly, we are called on to contribute to these efforts by sharing our data with the teams crafting these algorithms, sometimes by healthcare organizations that rely on our altruistic motivations. A crop of startups has even appeared to let you monetize your data to that end. But given the sensitivity of your health data, you might be skeptical of this, doubly so when you take into account tech’s privacy track record. We have begun to recognize the flaws in our current privacy-protecting paradigm, which relies on thin notions of “notice and consent” that inappropriately place the responsibility for data stewardship on individuals who remain extremely limited in their ability to exercise meaningful control over their own data.

Emblematic of a broader trend, the “Health Data Goldilocks Dilemma” series calls attention to the tension and necessary tradeoffs between privacy and the goals of our modern healthcare technology systems. Not sharing our data at all would be “too cold,” but sharing freely would be “too hot.” We have been looking for policies that are “just right”: policies that balance protecting individuals’ rights and interests against making it easier to learn from data to advance the rights and interests of society at large.

What if there was a way for you to allow others to learn from your data without compromising your privacy?

To date, a major strategy for striking this balance has involved the practice of sharing and learning from deidentified data—by virtue of the belief that individuals’ only risks from sharing their data are a direct consequence of that data’s ability to identify them. However, artificial intelligence is rendering genuine deidentification obsolete, and we are increasingly recognizing a problematic lack of accountability to individuals whose deidentified data is being used for learning across various academic and commercial settings. In its present form, deidentification is little more than a sleight of hand to make us feel more comfortable about the unrestricted use of our data without truly protecting our interests. More of a wolf in sheep’s clothing, deidentification is not solving the Goldilocks dilemma.

Tech to the rescue!

Fortunately, there are a handful of exciting new technologies that may let us escape the Goldilocks Dilemma entirely by enabling us to gain the benefits of our collective data without giving up our privacy. This sounds too good to be true, so let us explain the three most revolutionary ones: zero knowledge proofs, federated learning, and blockchain technology.

  1. Zero Knowledge Proofs

Zero knowledge proofs use cutting-edge mathematics to allow one party (the “prover”) to prove the validity of a statement to another party (the “verifier”) without disclosing the underlying data behind that statement. Put another way, zero knowledge proofs let us prove things about our data without giving up our privacy. This could be an extremely valuable strategy in research since we could learn, for example, which treatments worked best for which people without needing to know which people received which treatments or what their individual outcomes were. Zero knowledge proofs are already being used in healthcare today: pharmaceutical manufacturers in the MediLedger project are deploying them to keep our drug supply chains both private and secure.
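
To make the prover and verifier roles concrete, here is a minimal sketch of one classic zero knowledge protocol, Schnorr identification, in which the prover shows that it knows a secret number without revealing it. Everything here is an illustrative assumption: the group parameters are toy-sized and insecure, the function names are hypothetical, and this is not a description of how any production healthcare system works.

```python
import secrets

# Toy parameters, far too small for real use (assumption for illustration):
# G generates a subgroup of prime order Q in the integers modulo P.
P, Q, G = 23, 11, 2

def keygen():
    """Prover picks a secret x and publishes y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1      # secret in 1..Q-1
    return x, pow(G, x, P)

def commit():
    """Prover picks a fresh random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def respond(x, r, c):
    """Prover answers the verifier's challenge c; r masks the secret x."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier checks G^s == t * y^c (mod P) without ever seeing x."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P

# One round of the protocol.
x, y = keygen()                 # x stays with the prover; y is public
r, t = commit()                 # prover -> verifier: t
c = secrets.randbelow(Q)        # verifier -> prover: random challenge
s = respond(x, r, c)            # prover -> verifier: s
print("proof accepted:", verify(y, t, c, s))
```

The verifier only ever sees `y`, `t`, `c`, and `s`. Because `r` is random and used once, `s` on its own reveals nothing about `x`, yet a prover who does not know `x` cannot reliably answer a random challenge.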

  2. Federated Learning

Another privacy-enabling innovation is federated learning, which enables a network of computers to collaboratively train one algorithm while keeping their data on their own devices. Instead of sending the data to a central computer to train an algorithm, federated learning sends the algorithm to the data, trains it on the data locally, and shares only the updated algorithm with other parties. By decoupling the training of algorithms from the need to centralize data, federated learning limits the exposure of an individual’s data to privacy risks. With federated learning, several of the world’s largest drug makers, usually fierce competitors, are collaborating in the MELLODDY project to advance drug discovery. Federated learning lets these companies collectively train a single shared algorithm on their highly proprietary data without exposing that data to their competitors. Collectively these companies benefit as they are effectively creating the world’s largest distributed database of molecular data, which they hope to use to find new cures and treatments, a process that promises to benefit us all.
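
The "send the algorithm to the data" idea can be sketched in a few lines. The example below is a simplified illustration of federated averaging for a one-parameter model (y ≈ w · x). The site data, function names, and hyperparameters are all made up for illustration, and real deployments add safeguards such as secure aggregation that are not shown here.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Train on one site's private data; only the updated weight leaves the site."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y ≈ w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, sites):
    """Send the current model to every site, then average the returned weights."""
    updates = [(local_update(w_global, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    # Weight each site's update by how many records it trained on.
    return sum(w * n for w, n in updates) / total

# Hypothetical private datasets held by two sites; both follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]

w_final = 0.0
for _ in range(50):
    w_final = federated_round(w_final, sites)

print("learned weight:", round(w_final, 4))
```

The coordinating code in `federated_round` never reads the `(x, y)` records directly; it handles only the weights returned by each site, which is the decoupling the paragraph above describes.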

  3. Blockchain

Blockchain technology also has a critical role to play in creating a secure network for data sharing. The much-hyped term “blockchain” stems from the technology’s first implementation in Bitcoin, but the technology has much broader applicability. Blockchains combine cryptography and game theory so that a network of computers reaches consensus on a single state. You can think of them as analogous to a network of computers joining together to create one giant virtual computer. This virtual computer maintains a shared ledger of “the truth,” a sort of database whose contents are continuously verified by all the computers in the network, and runs autonomous programs called “smart contracts.” These aspects of blockchains provide uniquely strong assurances of trust in data security and use: they execute the rules of the network consistently and objectively, and the whole process is transparent and universally auditable on the shared ledger. When applied to health data, these properties could empower individuals with an unprecedented ability to supervise and control the use of their own data, and a thriving market of startups has emerged for exactly this use case.
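
One ingredient of that auditability, the hash-linked ledger, is easy to illustrate. The sketch below covers only the tamper-evidence property: each entry commits to the hash of the previous one, so altering any past record is detectable. It deliberately leaves out consensus among many computers and smart contracts, and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, record):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(chain, record):
    """Add a record to the ledger, linking it to the current last block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "index": index,
        "prev": prev_hash,
        "record": record,
        "hash": block_hash(index, prev_hash, record),
    })

def is_valid(chain):
    """Recompute every hash and link; any edited record breaks the chain."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        expected = block_hash(i, prev_hash, block["record"])
        if block["index"] != i or block["prev"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

# Hypothetical log of data-use events.
ledger = []
append(ledger, {"patient": "A", "action": "consent granted", "study": "S1"})
append(ledger, {"patient": "A", "action": "data accessed", "study": "S1"})
print("ledger valid:", is_valid(ledger))

ledger[0]["record"]["action"] = "consent revoked"   # tamper with history
print("after tampering:", is_valid(ledger))
```

In an actual blockchain, many independent computers each hold a copy of such a ledger and must agree on every new entry, which is what makes rewriting history impractical rather than merely detectable.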

The way forward

The cumulative significance of these paradigm-shifting technologies is their potential to eliminate the Goldilocks Dilemma between privacy and learning, individuals and the collective, once and for all. Their emergence forces us to rethink not only our national health IT policy, but our underlying ethical and legal frameworks as well. Because these technologies create the potential to build a future in which our treatment of data simultaneously respects individual and collective rights and interests, we believe there is an obligation to further develop and scale their core privacy-protecting functions. Our aim is to spread awareness of the possibility of resolving a fundamental 21st century ethical dilemma with a technological solution. In this case, “can” implies “ought”: we must advocate for and demand that these and similar innovations be embedded into the future of our data and our health.

Robert Miller is building privacy solutions at ConsenSys Health and manages a blockchain and healthcare newsletter at

Marielle S. Gross, MD, MBE is an OB/GYN and fellow at the Johns Hopkins Berman Institute of Bioethics where her work focuses on application of technology and elimination of bias as means of promoting evidence-basis, equity and efficiency in women’s healthcare (@GYNOBioethicist). 

