  • John Moore | Getty
  • Intelligent Machines

    A little-known AI method can train on your health data without threatening your privacy

    Machine learning has great potential to transform disease diagnosis and detection, but it’s been held back by patients’ reluctance to give up access to sensitive information.

    In 2017, Google quietly published a blog post about a new approach to machine learning. Unlike the standard method, which requires the data to be centralized in one place, the new one could learn from a series of data sources distributed across multiple devices. The invention allowed Google to train its predictive text model on all the messages sent and received by Android users—without ever actually reading them or removing them from their phones.

    Despite its cleverness, federated learning, as the researchers called it, gained little traction within the AI community at the time. Now that is poised to change as it finds application in a completely new area: its privacy-first approach could very well be the answer to the greatest obstacle facing AI adoption in health care today.

    “There is a false dichotomy between the privacy of patient data and the utility of the data to society,” says Ramesh Raskar, an MIT associate professor of computer science whose research focuses on AI in health. “People don’t realize the sand is shifting under their feet and that we can now in fact achieve privacy and utility at the same time.”

    Over the last decade, the dramatic rise of deep learning has led to stunning transformations in dozens of industries. It has powered our pursuit of self-driving cars, fundamentally changed the way we interact with our devices, and reinvented our approach to cybersecurity. In health care, however, despite many studies showing its promise for detecting and diagnosing diseases, progress in using deep learning to help real patients has been tantalizingly slow.

    Current state-of-the-art algorithms require immense amounts of data to learn—in most cases, the more data the better. Hospitals and research institutions need to combine their data reserves if they want a pool of data that is large and diverse enough to be useful. But especially in the US and the UK, the idea of centralizing reams of sensitive medical information in the hands of tech companies has repeatedly—and unsurprisingly—proved intensely unpopular.

    As a result, research on diagnostic uses of AI has stayed narrow in scope and applicability. You can’t deploy a breast cancer detection model around the world when it’s only been trained on a few thousand patients from the same hospital.

    All this could change with federated learning. The technique can train a model using data stored at multiple hospitals without that data ever leaving a hospital’s premises or touching a tech company’s servers. It does this by first training separate models at each hospital with the local data available and then sending those models to a central server to be combined into a master model. As each hospital acquires more data over time, it can download the latest master model, update it with the new data, and send it back to the central server. Throughout the process, raw data is never exchanged. Only the models are, and they are far harder to reverse-engineer to reveal that data.
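
    The train-locally, combine-centrally loop described above can be pictured with a toy sketch. It is only an illustration under stated assumptions, not the system Google or any hospital uses: the "model" is a two-parameter line fit, the three hospitals hold synthetic data, and the names local_train, federated_average and make_hospital_data are hypothetical. Combining models by a size-weighted average of their parameters is one common choice, assumed here.

```python
import random

def local_train(weights, data, lr=0.1, epochs=20):
    """Gradient descent on one hospital's own data for y ~ w*x + b."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(models, sizes):
    """Combine local models into a master model, weighted by dataset size."""
    total = sum(sizes)
    w = sum(m[0] * s for m, s in zip(models, sizes)) / total
    b = sum(m[1] * s for m, s in zip(models, sizes)) / total
    return (w, b)

def make_hospital_data(n, seed):
    """Synthetic stand-in for private records: y = 3x + 1 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in xs]

# Three hospitals with different amounts of data (assumed sizes).
hospitals = [make_hospital_data(n, seed) for seed, n in enumerate([50, 120, 80])]
sizes = [len(d) for d in hospitals]

master = (0.0, 0.0)
for _ in range(10):
    # Each hospital downloads the master model and updates it locally.
    local_models = [local_train(master, d) for d in hospitals]
    # Only model parameters reach the server; the raw (x, y) pairs never do.
    master = federated_average(local_models, sizes)

print("master model:", master)
```

    In this sketch the server only ever sees pairs of numbers (w, b), never a patient record, and the master model ends up close to the line that generated all three datasets.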

    There are some challenges to federated learning. For one, combining separate models risks creating a master model that’s actually worse than each of its parts. Researchers are now working on refining existing techniques to make sure that doesn’t happen, says Raskar. For another, federated learning requires every hospital to have the infrastructure and personnel capabilities for training machine-learning models. There’s also friction in standardizing data collection across all hospitals. But these challenges aren’t insurmountable, says Raskar: “More work needs to be done, but it’s mostly Band-Aid work.”

    In fact, other privacy-first distributed learning techniques have since cropped up in response to these challenges. Raskar and his students, for example, recently invented one called split learning. As in federated learning, each hospital starts by training a separate model, but it trains that model only halfway. The half-baked models are then sent to the central server to be combined and to finish training. The main benefit is that this would alleviate some of the computational burden on the hospitals. The technique is still mainly a proof of concept, but in early testing, Raskar’s research team showed that it created a master model nearly as accurate as it would be if it were trained on a centralized pool of data.
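
    One way to picture that division of labor is the toy sketch below. It assumes a simple formulation of split learning in which the hospital keeps the first part of the model, the server keeps the rest, and the two exchange only intermediate values and gradients. It also assumes labels are shared with the server and uses a single hospital. The classes HospitalClient and Server and the two-layer scalar model are hypothetical illustrations, not Raskar's implementation.

```python
import random

class HospitalClient:
    """Holds the raw data and the first half of the model: h = a * x."""
    def __init__(self, data, a=1.0):
        self.xs = [x for x, _ in data]
        self.ys = [y for _, y in data]  # assumption: labels are shared with the server
        self.a = a

    def forward(self):
        # Only these intermediate values leave the hospital, never self.xs.
        return [self.a * x for x in self.xs]

    def backward(self, grad_h, lr):
        # Chain rule: dLoss/da = sum over samples of dLoss/dh * x.
        grad_a = sum(g * x for g, x in zip(grad_h, self.xs))
        self.a -= lr * grad_a

class Server:
    """Holds the second half of the model: y_hat = c * h + d."""
    def __init__(self, c=0.5, d=0.0):
        self.c, self.d = c, d

    def train_step(self, hs, ys, lr):
        n = len(hs)
        errs = [self.c * h + self.d - y for h, y in zip(hs, ys)]
        grad_c = sum(2 * e * h for e, h in zip(errs, hs)) / n
        grad_d = sum(2 * e for e in errs) / n
        # Gradient with respect to the hospital's output, sent back to it.
        grad_h = [2 * e * self.c / n for e in errs]
        self.c -= lr * grad_c
        self.d -= lr * grad_d
        return grad_h

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(100)]
data = [(x, 3.0 * x + 1.0) for x in xs]  # synthetic stand-in for records

client, server = HospitalClient(data), Server()
for _ in range(2000):
    hs = client.forward()                               # hospital computes its half
    grad_h = server.train_step(hs, client.ys, lr=0.05)  # server finishes the pass
    client.backward(grad_h, lr=0.05)                    # hospital updates its half

print("combined slope:", client.a * server.c, "intercept:", server.d)
```

    The hospital in this sketch does only the work for its own layer, which is the computational saving the technique aims for. The server never receives the raw inputs.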

    A handful of companies, including IBM Research, are now working on using federated learning to advance real-world AI applications for health care. Owkin, a Paris-based startup backed by Google Ventures, is also using it to predict patients’ resistance to different treatments and drugs, as well as their survival rates with certain diseases. The company is working with several cancer research centers in the US and Europe to utilize their data for its models. The collaborations have already resulted in a forthcoming research paper, the founders say, on a new model that predicts survival odds for a rare form of cancer on the basis of a patient’s pathology images. The paper will take a major step toward validating the benefits of this technique in a real-world setting.

    “I’m really excited,” says Owkin cofounder Thomas Clozel, a clinical research doctor. “The biggest barrier in oncology today is knowledge. It’s really amazing that we now have the power to extract that knowledge and make medical breakthrough discoveries.”

    Raskar believes the applications of distributed learning could also extend far beyond health care to any industry where people don’t want to share their data. “In distributed, trustless environments, this is going to be very, very powerful in the future,” he says.

    This story originally appeared in our AI newsletter The Algorithm.
