Synthetic Data: A Powerful, New Tool Transforming Global Healthcare


Health Innovation Matters is a biweekly, 30-minute podcast focusing on the next generation of healthcare innovators and movers-and-shakers who are disrupting the healthcare space.

Jon D. Morrow, M.D., M.B.A., Senior Vice President and Physician Executive at MDClone, joined the program to discuss the revolutionary concept of synthetic data, which involves using information from real patients to create synthetic patients who, in the aggregate, mimic the original population.

Here is a synopsis of the conversation with Dr. Morrow and host Aneel Irfan.

Aneel: What are synthetic data?

Dr. Morrow: It’s a very powerful tool that can be used to help patients around the world. Synthetic data use information on real patients to create synthetic populations of patients who never existed, but whose behavior on the population level reflects the behavior of real patients.

You take an original population of actual patients with private health information and, without exposing that information, you look at the general shapes of the data – the statistical properties, the averages, the median, the standard deviations, and the correlations between the data – to get an idea of what the overall group looks like. Then, from scratch, you create synthetic patients to build an identical population – it’s identical in the large view but entirely distinct on the micro, patient-by-patient level.
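As a rough illustration of the idea Dr. Morrow describes, the minimal sketch below measures the means, standard deviations, and correlation of a small two-variable cohort and then draws entirely new rows that match those statistics. It is not MDClone's method. The cohort is made up, the function names `summarize` and `synthesize` are hypothetical, and the sketch assumes the two variables are roughly bivariate normal. A production engine would have to handle many variables, categorical fields, and explicit privacy checks.

```python
import math
import random


def summarize(rows):
    """Return means, standard deviations, and the correlation of two numeric columns."""
    n = len(rows)
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return {"mean": (mean_x, mean_y), "sd": (sd_x, sd_y), "corr": cov / (sd_x * sd_y)}


def synthesize(stats, n, rng):
    """Draw n new rows from a bivariate normal that matches the summary statistics."""
    mean_x, mean_y = stats["mean"]
    sd_x, sd_y = stats["sd"]
    rho = stats["corr"]
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mean_x + sd_x * z1
        # Mixing z1 into y is what reproduces the correlation between the columns.
        y = mean_y + sd_y * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Made-up "real" cohort: (age in years, systolic blood pressure in mmHg).
real = []
for _ in range(2000):
    age = rng.gauss(55, 12)
    sbp = 100 + 0.5 * age + rng.gauss(0, 10)
    real.append((age, sbp))

real_stats = summarize(real)
synthetic = synthesize(real_stats, 2000, rng)
synthetic_stats = summarize(synthetic)

print("real     :", real_stats)
print("synthetic:", synthetic_stats)
```

The two printed summaries should be close to each other even though no synthetic row is copied from the original cohort, which is the "identical in the large view, distinct patient by patient" property described above.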

Aneel: Why is it important to be able to utilize synthetic data?

Dr. Morrow: You can interpret any data you want. But when you are talking about protected health information, there is a tradeoff between privacy and the utility of the data. If you had no privacy constraints, you could do whatever you wanted with the information, but that’s not applicable when talking about a patient’s sensitive health information. So, you make a tradeoff: you limit what you can do with the information in order to protect the patient’s privacy.

For example, if I can’t reveal someone’s date of birth or address or zip code, then it’s hard for me to draw conclusions that reflect ages or geographic locations because of the need to protect privacy. MDClone’s solution with synthetic data is to create synthetic patients who never actually existed but in aggregate mimic the actual population. Then, you can ask questions of the data that you otherwise cannot ask because of patient privacy protection.

Aneel: Is synthetic data new?

Dr. Morrow: It’s part of an evolution over the past couple of decades in the concept of data anonymization. If you think back to the late 1990s and early 2000s, there was a concept of de-identified data, where you take data and remove the most sensitive identifiers. The concept is that the more specific identifiers you have attached to a particular record, the easier it is to attach that record to an individual.

The problem with removing some of those identifiers is that you also remove some utility. If I don’t have information about zip codes, it is hard for me to find patterns of disparity within a community, because I can’t trace those disparities down to the zip code level. The benefit of synthetic data is that it allows you to maintain that utility while perfectly respecting patient privacy.

Aneel: Can you give me a use case example of how you can use synthetic data and protect patient privacy?

Dr. Morrow: Let’s say that you work at a hospital in Florida, and I work at a hospital in New York, and we are trying to find patterns of the relationships between Covid outbreaks and vaccination rates, patients’ ages and comorbid conditions. If I wanted to share data with you so we had a larger patient population, we would have to do that under the approval of an IRB, with privacy constraints, data sharing agreements and data protection to be able to share specific information about our patients.

Once you have synthetic data, I can create a synthetic file that contains none of my actual patients but that shows the relationships between the variables. You, in Florida, can generate a synthetic file that preserves the properties of your population. Neither of our files contains actual individual patients. It’s all synthetic patients, none of whom ever existed, and you and I can now freely share those files, combine them and then ask the data questions.

Those use cases don’t have to be Covid or Florida and New York. We can look at data locally. Because it reflects the population, but it doesn’t contain any actual real patients, there are no privacy implications.

Aneel: Tell me about the ADAMS platform.

Dr. Morrow: As important as synthetic data is, you also need to combine it with a way of accessing that data and understanding what’s contained in it. That’s why we couple synthetic data with a self-service analytics platform. The MDClone ADAMS Platform allows clinicians, researchers and hospital operations staff not only to create synthetic data, but also to unlock the knowledge in it.

With our ADAMS Platform, what you get is an end-to-end tool that allows you to turn data very quickly into knowledge.

Aneel: Does the ADAMS Platform integrate with various EMRs?

Dr. Morrow: While the overwhelming amount of clinical data exists in EMRs, health systems have myriad information systems beyond the electronic health record. There are transactional systems, supply chain databases, and operational databases that may or may not reside in the electronic health record.

MDClone is built around what we call a data lake – information from disparate sources, from any source within a health system’s IT ecosystem, can be imported into a data lake. What brings everything together is that every piece of information in the MDClone data lake is tied to a particular patient at a specific point in time, and it’s tagged with a descriptor of what kind of information it is.

When you have those three things – a patient, a time, and a tag telling you what kind of information it is – you can then take information from very different sources and put it together to form a longitudinal picture of a patient’s journey from diagnosis to cure or from birth to death.

Then, you can analyze the data, create a synthetic dataset around it and you have a unified picture of your population.
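The patient, time, and tag triple described above can be pictured as a simple record type. The sketch below is only an illustration of that idea, not MDClone's actual data model: the `Event` class, the `build_timelines` function, the field names, and the sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    patient_id: str   # which patient the fact belongs to
    time: datetime    # when it happened
    tag: str          # what kind of information it is
    value: str        # the recorded fact itself
    source: str       # which system it came from


def build_timelines(*sources):
    """Merge events from any number of source systems into one time-ordered list per patient."""
    timelines = defaultdict(list)
    for source in sources:
        for event in source:
            timelines[event.patient_id].append(event)
    for events in timelines.values():
        events.sort(key=lambda e: e.time)
    return dict(timelines)


# Made-up records from two separate systems.
emr = [
    Event("p1", datetime(2021, 3, 1, 9, 0), "diagnosis", "type 2 diabetes", "emr"),
    Event("p1", datetime(2021, 6, 1, 9, 0), "lab:hba1c", "7.1", "emr"),
]
pharmacy = [
    Event("p1", datetime(2021, 3, 2, 14, 30), "medication", "metformin", "pharmacy"),
    Event("p2", datetime(2021, 4, 10, 8, 0), "medication", "lisinopril", "pharmacy"),
]

timelines = build_timelines(emr, pharmacy)
for event in timelines["p1"]:
    print(event.time.date(), event.tag, event.value, f"({event.source})")
```

Because every record carries the same three keys, the pharmacy entry for patient `p1` slots between the two EMR entries without either system needing to know about the other.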

Aneel: Why is a ‘data lake’ so important? Is it because data get lost in silos?

Dr. Morrow: It’s not only that the data get lost, but to access the data, you need a very specialized guide – a data analyst. But the analyst, who is usually not a clinician, doesn’t have my specialized clinical knowledge, and I don’t have the analyst’s specialized knowledge, so something often gets lost in translation. I speak a different language than the database analyst who is answering my question.

A self-service analytics tool like ADAMS allows me as a clinician to ask a question in my native language, and I can get to the answer very quickly. It is speed-to-innovation without threatening in any way the privacy of the patients whom I serve.

Aneel: So, using synthetic data, more actionable change happens?

Dr. Morrow: Exactly! It’s also putting it into the workflow. You act on the information.

Aneel: What about collaboration? Talk about how collaboration can work among systems.

Dr. Morrow: There is a huge role for synthetic data, and I happen to believe synthetic data are the key to unlocking boundaryless collaboration between institutions.

Sharing protected health information between institutions, particularly across national borders, is very challenging. It’s cumbersome, and it’s restricted in many ways. Covid is a great example of this. As we’ve learned, you can’t solve Covid locally. Covid requires a global, coordinated solution.

Aneel: How do you collaborate globally?

Dr. Morrow: We are building a platform called The Global Network, which is a collaboration of like-minded researchers, powered by MDClone. It allows researchers to share not just information but also ideas. They can use the discovery tools together to find the knowledge, then act, measure, and share their findings throughout the community. It is a very powerful concept that would not exist if it weren’t for synthetic data.

Aneel: This tool also balances the competitiveness of organizations with improving patient care, correct?

Dr. Morrow: It’s true. Healthcare organizations, particularly in the United States, are competitive. They are market focused and market driven, so they do need to protect their business operation’s intellectual property. But what healthcare organizations also share worldwide is the underlying mission to serve patients and to improve public health. This is why all of us got into healthcare. Physicians and nurses don’t get into healthcare because they want to corner the market. They get into healthcare because they want to help patients, cure disease, reduce suffering and improve quality of life.

Improving global health helps everyone. It helps all organizations. Synthetic data is a significant tool for accomplishing that mission. It’s finding knowledge in the data we collect and recognizing the value of that, then turning raw data into discoveries and actionable knowledge and then sharing it. Because that’s how we improve health. That’s the bottom line.

About the Podcast

This podcast is all about increasing awareness of future health trends, accelerating technologies, and art and design perspectives. The podcast also provides a forum for elevating public discourse on ways healthcare can be more accessible, less costly, and more efficient. You’ll hear about dynamic collaborations, breathtaking technological breakthroughs, and cutting edge developments in the newest medical fields, including regenerative medicine, digital health, precision health, advanced cellular therapies, advances in nutrition, nutrigenomics and sustainable energy as it pertains to creating a health future that is equitable. You’ll hear from innovators, entrepreneurs, leaders, decision-makers, policymakers, educators, investors, and inventors. This is a health innovation podcast about results, about success stories, about possibilities.

 

About Dr. Jon D. Morrow

Dr. Jon D. Morrow leads MDClone’s medical affairs activities for the North American market. A medical informaticist and board-certified obstetrician-gynecologist, Jon has over 25 years of experience in academic medicine, healthcare technology, and life sciences. He served as Senior Medical Leader at GE Healthcare, where he led the Medical Quality Improvement Consortium. He was Medical Director at Pfizer Pharmaceuticals and established the company’s Research Informatics group. Jon is an alumnus of MIT and of the McGill University Faculty of Medicine. He did his residency training at Northwestern University, and he received an MBA and an advanced degree in Biomedical Informatics from Columbia University. Connect with him on LinkedIn.

Entire Podcast

Listen to the entire podcast here.

  

