Washington University School of Medicine in St. Louis Recreates COVID-19 Spread and Impact Using MDClone’s Synthetic Data

 

More than 2 years after the SARS-CoV-2 virus swept the globe, scientists have a massive repository of data on the effects of COVID-19 on various patient populations. Mining that data poses its own challenges, though: it must first pass through stringent review processes that can impede access.

But synthetic data, which mirrors the statistical characteristics of an original data set, promises to be a solution to this problem. Synthetic data can give scientists and clinicians, as well as biotechnology and pharmaceutical companies, the ability to rapidly explore data without the typical data access challenges that come from working with clinical patient data.

Now, two recent studies by the Washington University School of Medicine in St. Louis have found that synthetic data generated by MDClone replicated the results of real patient data in analyses of COVID-19 patients. In the studies, the synthetic data reflected COVID-19 patient characteristics on a broad scale and gave the teams the ability to recreate the pandemic’s spread and impact over time. These kinds of studies can lead to a critical understanding of how viruses like COVID-19 spread and impact populations.

“We’ve shown that we can build sophisticated predictions of what is going to happen in a population with a disease like COVID-19.”

— Philip Payne, PhD, co-author and principal investigator, the Janet and Bernard Becker Professor, chief data scientist and director of the Institute for Informatics at Washington University

“It is critical that we protect patients’ rights to privacy and confidentiality while also responding to the threat posed by COVID-19 in a timely manner. No single institution can address these needs alone. Through the unique capabilities afforded by the use of synthetic data, we are accelerating our efforts to diagnose, treat and, perhaps most importantly, prevent this disease while also demonstrating how we can more effectively respond to future public health emergencies,” said Payne.

The studies, published in the Journal of the American Medical Informatics Association and the Journal of Medical Internet Research, were co-led by the Washington University School of Medicine in St. Louis and the National Institutes of Health’s (NIH’s) National COVID Cohort Collaborative (N3C), which maintains COVID-19 clinical data from 72 institutions across the USA representing more than 13 million patients. The Washington University School of Medicine in St. Louis is part of N3C and is a Community Core Member of the National Center for Data to Health (CD2H).

The studies showed which patients were at the highest risk of requiring intensive care or ventilators. They also helped identify treatment patterns, indicating whether a drug a patient was already taking might be protective or harmful relative to patients not taking that drug.

In the first paper, the team showed that synthetic data generated by MDClone not only accurately reproduced the characteristics of patients in the initial N3C dataset but also could be used to predict the risk of hospital admission or readmission for COVID-19 patients. The second paper showed that synthetic data accurately represented the spread of COVID-19 across different geographic regions.

“We know that social determinants of health — such as access to health care, education and economic stability — are related to COVID-19 transmission and outcomes,” said Adam Wilcox, PhD, a professor of medicine at the Washington University School of Medicine and senior author of both studies. “This analysis shows that we can use synthetic data to study different dynamics of a pandemic, including how the pandemic changes over time and across geographic areas. These papers represent a really thorough investigation of the capabilities of synthetic data for pandemic modeling.”

The researchers said the data will allow them to predict future COVID-19 hot spots and can help them respond faster to a future pandemic. Payne compares the approach to weather forecasting.

“We’re trying to build the hurricane-track equivalent for pandemics, using large amounts of data,” Payne said. “When weather forecasting works, it’s because they have a lot of prior data to learn from, and they’re able to apply that to what they’re observing now. Then they create a variety of different models predicting future scenarios — in this case, potential paths of the hurricane — and the probabilities of each. We’re building tools to do exactly the same thing but for infectious disease pandemics.”


Press Release

Synthetic Data Mimics Real Patient Data, Accurately Models COVID-19 Pandemic

Research Paper

Demonstrating an Approach for Evaluating Synthetic Geospatial and Temporal Epidemiologic Data Utility: Results from Analyzing

Research Paper

The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data


ABOUT N3C

The National COVID Cohort Collaborative (N3C) maintains one of the largest collections of clinical data related to COVID-19 symptoms and patient outcomes in the United States. With stewardship from the National Center for Advancing Translational Sciences, more than 70 institutions worked together to build this extensive database. Having access to a large, centralized data resource allows research teams to study COVID-19 and identify potential treatments as the pandemic evolves.


ON-DEMAND WEBINAR

Use of Synthetic Data for Translational Research

Presented by Dr. Randi Foraker
Washington University in St. Louis

 
