By using MDClone, the School of Medicine can provide researchers with higher-fidelity data than traditional de-identification methods allow, while complying with the strictest patient privacy and confidentiality requirements.
THE SOLUTION: MDCLONE AND SYNTHETIC DATA
MDClone’s software generates synthetic data that mimic real patient populations, and those synthetic data sets can then be made available to School of Medicine researchers quickly and efficiently. Synthetic data can be analyzed as if they were original data while addressing important privacy concerns. To do so, MDClone creates a synthetic analogue of healthcare data collected from actual patient populations. While the synthetic data set is similar in overall characteristics to the original data, it contains no identifying information that can be traced back to individual patients. MDClone’s software and the computational and network environments where it is used have been designed to meet state-of-the-art cybersecurity and privacy controls.
“The beauty of synthetic data is that they allow us to quickly create data sets that look and feel just like the real data that are generated every time we interact with patients, which results in studies that arrive at conclusions comparable to those conducted with such source data, while greatly improving our ability to protect and maintain the confidentiality and privacy of the patients and communities we serve,” said Dr. Payne.
BENEFITS
To demonstrate the power of synthetic data, the School of Medicine has conducted three pilot projects to ensure both data accuracy and patient privacy, focusing on:
The school’s researchers were able to use synthetic data just as they would have used real patient data to not only describe populations but also produce and validate advanced machine learning (ML) models.
“Despite coming from very different domains, our three use cases all demonstrated that the synthetic data were scientifically valid representations of the original data. We showed that a ML model trained on the synthetic data performed well when applied to the original data,” said Randi Foraker, PhD, MA, FAHA, FAMIA, Director of the Center for Population Health Informatics at I2, Director of the Center for Administrative Data Research, and the Director of the Public Health Data & Training Center at the Institute for Public Health, Washington University in St. Louis.
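The validation idea described here, training a model on synthetic data and then checking it against the original data, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy one-feature cohort, the per-label Gaussian sampler standing in for a synthetic data generator, and the nearest-centroid model. It does not represent MDClone’s software or the methods used in the School of Medicine pilot projects.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def make_original(n):
    """Toy 'original' cohort (hypothetical): one numeric feature, binary label."""
    rows = []
    for _ in range(n):
        label = random.randint(0, 1)
        # Positive records are centered at 3.0, negative records at 0.0
        value = random.gauss(3.0 if label else 0.0, 1.0)
        rows.append((value, label))
    return rows


def make_synthetic(original, n):
    """Crude stand-in for a synthetic data generator (an assumption, not
    MDClone's method): sample new records from per-label Gaussians fitted
    to the original data, so no original record is copied."""
    by_label = {0: [], 1: []}
    for value, label in original:
        by_label[label].append(value)
    params = {
        label: (statistics.mean(values), statistics.stdev(values))
        for label, values in by_label.items()
    }
    p_positive = len(by_label[1]) / len(original)
    rows = []
    for _ in range(n):
        label = 1 if random.random() < p_positive else 0
        mu, sigma = params[label]
        rows.append((random.gauss(mu, sigma), label))
    return rows


def train_centroids(rows):
    """Nearest-centroid 'model': the mean feature value for each label."""
    return {
        label: statistics.mean(v for v, l in rows if l == label)
        for label in (0, 1)
    }


def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, rows):
    hits = sum(predict(centroids, v) == l for v, l in rows)
    return hits / len(rows)


original = make_original(1000)
synthetic = make_synthetic(original, 1000)

model = train_centroids(synthetic)            # train on synthetic data only
acc_on_original = accuracy(model, original)   # evaluate on original data
print(f"accuracy on original data: {acc_on_original:.2f}")
```

If the synthetic records preserve the statistical structure of the original cohort, a model fitted to them should score on the original data about as well as a model fitted to the original data directly.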
This speed and agility in accessing data have created an ideal environment for researchers at the School of Medicine to accelerate scientific discovery. In addition, as a founding member of MDClone’s Global Network, I2 has been able to collaborate with academic and health systems around the globe, enabling high-impact, data-driven research and innovation projects that reach across institutional boundaries.
With the MDClone platform, researchers can log in and conduct queries in real time once they have completed a training curriculum and signed a data use agreement committing them to use synthetic data responsibly and for scientific purposes only. Instead of waiting weeks, months, or years for access to information, users can immediately ask and answer important questions about the data produced in the healthcare environment. In addition, because MDClone users can rapidly iterate on and refine their data-driven projects with the platform’s query and data analysis tools, projects can be designed and executed quickly, especially when they use emergent computational methods such as ML and cognitive computing.