A team at Washington University School of Medicine in St. Louis has validated the use of MDClone’s synthetic data in two spine fusion populations as part of a study that could expand opportunities for using synthetic data in spine surgery research.
The study aimed to determine whether spine surgery outcomes calculated using synthetic data are comparable to those using de-identified data gathered from electronic health records. Synthetic data, which are artificially generated, mirror real patient data but don’t risk revealing patient identities because there is no one-to-one correspondence between the synthetic and real patient populations.
The team, led by Randi Foraker, PhD, at Washington University School of Medicine in St. Louis, populated data from the university’s Research Data Core (RDC) into the MDClone ADAMS Platform, a self-service data analytics platform for healthcare collaboration, research, and innovation. The study included two patient cohorts — one with patients who had undergone anterior cervical fusion and another with patients who had undergone posterior lumbar fusion. The team wanted to determine: (1) 30-day hospital readmission rates and (2) postoperative complications within 30 days of surgery for patients in the two cohorts.
The researchers then performed multiple statistical tests to determine the similarities between the synthetic and real datasets:
- Chi-squared tests, which determine whether the observed frequency of an event is consistent with the frequency that would be expected
- t-tests, which are used to assess whether the means of two populations are equal
- Mann–Whitney U tests, which compare the distribution of a variable between two independent populations without assuming the data are normally distributed
When the team compared the data, the distribution of the data “appeared nearly identical between datasets,” wrote the authors. The team found that in the dataset with real patient data, 6.2% of patients had been readmitted to the hospital within 30 days of surgery versus 6.1% in the synthetic population. Additionally, in the real patient population, 3.0% of patients had experienced a 30-day complication compared with 3.4% in the synthetic dataset.
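To illustrate how a rate comparison like the one above might be checked, here is a minimal sketch of a Pearson chi-squared test on a 2x2 table, written in Python with only the standard library. It is not the authors' code. The article does not report cohort sizes, so the 10,000 patients per dataset used below is a hypothetical figure chosen only to turn the reported 6.2% and 6.1% readmission rates into counts, and the function name `chi_squared_2x2` is likewise an assumption.

```python
import math


def chi_squared_2x2(events_a, total_a, events_b, total_b):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Rows are the two datasets; columns are "event" and "no event".
    Returns the test statistic and its p-value (1 degree of freedom).
    """
    observed = [
        [events_a, total_a - events_a],
        [events_b, total_b - events_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [
        observed[0][0] + observed[1][0],
        observed[0][1] + observed[1][1],
    ]
    grand_total = total_a + total_b

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the event rate were identical in both datasets.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical cohort size (NOT from the study): 10,000 patients per dataset.
# 6.2% readmitted in the real data, 6.1% in the synthetic data.
stat, p = chi_squared_2x2(620, 10_000, 610, 10_000)
print(f"chi-squared = {stat:.4f}, p = {p:.4f}")
```

Under these assumed counts the p-value is well above 0.05, meaning the test finds no detectable difference between the two readmission rates. With the study's actual cohort sizes the numbers would differ.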
“Our results in 2 spine surgery populations suggest that synthetic data derivatives almost entirely replicate population descriptive characteristics, while also closely simulating predictive performance. These findings suggest a wide array of potential applications, including epidemiological analyses, studies of surgical trends, and profiling quality outcome metrics,” the authors wrote.
Ultimately, the team found that synthetic data — specifically that generated by MDClone — closely mirrored real patient data in two spine surgery populations, which could enable synthetic data to be used broadly for research, with the goal of improving patient outcomes.