Denoising Epidemic Data: A New Approach to Improving Disease Spread Predictions

The accurate prediction of infectious disease spread is a fundamental challenge in epidemiology. Effective public health interventions rely on precise forecasts to anticipate outbreaks, allocate resources, and implement timely containment measures. However, epidemic data is often incomplete or unreliable due to underreporting, limited testing, and privacy concerns. Many infections remain undetected, particularly in the early stages of an outbreak, and the structure of interactions within a population adds further complexity to modeling disease transmission. These limitations introduce significant uncertainty into traditional epidemiological models, which depend on high-quality data to make accurate predictions.

To address these challenges, Olga Klopp (Professor of Information Systems, Data Analytics and Operations) and co-authors Claire Donnat (University of Chicago) and Nicolas Verzelen (INRAE, Montpellier) introduce a novel approach to epidemic forecasting. It applies Total Variation (TV) denoising, a mathematical technique originally developed in the field of signal processing. By adapting this method to epidemic modeling, they show that it can reconstruct missing data and refine estimates of disease dynamics on contact networks. The approach yields more accurate estimates of infection probabilities, offering a valuable tool for tracking epidemics even when observations are incomplete.

Challenges in Epidemic Data and Modeling

Modeling infectious disease transmission typically relies on compartmental frameworks that categorize individuals into different states, such as susceptible, infected, and recovered. These models have been widely used to study disease spread, but their accuracy is constrained by the quality of the available data. The observation of epidemic processes in real-world scenarios is often incomplete: not all infected individuals are tested or report their symptoms, and privacy regulations limit the accessibility of detailed contact data. Additionally, transmission occurs over complex social networks where some individuals play a more significant role in spreading infections than others. Traditional models struggle to incorporate these complexities, leading to potential biases in predictions.

Network-based epidemic models offer a more refined approach by capturing the heterogeneity of interactions within a population. However, their effectiveness also depends on the reliability of the underlying data. Missing observations can lead to misleading conclusions about the trajectory of an outbreak, particularly when early cases go undetected. Addressing the challenge of incomplete data is therefore crucial for improving the accuracy of epidemic predictions.

Total Variation Denoising for Epidemic Forecasting

To improve epidemic modeling in the presence of noisy and incomplete data, this study proposes the use of Total Variation denoising, a method originally developed to enhance the quality of images and signals by removing random distortions while preserving key structural features. When applied to epidemic data, this technique allows for the reconstruction of missing information and the smoothing of inconsistencies that arise due to underreporting or observational gaps.
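
For orientation, the textbook one-dimensional form of TV denoising is shown below. It is the standard signal-processing formulation and not necessarily the exact estimator used in the paper. Given a noisy sequence y_1, ..., y_n, the estimate trades off closeness to the data against the total variation of the signal, with a tuning parameter λ ≥ 0:

```latex
\hat{x} = \arg\min_{x \in \mathbb{R}^n} \; \frac{1}{2} \sum_{t=1}^{n} (y_t - x_t)^2 \; + \; \lambda \sum_{t=1}^{n-1} \lvert x_{t+1} - x_t \rvert
```

The penalty charges absolute rather than squared differences. A single sharp jump therefore costs no more than the same total change spread over many small steps, while back-and-forth noise is expensive. As a result, the estimate tends to be piecewise constant, with genuine edges preserved. Larger values of λ give flatter reconstructions.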

The core idea behind this approach is that disease spread follows structured patterns, particularly within connected groups of individuals. By leveraging the inherent structure of contact networks, TV denoising estimates infection probabilities with greater accuracy, so predictions remain robust even when direct observations are unavailable. The study establishes theoretical guarantees showing that the method's estimates are consistent. These guarantees extend earlier TV-denoising results, which covered continuous-valued observations, to the binary ("one-bit") infected-or-not observations that epidemic surveillance actually produces. The ability to correct for missing information makes this approach particularly useful for early outbreak detection, when data is scarce but timely interventions are most critical.
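
On a contact network, the same idea applies with network neighbors taking the place of consecutive time points. The penalty becomes a sum of |p_i - p_j| over every edge (i, j), where p_i is the infection probability of person i. The short Python sketch below computes this graph total variation for two arrangements of the same values. It shows that probabilities which are uniform within tightly connected groups are penalized far less than the same values scattered across the network. The toy network, the probabilities, and the function name `graph_total_variation` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy contact network: two tight groups joined by one edge.
# Node labels, edges, and probabilities are illustrative only.
EDGES = [(0, 1), (1, 2), (0, 2),   # group A: nodes 0, 1, 2
         (3, 4), (4, 5), (3, 5),   # group B: nodes 3, 4, 5
         (2, 3)]                   # single bridge between the groups


def graph_total_variation(p, edges):
    """Sum of absolute differences of p across every edge of the network."""
    return sum(abs(p[i] - p[j]) for i, j in edges)


# Infection probabilities that are constant within each group...
clustered = [0.8, 0.8, 0.8, 0.1, 0.1, 0.1]
# ...versus the same values scattered across both groups.
scattered = [0.8, 0.1, 0.8, 0.1, 0.8, 0.1]

print(graph_total_variation(clustered, EDGES))
print(graph_total_variation(scattered, EDGES))
```

In this example, the clustered assignment has a total variation of about 0.7, a single jump across the bridge edge. The scattered one comes to about 3.5. A TV-penalized estimator will therefore prefer the clustered pattern unless the data strongly contradict it.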

A key innovation in this work is the extension of TV denoising to scenarios where only partial observations of infection states are available. In many real-world settings, health surveillance systems capture only a subset of infections due to limitations in testing capacity and reporting behavior. The study demonstrates that TV denoising remains effective under these conditions by incorporating a probabilistic framework that accounts for missing observations. This adaptation ensures that even when large portions of the population remain unobserved, the method can still provide meaningful estimates of disease spread.
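
The partial-observation setting can be illustrated by dropping the data-fit term for people who were never observed. Their estimates are then driven only by their neighbors through the TV penalty. The Python sketch below is a simplified stand-in under stated assumptions, not the authors' estimator. It assumes a squared-error fit to the observed 0/1 outcomes. It replaces |p_i - p_j| with the smooth approximation sqrt((p_i - p_j)^2 + eps) so that plain projected gradient descent can be used. Finally, it clips estimates to [0, 1]. The network, the observations, the function name `denoise_partial`, and the values of `lam`, `eps`, `step`, and `iters` are all hypothetical.

```python
import math

# Hypothetical toy contact network: nodes 0-2 form one group, nodes 3-5
# another, joined by a single bridge edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Hypothetical one-bit observations: 1 = observed infected,
# 0 = observed not infected, None = never observed.
OBSERVED = [1, 1, None, 0, None, 0]


def denoise_partial(y, edges, lam=0.5, eps=1e-2, step=0.02, iters=3000):
    """Estimate infection probabilities from partially observed binary data.

    Minimizes, by projected gradient descent,
        sum over observed i of (y_i - p_i)^2 / 2
        + lam * sum over edges (i, j) of sqrt((p_i - p_j)^2 + eps),
    a smoothed stand-in for the graph total-variation penalty |p_i - p_j|.
    Unobserved nodes have no data term, so only their neighbors move them.
    """
    n = len(y)
    p = [0.5] * n  # uninformative starting point
    for _ in range(iters):
        grad = [0.0] * n
        # Data-fit gradient, only where an observation exists.
        for i, yi in enumerate(y):
            if yi is not None:
                grad[i] += p[i] - yi
        # Gradient of the smoothed absolute difference on each edge.
        for i, j in edges:
            d = p[i] - p[j]
            g = lam * d / math.sqrt(d * d + eps)
            grad[i] += g
            grad[j] -= g
        # Gradient step, then project back onto valid probabilities [0, 1].
        p = [min(1.0, max(0.0, pi - step * gi)) for pi, gi in zip(p, grad)]
    return p


estimates = denoise_partial(OBSERVED, EDGES)
print([round(v, 2) for v in estimates])
```

With these inputs, unobserved node 2 should come out well above 0.5, in line with its two positive neighbors. Unobserved node 4 should come out well below 0.5, in line with its negative neighbors. Raising `lam` pulls estimates within each connected group closer together, and the two groups closer to each other across the bridge. A careful implementation would typically use an exact convex solver rather than this smoothed gradient scheme.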

More reliable predictions enable better allocation of medical resources, more targeted interventions, and improved decision-making regarding containment measures such as quarantine policies and vaccination campaigns. The ability to infer missing data also enhances early-warning systems, allowing for more effective monitoring of emerging outbreaks.

Application to COVID-19 Data

To validate the approach, the paper includes a numerical study based on real-world COVID-19 data from California. Using reported case counts and estimated contact networks, the researchers applied TV denoising to reconstruct the progression of the epidemic. The results show that the method identifies key trends in infection dynamics while compensating for gaps in the available data. Comparisons with traditional modeling approaches show that TV denoising leads to more accurate predictions, particularly in early outbreak phases when data uncertainty is highest. This application highlights the method's potential for real-time epidemic tracking and intervention planning, especially in settings where comprehensive surveillance data is unavailable.

As public health continues to face challenges from emerging infectious diseases, improving the accuracy of epidemic forecasts will remain a critical priority. The findings of this research offer a promising direction for enhancing predictive capabilities and strengthening preparedness for future outbreaks.

Beyond its applications in epidemiology, this approach contributes to a broader understanding of how incomplete or imperfect data can be handled in network-based modeling. The challenges of missing information are not unique to infectious disease tracking but are also present in other domains, such as misinformation spread on social networks, economic forecasting, and environmental monitoring. The adaptation of TV denoising to epidemic modeling highlights its potential as a versatile tool for extracting meaningful patterns from sparse and unreliable datasets.

Reference

Donnat, C., Klopp, O., & Verzelen, N. (2024). One-Bit Total Variation Denoising over Networks with Applications to Partially Observed Epidemics. arXiv preprint arXiv:2405.00619.
