Preface
Early detection and prevention of disease offer numerous benefits to both our health
and society. Often, the earlier a disease is detected, the higher the likelihood
of successful cure or management. Managing a disease in its early stages can
significantly reduce its impact on a patient’s quality of life and decrease
healthcare costs. To detect a disease early, disease screening has become a
popular tool. This method aims to determine the likelihood of a given patient
having a particular disease by applying medical procedures or tests to check the
major risk factors, even in patients without obvious symptoms of the disease.
While disease screening primarily focuses on individual patients, disease
surveillance is for detecting disease outbreaks early within a given population.
For example, our society faces constant threats from bioterrorist attacks and
pandemic influenza. It is thus important to monitor the incidence of infectious
diseases continuously and detect their outbreaks promptly. This allows governments
and individuals to implement timely disease control and prevention measures,
minimizing the impact of these diseases. This book introduces some recent analytic
methodologies and software packages developed for effective disease screening and
disease surveillance.
My exploration into disease screening was motivated by an experience around 2010
when I analyzed a dataset from the Framingham Heart Study (FHS). The FHS primarily
aims to identify major risk factors for cardiovascular diseases (CVDs), and
numerous CVD risk factors have been recognized since the study's inception in 1948,
including smoking, high blood pressure, obesity, high cholesterol levels,
physical inactivity, and more. During my data analysis, a pivotal question emerged:
Could the identified CVD risk factors be utilized to predict the likelihood of a
severe CVD, such as stroke, for individual patients? Statistically, this translates
into a sequential decision-making problem, for which the relevant statistical tool is
the statistical process control (SPC) chart. However, traditional SPC charts,
developed primarily for monitoring production lines in manufacturing, assume
independence and identical distribution of process observations when the process is
in-control (IC), and are designed for monitoring a single sequential process. In the
context of disease screening, the observed data of a patient's disease risk factors are
rarely independent and identically distributed over time. Moreover, treating each
patient's observed data as a process yields many processes, one for each patient. Both
features make traditional SPC charts unsuitable.
Recognizing the importance of the disease screening problem, I dedicated much of the
past decade to addressing this issue. This endeavor led to the development of a series
of new concepts and methods by my research team. The central methodology, termed the
Dynamic Screening System (DySS), operates as follows. First, the regular
longitudinal pattern of disease risk factors is estimated from a pre-collected dataset
representing the population without the target disease. Subsequently, a patient's
observed pattern of disease risk factors is cross-sectionally compared with the
estimated regular longitudinal pattern at each observation time. The cumulative
difference between the two patterns up to the current time is then employed to
determine the patient's disease status at that time. DySS utilizes all historical
data of the patient in its decision-making, and effectively accommodates the complex
data structure, including time-varying data distribution.
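The steps just described can be illustrated with a minimal Python sketch. It is not the implementation in the R package DySS. It assumes a single risk factor, a Gaussian kernel smoother for the regular longitudinal pattern, and an upward CUSUM as one possible way to accumulate the differences. The function names, the bandwidth, the allowance, the control limit, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_regular_pattern(times, values, grid, bandwidth):
    """Kernel-smoothed mean and standard deviation of one risk factor over time,
    estimated from pooled observations of people without the target disease."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Gaussian kernel weights: one row per grid point, one column per observation.
    w = np.exp(-0.5 * ((grid[:, None] - times[None, :]) / bandwidth) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    mean = w @ values
    var = w @ values**2 - mean**2  # local weighted variance around the local mean
    return mean, np.sqrt(np.maximum(var, 1e-12))


def screen_patient(obs_times, obs_values, grid, mean, sd, allowance=0.5, limit=3.0):
    """Compare a patient's observations with the regular pattern at each
    observation time and accumulate the differences with an upward CUSUM
    (assumed charting statistic). Returns the CUSUM path and the first
    signal time, or None if no signal is given."""
    obs_times = np.asarray(obs_times, dtype=float)
    obs_values = np.asarray(obs_values, dtype=float)
    mu = np.interp(obs_times, grid, mean)
    sigma = np.interp(obs_times, grid, sd)
    z = (obs_values - mu) / sigma  # cross-sectional standardized differences
    stat, path, signal_time = 0.0, [], None
    for t, zi in zip(obs_times, z):
        stat = max(0.0, stat + zi - allowance)
        path.append(stat)
        if signal_time is None and stat > limit:
            signal_time = t
    return path, signal_time


# Synthetic, purely illustrative data: a blood-pressure-like reading that rises with age.
rng = np.random.default_rng(0)
ages = np.tile(np.linspace(40.0, 70.0, 31), 50)
readings = 120.0 + 0.5 * (ages - 40.0) + rng.normal(0.0, 10.0, ages.size)
grid = np.linspace(40.0, 70.0, 61)
mean, sd = estimate_regular_pattern(ages, readings, grid, bandwidth=3.0)

visit_ages = np.array([50.0, 52.0, 54.0, 56.0])
typical = np.interp(visit_ages, grid, mean)                  # follows the regular pattern
elevated = typical + 3.0 * np.interp(visit_ages, grid, sd)   # persistently high readings
_, t_typical = screen_patient(visit_ages, typical, grid, mean, sd)
_, t_elevated = screen_patient(visit_ages, elevated, grid, mean, sd)
```

In this toy setting the patient who tracks the regular pattern never triggers a signal, while the persistently elevated patient does so after a few visits, because the chart uses the whole observation history rather than the latest reading alone.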
In the summer of 2013, upon joining the University of Florida (UF), I started to
work on the pressing issue of disease surveillance due to its paramount importance
in public health. Disease incidence data are typically collected sequentially over
time and across multiple locations or regions, constituting spatio-temporal data.
Similar to disease screening, disease surveillance is a sequential decision-making
problem. However, its complexity arises from the intricate spatio-temporal data
structure, encompassing seasonality, temporal/spatial variation, data correlation,
and intricate data distribution. Common disease reporting and surveillance systems
incorporate conventional SPC charts such as the cumulative sum (CUSUM) and
exponentially weighted moving average (EWMA) charts. Additionally, retrospective
methods like scan tests and generalized linear modeling approaches are employed for
routine surveillance. Unfortunately, these methods often prove ineffective or
unreliable due to their inability to handle the sequential nature of the problem
or their restrictive model assumptions (cf. Section 2.7 and Chapters 7 and 8).
Over the past decade, my research team has devoted significant effort to this domain,
resulting in the development of several novel analytic methods for disease surveillance.
Our initial method operates as follows: First, a nonparametric spatio-temporal
modeling approach is employed to estimate the regular spatio-temporal pattern of
disease incidence rates from observed data in a baseline time interval (e.g., a
previous year without outbreaks). Second, the new spatial data collected at the
current time are compared with the estimated regular pattern and decorrelated with
all previous data. Third, an SPC chart is applied to the decorrelated data to
determine whether a disease outbreak has occurred by the current time. Modified versions
of this method have been crafted to incorporate covariate information and accommodate
specific spatial features of disease outbreaks. These methods adeptly handle the
complex structure of observed data and have demonstrated effectiveness in disease
surveillance.
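The three steps of the initial method can be sketched in Python under strong simplifying assumptions. This is not the implementation in the R package SpTe2M. A constant per-region mean stands in for the nonparametric spatio-temporal estimate, the data are whitened across regions only (with a Cholesky factor of the baseline covariance) rather than decorrelated with all previous data, and a one-sided CUSUM of a chi-square-type summary is the assumed chart. All names, default values, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_baseline(baseline):
    """Step 1 (simplified): estimate the regular pattern from outbreak-free data.
    baseline has shape (n_times, n_regions). Returns the per-region mean and the
    lower Cholesky factor of the spatial covariance matrix."""
    baseline = np.asarray(baseline, dtype=float)
    region_mean = baseline.mean(axis=0)
    cov = np.cov(baseline - region_mean, rowvar=False)
    cov += 1e-8 * np.eye(cov.shape[0])  # small ridge for numerical stability
    return region_mean, np.linalg.cholesky(cov)


def monitor(new_data, region_mean, chol, allowance=0.5, limit=5.0):
    """Steps 2 and 3 (simplified): whiten each new spatial observation and feed a
    standardized summary into an upward CUSUM. Returns the index of the first
    signal, or None if no signal is given."""
    n_regions = region_mean.size
    stat = 0.0
    for t, y in enumerate(np.asarray(new_data, dtype=float)):
        # Spatially decorrelated residuals with approximately unit variance.
        e = np.linalg.solve(chol, y - region_mean)
        # Chi-square-type summary, centered and scaled to mean 0 and variance 1.
        score = (e @ e - n_regions) / np.sqrt(2.0 * n_regions)
        stat = max(0.0, stat + score - allowance)
        if stat > limit:
            return t
    return None


# Synthetic, purely illustrative incidence rates for 4 regions over 104 weeks.
rng = np.random.default_rng(1)
n_regions = 4
baseline = 10.0 + rng.normal(0.0, 1.0, size=(104, n_regions))
region_mean, chol = fit_baseline(baseline)

quiet = np.tile(region_mean, (8, 1))                 # new data at the regular level
outbreak = quiet.copy()
outbreak[5:] += chol @ np.full(n_regions, 10.0)      # large shift starting at index 5
signal_quiet = monitor(quiet, region_mean, chol)
signal_outbreak = monitor(outbreak, region_mean, chol)
```

A realistic version would replace the constant mean with a smooth function of time and location that captures seasonality, and would also remove temporal correlation before charting.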
As discussed earlier, both disease screening and disease surveillance are challenging
sequential decision-making problems that traditional SPC charts cannot handle
reliably. Consequently, disease screening and disease surveillance
emerge as crucial applications of SPC, demanding the development of new methods
tailored to their specific requirements. Fortuitously, my research journey in SPC
began in 1998, allowing me to contribute significantly to several key areas within
the field. Notable contributions include advancements in nonparametric process
monitoring (e.g., Qiu and Hawkins 2001, Qiu 2018), monitoring correlated data
(e.g., Qiu et al. 2020a, Xue and Qiu 2021), dynamic process monitoring (e.g., Qiu
and Xiang 2014, Xie and Qiu 2023a), profile monitoring (e.g., Qiu et al. 2010,
Zhou and Qiu 2022), and more. For a comprehensive description of SPC
and some SPC charts developed by my research group, see the book Qiu (2014).
This extensive experience has proven invaluable in my exploration of disease screening
and disease surveillance, providing a robust foundation to innovate and tailor SPC
methodologies to the distinctive challenges presented in these critical areas of
public health.
The book comprises nine chapters. In Chapter 1, a concise introduction sets the stage
for understanding the challenges posed by disease screening and surveillance problems.
Chapter 2 delves into fundamental statistical concepts and methods commonly employed
in data modeling and analysis. Given that disease screening and surveillance involve
sequential decision-making, Chapter 3 introduces essential concepts and methods of SPC,
a major statistical tool for such problems. Chapters 4-6 focus on recent
developments in DySS methods tailored for effective disease screening. Chapter 4 covers
univariate and multivariate DySS methods based on direct monitoring of observed disease
risk factors, while Chapter 5 introduces methods based on disease risk quantification
and sequential monitoring of quantified disease risks. The practical implementation of
DySS methods by the R package DySS is detailed in Chapter 6. Chapters 7-9 shift
the focus to disease surveillance. Chapter 7 explores traditional methods utilizing the
Knox test, scan statistics, and generalized linear modeling. Chapter 8 presents recent
methods developed by my research team based on nonparametric spatio-temporal data modeling
and monitoring. The implementation of these methods is demonstrated using the R package
SpTe2M in Chapter 9.
This book serves as an ideal primary textbook for a one-semester course focused on
disease screening and/or disease surveillance, tailored for graduate students in
biostatistics, bioinformatics, health data science, and related disciplines.
Additionally, the book can be utilized as a supplementary textbook for courses covering
analytic methods and tools relevant to medical and public health studies. Its content
is designed to be accessible and beneficial for medical and public health researchers
and practitioners. By introducing recent analytic tools for disease screening and
surveillance, the book equips readers with methods that can be readily
implemented using the accompanying R packages DySS and SpTe2M.
I extend my sincere gratitude to my current and former students and collaborators, Drs.
Jun Li, Dongdong Xiang, Kai Yang, Lu You, and Jingnan Zhang, whose dedicated efforts,
stimulating discussions, and constructive comments have played an invaluable role in
the completion of this book. Their patience and insights have been indispensable.
I express my deep appreciation to Dr. Xiulin Xie and Mr. Zibo Tian, who generously
dedicated their time to reading the entire book manuscript and diligently corrected
numerous typos and mistakes. Completing this book has been a three-year journey, and
I owe a debt of gratitude to my wife, Yan, for providing unwavering help and support.
Her efforts in managing household responsibilities and caring for our two sons, Andrew
and Alan, allowed me to focus on this project. I extend my heartfelt thanks to my family
for their love and constant support throughout this endeavor.
Peihua Qiu
Gainesville, Florida
November 2023