Four Environment Variants. We consider two environment base models (stationary and nonstationary) and two effect sizes (population effect size, heterogeneous effect size).

Source publication
Article
Full-text available
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the comp...

Contexts in source publication

Context 1
... guideline 2, the variance of the normal distributions is found by again taking the absolute value of the weights of the base models fitted for each user, averaging the weights across features, and taking the empirical variance across users. In total, there are eight environment variants, which are summarized in Table 1. See Appendix A for further details regarding the development of the simulation environments. ...
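The variance computation described in this context is mechanical enough to sketch in code. Below is a minimal illustration, assuming per-user weight vectors from the fitted base models are stacked into one array; the array shapes and function name are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def prior_variance(user_weights: np.ndarray) -> float:
    """user_weights: (n_users, n_features) array of fitted base-model weights."""
    abs_weights = np.abs(user_weights)        # take absolute values (guideline 2)
    per_user_mean = abs_weights.mean(axis=1)  # average across features, per user
    return float(per_user_mean.var(ddof=1))   # empirical variance across users

rng = np.random.default_rng(0)
toy_weights = rng.normal(size=(32, 5))        # toy data: 32 users, 5 features
print(prior_variance(toy_weights))
```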
Context 2
... means that baseline features only need to be available at update time and we can incorporate more features that were not available in real time at the decision time. Table 1. "k" refers to the cluster size. ...
Context 3
... that D_{i,t} is the brush time in seconds for user i at decision time t. Definitions of Ê[D_{i,t} | S_{i,t}] for each model class are specified below in Table A1. Table A3 lists the number of model classes for all users in the ROBAS 2 study that we obtained after the procedure was run. ...
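The exact model-class definitions live in Table A1 of the paper; as a purely illustrative stand-in, one linear model class for the conditional mean brush time might look like the sketch below, where the user-specific weights w_i and the nonnegativity clipping are assumptions.

```python
import numpy as np

def expected_brush_time(w_i: np.ndarray, s_it: np.ndarray) -> float:
    """Hypothetical linear model class: Ê[D_{i,t} | S_{i,t}] = w_i · s_it."""
    return float(np.clip(w_i @ s_it, 0.0, None))  # brush time (s) is nonnegative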

Citations

... BLR and approximate Bayesian algorithms, in general, can also obtain measures of confidence for the prediction and are thus well suited to inform decisions in just-in-time adaptive interventions (JITAIs), particularly those applying reinforcement learning (RL). JITAIs applying RL have recently gained attention for their ability to explore and exploit sequential intervention decisions automatically [44,72,77]. Comparing the churn prediction performance of BLR with that of commonly applied methods will help inform the potential of these models for churn prediction and prevention in self-adapting RL models. ...
... In this approach, the churn prediction model's risk score would inform the state of an RL agent. The RL agent then explores and exploits sequential churn interventions, with the policy updating based on user reactions to these interventions (e.g., a user login shortly after the intervention) and the intervention history, leading to more personalized and adaptive interventions [44,72,77]. Such advanced methodologies may incorporate Bayesian algorithms that can also obtain measures of confidence for the prediction. ...
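The confidence measures these contexts refer to fall directly out of Bayesian linear regression's closed-form posterior. Below is a minimal, self-contained sketch of conjugate BLR with known noise variance; the prior, function names, and the churn-risk interpretation are illustrative assumptions, not the cited papers' code.

```python
import numpy as np

def blr_posterior(X, y, sigma2=1.0, prior_var=1.0):
    """Conjugate BLR: N(0, prior_var I) prior on weights, known noise var sigma2."""
    d = X.shape[1]
    precision = np.eye(d) / prior_var + X.T @ X / sigma2  # posterior precision
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma2
    return mean, cov

def predict_with_confidence(x, mean, cov, sigma2=1.0):
    mu = x @ mean               # predictive mean (e.g., a churn risk score)
    var = x @ cov @ x + sigma2  # predictive variance = the confidence measure
    return mu, var
```

An RL agent could then use both mu and var from predict_with_confidence as state inputs, so that churn interventions account for how certain the risk score is.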
Conference Paper
Full-text available
Digital health interventions (DHIs) offer promising solutions to the rising global challenges of noncommunicable diseases by promoting behavior change, improving health outcomes, and reducing healthcare costs. However, high churn rates are a concern with DHIs, with many users disengaging before achieving desired outcomes. Churn prediction can help DHI providers identify and retain at-risk users, enhancing the efficacy of DHIs. We analyzed churn prediction models for a weight loss app using various machine learning algorithms on data from 1,283 users and 310,845 event logs. The best-performing model, a random forest model that only used daily login counts, achieved an F1 score of 0.87 on day 7 and identified an average of 93% of churned users during the week-long trial. Notably, higher-dimensional models performed better at low false positive rate thresholds. Our findings suggest that user churn can be forecasted using engagement data, aiding in timely personalized strategies and better health results.
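As a rough illustration of the headline result, the sketch below trains a random forest on daily login counts alone and scores it with F1, mirroring the setup the abstract describes. The synthetic data and the churn label here are toy assumptions, not the study's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(1283, 7)).astype(float)  # logins on days 1-7 (toy)
y = (X.sum(axis=1) < 8).astype(int)                 # toy churn label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```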
... Consistent with this, there is emerging interest in just-in-time adaptive interventions (JITAIs), or an intervention design that adapts support (e.g., type, timing, intensity) over time in response to an individual's changing status and context 19,20. An efficient approach to achieving such personalized interventions is reinforcement learning 21,22. This machine learning method trains a statistical model based on rewards from the model's actions in an environment. ...
... In the context of behavior change, the model observes individual behaviors in response to cues it provides (like text messages) and learns to optimize a response (like adherence) through systematic trial-and-error 23,24. This technique has technological underpinnings applied in computer gaming and robotics 21,[25][26][27]. In contrast to other approaches to achieving personalization, reinforcement learning predicts the effectiveness of different intervention components, can use latently derived estimates for tailoring (rather than end-user input), and, as interventions are deployed, updates those predictions based on their successes and failures (both at the individual and group level) 28. ...
Article
Full-text available
Text messaging can promote healthy behaviors, like adherence to medication, yet its effectiveness remains modest, in part because message content is rarely personalized. Reinforcement learning has been used in consumer technology to personalize content but with limited application in healthcare. We tested a reinforcement learning program that identifies individual responsiveness (“adherence”) to text message content and personalizes messaging accordingly. We randomized 60 individuals with diabetes and glycated hemoglobin A1c [HbA1c] ≥ 7.5% to reinforcement learning intervention or control (no messages). Both arms received electronic pill bottles to measure adherence. The intervention improved absolute adjusted adherence by 13.6% (95%CI: 1.7%–27.1%) versus control and was more effective in patients with HbA1c 7.5- < 9.0% (36.6%, 95%CI: 25.1%–48.2%, interaction p < 0.001). We also explored whether individual patient characteristics were associated with differential response to tested behavioral factors and unique clusters of responsiveness. Reinforcement learning may be a promising approach to improve adherence and personalize communication at scale.
... Fourth, more longitudinal studies are needed to understand the long-term effectiveness of DHIs in sustaining behavior change and what role adaptive interventions might play over time. Finally, future research should focus on how artificial intelligence and machine learning can be integrated into DHIs to provide personalized and adaptive interventions [121]. These technologies can also be used for real-time analysis of collected data to identify trends and patterns, predict outcomes, and provide feedback to the users or clinicians. ...
Article
Full-text available
Background: Despite an abundance of digital health interventions (DHIs) targeting the prevention and management of noncommunicable diseases (NCDs), it is unclear what specific components make a DHI effective. Purpose: This narrative umbrella review aimed to identify the most effective behavior change techniques (BCTs) in DHIs that address the prevention or management of NCDs. Methods: Five electronic databases were searched for articles published in English between January 2007 and December 2022. Studies were included if they were systematic reviews or meta-analyses of DHIs targeting the modification of one or more NCD-related risk factors in adults. BCTs were coded using the Behavior Change Technique Taxonomy v1. Study quality was assessed using AMSTAR 2. Results: Eighty-five articles, spanning 12 health domains and comprising over 865,000 individual participants, were included in the review. We found evidence that DHIs are effective in improving health outcomes for patients with cardiovascular disease, cancer, type 2 diabetes, and asthma, and health-related behaviors including physical activity, sedentary behavior, diet, weight management, medication adherence, and abstinence from substance use. There was strong evidence to suggest that credible source, social support, prompts and cues, graded tasks, goals and planning, feedback and monitoring, human coaching and personalization components increase the effectiveness of DHIs targeting the prevention and management of NCDs. Conclusions: This review identifies the most common and effective BCTs used in DHIs, which warrant prioritization for integration into future interventions. These findings are critical for the future development and upscaling of DHIs and should inform best practice guidelines.
... For adherence to guidelines, VMHA's task is to leverage questions in questionnaires such as PHQ-9 as knowledge and ensure that upcoming generated questions are similar or related to CPG questions. This can be achieved through metrics such as BERTScore (Lee et al., 2021), KL Divergence (Perez et al., 2022), and others, often used in a setup that uses reinforcement learning (Trella et al., 2022). ...
Article
Full-text available
Virtual Mental Health Assistants (VMHAs) continuously evolve to support the overloaded global healthcare system, which receives approximately 60 million primary care visits and 6 million emergency room visits annually. These systems, developed by clinical psychologists, psychiatrists, and AI researchers, are designed to aid in Cognitive Behavioral Therapy (CBT). The main focus of VMHAs is to provide relevant information to mental health professionals (MHPs) and engage in meaningful conversations to support individuals with mental health conditions. However, certain gaps prevent VMHAs from fully delivering on their promise during active communications. One of the gaps is their inability to explain their decisions to patients and MHPs, making conversations less trustworthy. Additionally, VMHAs can be vulnerable in providing unsafe responses to patient queries, further undermining their reliability. In this review, we assess the current state of VMHAs on the grounds of user-level explainability and safety, a set of desired properties for the broader adoption of VMHAs. This includes the examination of ChatGPT, a conversation agent developed on AI-driven models: GPT3.5 and GPT-4, that has been proposed for use in providing mental health services. By harnessing the collaborative and impactful contributions of AI, natural language processing, and the mental health professionals (MHPs) community, the review identifies opportunities for technological progress in VMHAs to ensure their capabilities include explainable and safe behaviors. It also emphasizes the importance of measures to guarantee that these advancements align with the promise of fostering trustworthy conversations.
... The above Roadmap 2.0 data appears ideal for constructing a simulation test bed for use in developing the RL algorithm for ADAPTS HCT, in that Roadmap 2.0 concerns patients with cancer undergoing HCT and their caregivers. However, from the viewpoint of evaluating dyadic RL algorithms with the longer-term goal of deploying dyadic RL in ADAPTS HCT, this data is impoverished [Trella et al., 2022a]. Roadmap 2.0 does not include daily intervention actions (i.e., whether to send a motivational prompt to the patient) nor weekly intervention actions (i.e., whether to encourage the dyad to play the joint game). ...
Preprint
Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily life. The involvement of care partners and social support networks often proves crucial in helping individuals manage burdensome medical conditions. This presents opportunities in mobile health to design interventions that target the dyadic relationship -- the relationship between a target person and their care partner -- with the aim of enhancing social support. In this paper, we develop dyadic RL, an online reinforcement learning algorithm designed to personalize intervention delivery based on contextual factors and past responses of a target person and their care partner. Here, multiple sets of interventions impact the dyad across multiple time intervals. The developed dyadic RL is Bayesian and hierarchical. We formally introduce the problem setup, develop dyadic RL and establish a regret bound. We demonstrate dyadic RL's empirical performance through simulation studies on both toy scenarios and on a realistic test bed constructed from data collected in a mobile health study.
... Participants in BWL-AI receive 1 month of weekly group sessions, followed by 11 months where they receive 1 of 3 possible interventions each week: a small videoconference group with a master's degree-level counselor, a 12-minute individual video call with the master's degree-level counselor or paraprofessional, or an automated message ("coaching message"). Within the BWL-AI group, weekly intervention assignment is fully automated by an algorithm that uses the AI technique of reinforcement learning [31,32]. In brief, this algorithm continuously monitors and models participants' digital data (eg, weight and physical activity) to predict the intervention assignment most likely to maximize the collective amount of weight loss across the BWL-AI condition in a given week. ...
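The BWL-AI assignment algorithm itself is not spelled out in this excerpt; a hedged sketch of one standard way such a weekly three-arm assignment could work is Gaussian Thompson sampling, shown below. The arm names, priors, and the reward signal (weekly weight loss) are assumptions, not the study's actual algorithm.

```python
import numpy as np

ARMS = ["group_session", "individual_call", "coaching_message"]

class GaussianTS:
    """Gaussian Thompson sampling over the three weekly intervention arms."""
    def __init__(self, n_arms, prior_var=1.0, noise_var=1.0):
        self.mu = np.zeros(n_arms)             # posterior mean weight loss per arm
        self.var = np.full(n_arms, prior_var)  # posterior variance per arm
        self.noise_var = noise_var

    def choose(self, rng):
        sampled = rng.normal(self.mu, np.sqrt(self.var))  # sample a plausible world
        return int(np.argmax(sampled))                    # act greedily in it

    def update(self, arm, reward):
        old_prec = 1.0 / self.var[arm]
        new_prec = old_prec + 1.0 / self.noise_var
        self.var[arm] = 1.0 / new_prec
        self.mu[arm] = self.var[arm] * (old_prec * self.mu[arm]
                                        + reward / self.noise_var)

rng = np.random.default_rng(1)
agent = GaussianTS(len(ARMS))
arm = agent.choose(rng)
agent.update(arm, reward=0.4)  # e.g., 0.4 kg lost this week (toy reward)
```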
Article
Full-text available
Background Mobile health interventions for weight loss frequently use automated messaging. However, this intervention modality appears to have limited weight loss efficacy. Furthermore, data on users’ subjective experiences while receiving automated messaging–based interventions for weight loss are scarce, especially for more advanced messaging systems providing users with individually tailored, data-informed feedback. Objective The purpose of this study was to characterize the experiences of individuals with overweight or obesity who received automated messages for 6-12 months as part of a behavioral weight loss trial. Methods Participants (n=40) provided Likert-scale ratings of messaging acceptability and completed a structured qualitative interview (n=39) focused on their experiences with the messaging system and generating suggestions for improvement. Interview data were analyzed using thematic analysis. Results Participants found the messages most useful for summarizing goal progress and least useful for suggesting new behavioral strategies. Overall message acceptability was moderate (2.67 out of 5). From the interviews, 2 meta-themes emerged. Participants indicated that although the messages provided useful reminders of intervention goals and skills, they did not adequately capture their lived experiences while losing weight. Conclusions Many participants found the automated messages insufficiently tailored to their personal weight loss experiences. Future studies should explore alternative methods for message tailoring (eg, allowing for a higher degree of participant input and interactivity) that may boost treatment engagement and efficacy. Trial Registration ClinicalTrials.gov NCT05231824; https://clinicaltrials.gov/study/NCT05231824
... With their complementary strengths, concurrently leveraging ecological momentary assessment with wearable devices equipped with accelerometers and/or heart rate monitors would capture self-reported contextual, psychological, and behavioural data with data-driven estimates of energy expenditure to more holistically describe exercise experiences (Dunton, 2017;Kanning et al., 2013;Pettee Gabriel et al., 2012;Strohacker et al., 2022). In health care settings, forms of machine learning (e.g., deep learning, reinforcement learning) are being used to model multivariate, time-series data to develop decision-making algorithms based on iterative inputs (clinical effects, patient history and current state, environment; Murphy et al., 2007;Norgeot et al., 2019;Trella et al., 2022). Because these approaches yield a degree of uncertainty in prediction (due to missing data, small sample sizes, or limited within-person data points), Bayesian approaches can estimate probability distributions for treatment effects and the extent of uncertainty, by synthesising quantitative and qualitative data (Amirova et al., 2022). ...
Article
There is a growing focus on developing person-adaptive strategies to support sustained exercise behavior, necessitating conceptual models to guide future research and applications. This paper introduces Flexible nonlinear periodization (FNLP) - a proposed, but underdeveloped person-adaptive model originating in sport-specific conditioning - that, pending empirical refinement and evaluation, may be applied in health promotion and disease prevention settings. To initiate such efforts, the procedures of FNLP (i.e., acutely and dynamically matching exercise demand to individual assessments of mental and physical readiness) are integrated with contemporary health behavior evidence and theory to propose a modified FNLP model and to show hypothesized pathways by which FNLP may support exercise adherence (e.g., flexible goal setting, management of affective responses, and provision of autonomy/variety-support). Considerations for future research are also provided to guide iterative, evidence-based efforts for further development, acceptability, implementation, and evaluation.
... Changing the RL algorithm once the study has begun jeopardizes trial validity. So it is critical that the RL algorithm run stably (e.g., the algorithm updates quickly enough to select actions each day) (Trella et al. 2022). ...
... Algorithms that learn using all users' data can learn faster than those that learn using a single user's data. Moreover, in early experiments, BLR with full pooling performed better than BLR that learns using only a single user's data or a smaller cluster of users' data (Trella et al. 2022). In Equation (1), the reward model parameters are not indexed by the user i to reflect how we are learning a single RL algorithm for all users in the study. ...
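The "full pooling" referred to here simply means that one shared reward model is fit on the concatenated data of every user, so its parameters carry no user index. A minimal sketch follows, using a ridge solution as a MAP stand-in for the BLR fit; the feature construction is an assumption.

```python
import numpy as np

def fit_pooled_reward_model(states, actions, rewards, reg=1.0):
    """One shared (pooled) fit on the concatenated data of every user."""
    X = np.concatenate([np.column_stack([s, a]) for s, a in zip(states, actions)])
    y = np.concatenate(rewards)
    d = X.shape[1]
    # Ridge solution = MAP estimate of BLR with a N(0, (1/reg) I) prior.
    # Note: no user index enters the features, so parameters are shared.
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
```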
... Aimed at behavioral change, many mobile health studies use contextual bandit algorithms to optimize interventions under real-world constraints as discussed in Trella et al. (2022) and Figueroa et al. (2021). Alternative approaches include Liao et al. (2019), which uses a modified version of a posterior sampling contextual bandit algorithm; Wang et al. (2021), which uses an MDP framework and learns an action selection strategy offline; and Zhou et al. (2018), which uses inverse RL to estimate a model of the reward and mixed-integer linear programming to select a 7-day schedule of personal step count goals for each user. ...
Article
While dental disease is largely preventable, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of current actions on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been designed to run stably and autonomously in a constrained, real-world setting characterized by highly noisy, sparse data. We address this challenge by designing a quality reward that maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics. To the best of our knowledge, Oralytics is the first mobile health study utilizing an RL algorithm designed to prevent dental disease by optimizing the delivery of motivational messages supporting oral self-care behaviors.
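The abstract describes the reward design only at a high level; the sketch below shows the general shape of a quality reward that trades the health outcome off against user burden. The penalty form and the xi weights are assumptions, not the actual Oralytics reward.

```python
def quality_reward(brush_quality: float, action: int, recent_prompts: int,
                   xi_1: float = 0.5, xi_2: float = 0.1) -> float:
    """Reward = health outcome minus a burden penalty for sending a prompt."""
    burden = action * (xi_1 + xi_2 * recent_prompts)  # grows with recent prompting
    return brush_quality - burden
```

Per the procedure the abstract describes, candidate values of the xi hyperparameters could then be scored by running them through the simulation environment test bed and keeping the setting with the best average outcome.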
... Statistical analysis and machine learning (ML) models play a crucial role in using the information available to understand and predict individual FHW or patient behavior in order to design the best interventions for each use case (Hosny & Aerts, 2019;Wahl et al., 2018). Reinforcement learning (RL), a subset of these methodologies, additionally provides the algorithmic framework for making decisions about when and which interventions to send (Yom-Tov et al., 2017;Forman et al., 2019;Liao et al., 2020;Wang et al., 2021;Trella et al., 2022). It allows us to fine-tune the system depending on how much weight on knowledge extraction (i.e., gathering data that will allow statistical inference with enough power) or on optimization (i.e., using all information to make the best choice for each individual even if it hinders statistical significance) we want to put. ...
... It uses HealthSyn, an open-source library to simulate the behavior of FHWs in digital tools, and HealthKit, an open-source SDK which can track and label their logs in a data schema optimized for ML. A data science framework for the design of RL algorithms for digital interventions is presented in Trella et al. (2022). Wang et al. (2021) uses a data-driven behavioral simulator (trained using real data) to model the user's behavior and generate simulated data that can be used to train and evaluate the RL algorithm specifically for mobile health. ...
Preprint
Full-text available
Artificial Intelligence and digital health have the potential to transform global health. However, having access to representative data to test and validate algorithms in realistic production environments is essential. We introduce HealthSyn, an open-source synthetic data generator of user behavior for testing reinforcement learning algorithms in the context of mobile health interventions. The generator utilizes Markov processes to generate diverse user actions, with individual user behavioral patterns that can change in reaction to personalized interventions (i.e., reminders, recommendations, and incentives). These actions are translated into actual logs using an ML-purposed data schema specific to the mobile health application functionality included with HealthKit, an open-source SDK. The logs can be fed to pipelines to obtain user metrics. The generated data, which is based on real-world behaviors and simulation techniques, can be used to develop, test, and evaluate both ML algorithms in research and end-to-end operational RL-based intervention delivery frameworks.
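A minimal sketch of the Markov-process mechanism the abstract describes: engagement states, transition probabilities that shift under an intervention, and a per-step event log. All states, probabilities, and log fields below are assumptions for illustration, not HealthSyn's actual configuration.

```python
import numpy as np

STATES = ["disengaged", "casual", "engaged"]
BASE = np.array([[0.8, 0.2, 0.0],   # row: current state, column: next state
                 [0.3, 0.5, 0.2],
                 [0.1, 0.3, 0.6]])
BOOST = np.array([0.1, 0.3, 0.6])   # distribution favoring engagement

def step(state: int, intervention: bool, rng) -> tuple:
    """One simulated day: transition, then emit an app-event log entry."""
    p = 0.7 * BASE[state] + 0.3 * BOOST if intervention else BASE[state]
    nxt = int(rng.choice(len(STATES), p=p))
    log = {"state": STATES[nxt], "event": "login" if nxt > 0 else "none"}
    return nxt, log

rng = np.random.default_rng(2)
state, logs = 0, []
for day in range(14):                # send a reminder every third day (toy policy)
    state, entry = step(state, intervention=(day % 3 == 0), rng=rng)
    logs.append(entry)
```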
... Mobile health clinical trials can require years of work by an interdisciplinary team to develop and the RL algorithm cannot be trivially changed once the study begins. So it is critical that the RL algorithm runs stably (e.g., the algorithm cannot take too long to update and fail to be ready to select actions each day) (Trella et al. 2022). ...
... Algorithms that learn using the data of all users have the potential to learn faster than those that learn using the data of a single user. Moreover, in early experiments, BLR with full pooling performed better than BLR that learns using only a single user's data or a smaller cluster of users' data (Trella et al. 2022). Note that in Equation (1), the reward model parameters are not indexed by the user i to reflect how we are learning a single RL algorithm for all users in the study. ...
... RL algorithms are increasingly used in mobile health studies, for example, in studies to increase users' physical activity (Yom-Tov et al. 2017; Figueroa et al. 2021; Liao et al. 2019), studies to promote users' weight loss (Forman et al. 2019), and studies to help users manage mental illness (Piette et al. 2022). All these works use contextual bandit RL algorithms for optimizing intervention delivery under the real-world constraints discussed in Trella et al. (2022) and Figueroa et al. (2021). The only exception is the physical activity study of Liao et al. (2019), which used a modified version of a posterior sampling contextual bandit RL algorithm. ...
Preprint
Full-text available
Dental disease is one of the most common chronic diseases despite being largely preventable. However, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of the current action on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been made simple in order to run stably and autonomously in a constrained, real-world setting (i.e., highly noisy, sparse data). We address this challenge by designing a quality reward which maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics, an oral self-care app that provides behavioral strategies to boost patient engagement in oral hygiene practices.