Fig 2 - available via license: CC BY
Examples of data manipulation that can introduce bias to the derived variables. Note. (a) This panel shows the relationship between an input continuous variable and the derived dichotomized variable. The observations were sorted in increasing order of the continuous variable, in this case the frailty index in the Burden Model. The horizontal axis is the observation number; the vertical axis is the value of the continuous index. The bias variable carries the essential information that is not related to the original input variable. For example, one observation had a value of 0.1 and another had 0.9, while the cut-off threshold was 0.2. The values of the bias variable required for these two observations were -0.1 and 0.1, respectively, to derive their statuses: non-frail (coded as zero) and frail (coded as one). (b) The sensory problem was defined as having a problem with hearing or eyesight. The observations were sorted by whether they had hearing and eyesight problems. For those with both hearing and eyesight problems, one had to be subtracted because they had two problems. The negative values assigned to those with two problems constitute the bias variable generated.

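The two derivations described in the caption can be written out directly. The sketch below (Python, with made-up index values and the 0.2 cut-off from the caption's example) shows how the bias variable arises in both panels: in (a) it is the amount that must be added to the continuous index to reach the 0/1 status, and in (b) it is the correction subtracted from the symptom sum when both problems are present.

```python
import numpy as np

# Hypothetical continuous frailty index values for a few observations.
frailty_index = np.array([0.05, 0.10, 0.25, 0.60, 0.90])

# (a) Dichotomize at a cut-off of 0.2 (the threshold used in the caption's example).
threshold = 0.2
frail_status = (frailty_index >= threshold).astype(int)  # 0 = non-frail, 1 = frail

# The "bias variable" is whatever must be added to the continuous index
# to turn it into the derived 0/1 status; it is unrelated to the index itself.
bias_a = frail_status - frailty_index
print(bias_a)  # e.g. index 0.1 -> bias -0.1; index 0.9 -> bias +0.1

# (b) Define a sensory problem as having a hearing OR an eyesight problem.
hearing = np.array([0, 1, 0, 1, 1])
eyesight = np.array([0, 0, 1, 1, 1])
sensory_problem = ((hearing + eyesight) > 0).astype(int)

# Expressed as a sum, observations with both problems need 1 subtracted;
# those negative corrections are the bias variable for panel (b).
bias_b = sensory_problem - (hearing + eyesight)
print(bias_b)  # -1 only where both problems are present
```

For the two observations in the caption, the printed values of bias_a are exactly -0.1 and +0.1.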

Source publication
Article
Full-text available
Introduction Frailty is a geriatric syndrome that has been defined differently with various indices. Without a uniform definition, it remains unclear how to interpret and compare different frailty indices (FIs). With the advances in index mining, we find it necessary to review the implicit assumptions about the creation of FIs. We are concerned the...

Citations

... However, in many cases, the scales are designed to be measured over a continuous or ordinal range. According to Chao et al., the scales tested as continuous variables better predict mortality in the age group over 50, while dichotomisation of variables carries a risk of bias and loses information for the many values that lie on the spectrum between categories [24]. In 2013, Romero-Ortuno proposed matching the Frailty Index cut-off points to the patient's age, showing that this scale performs differently in different age groups [25]. ...
Article
Full-text available
Background: Despite the common occurrence of postoperative complications in patients with frailty syndrome, the nature and severity of this relationship remain unclear. We aimed to assess the association of frailty with possible postoperative complications after elective abdominal surgery in participants of a single-centre prospective study, in relation to other risk classification methods. Methods: Frailty was assessed preoperatively using the Edmonton Frail Scale (EFS), Modified Frailty Index (mFI) and Clinical Frailty Scale (CFS). Perioperative risk was assessed using the American Society of Anesthesiologists Physical Status (ASA PS), Operative Severity Score (OSS) and Surgical Mortality Probability Model (S-MPM). Results: The frailty scores failed to predict in-hospital complications. The AUC values for in-hospital complications ranged between 0.5 and 0.6 and were statistically nonsignificant. The perioperative risk scoring systems performed satisfactorily in ROC analysis, with AUCs ranging from 0.63 for OSS to 0.65 for S-MPM (p < 0.05 for each). Conclusions: The analysed frailty rating scales proved to be poor predictors of postoperative complications in the studied population. Scales assessing perioperative risk performed better. Further studies are needed to obtain optimal predictive tools in senior patients undergoing surgery.
... The RR is one of the relative measures widely used to quantify disease risks [1,2]. In particular, it is used to quantify the impacts of health hazards on the occurrence of diseases, compared with those not exposed to the hazards [1,3,4]. Among relative measures, RRs are simple to interpret and often used in randomized controlled trials and cohort studies [1,4,5]. ...
Article
Full-text available
Background Relative measures, including risk ratios (RRs) and odds ratios (ORs), are reported in many epidemiological studies. RRs represent how many times as likely a condition is to develop when exposed to a risk factor. The upper limit of an RR is the multiplicative inverse of the baseline incidence. Ignoring the upper limits of RRs can lead to reporting exaggerated relative effect sizes. Objectives This study aims to demonstrate the importance of such upper limits for effect size reporting via equations, examples, and simulations and to provide recommendations for the reporting of relative measures. Methods Equations to calculate RRs and their 95% confidence intervals (CIs) were listed. We performed simulations with 10,000 simulated subjects and three population variables: proportions at risk (0.05, 0.1, 0.3, 0.5, and 0.8), baseline incidence (0.05, 0.1, 0.3, 0.5, and 0.8), and RRs (0.5, 1.0, 5.0, 10.0, and 25.0). Subjects were randomly assigned to the at-risk group based on the set of proportion-at-risk values. A disease occurred based on the baseline incidence among those not at risk. The incidence among those at risk was the product of the baseline incidence and the RR. The 95% CIs of RRs were calculated according to Altman. Results In the equations, the calculation of RR 95% CIs is not constrained by the RR upper limits. The RRs in the simulated populations at risk could reach the upper limit of the RR: the multiplicative inverse of the baseline incidence. The upper limits of the derived RRs were around 1.25, 2, 3.3, 10, and 20 when the assumed baseline incidence rates were 0.8, 0.5, 0.3, 0.1, and 0.05, respectively. We demonstrated five scenarios in which the RR 95% CIs might exceed the upper limits. Conclusions Statistical significance does not imply that the RR 95% CIs do not exceed the upper limits of RRs. When reporting RRs or ORs, the RR upper limits should be assessed. The rate ratio is also subject to a similar upper limit. In the literature, ORs tend to overestimate effect sizes. It is recommended to correct ORs that aim to approximate RRs when outcomes are assumed to be rare. A reporting guide for relative measures (RRs, ORs, and rate ratios) is provided. Researchers are advised to report whether the 95% CIs of relative measures (RRs, ORs, and rate ratios) overlap with the range of upper limits and to discuss whether the relative measure estimates may exceed the upper limits.
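The central relation in this abstract, that an RR cannot exceed the multiplicative inverse of the baseline incidence, follows from the at-risk incidence being capped at 1. The sketch below is a minimal simulation in that spirit, not the authors' code; the population size and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rr(n=10_000, p_at_risk=0.3, baseline_incidence=0.1, true_rr=5.0):
    """Simulate one population and return the observed risk ratio.

    The incidence among those at risk is capped at 1, so the largest
    RR that can ever be observed is 1 / baseline_incidence.
    """
    at_risk = rng.random(n) < p_at_risk
    incidence_exposed = min(baseline_incidence * true_rr, 1.0)
    p_disease = np.where(at_risk, incidence_exposed, baseline_incidence)
    disease = rng.random(n) < p_disease
    risk_exposed = disease[at_risk].mean()
    risk_unexposed = disease[~at_risk].mean()
    return risk_exposed / risk_unexposed

for baseline in [0.8, 0.5, 0.3, 0.1, 0.05]:
    upper_limit = 1 / baseline  # multiplicative inverse of the baseline incidence
    observed = simulate_rr(baseline_incidence=baseline, true_rr=25.0)
    print(f"baseline={baseline:.2f}  upper limit={upper_limit:.2f}  observed RR={observed:.2f}")
```

Even with a nominal RR of 25, the observed RRs stay at or below the ceiling of 1 / baseline incidence, matching the upper limits listed in the Results.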
... Currently, there is no critical appraisal tool specifically designed for innovative composite measures. Composite measures are often derived from multiple variables and can be used as diagnoses, prognostic factors, or outcomes in clinical and health research [1,2]. For example, frailty is a diagnosis that can be confirmed based on the number of age-related symptoms and has been used to predict major health outcomes, such as mortality and falls [1]. ...
... Composite measures are often derived from multiple variables and can be used as diagnoses, prognostic factors, or outcomes in clinical and health research [1,2]. For example, frailty is a diagnosis that can be confirmed based on the number of age-related symptoms and has been used to predict major health outcomes, such as mortality and falls [1]. The Charlson Comorbidity Index is another example and uses patients' ages and comorbidities to calculate points that can be converted to the probabilities of survival in 10 years [2]. ...
... The Charlson Comorbidity Index is another example and uses patients' ages and comorbidities to calculate points that can be converted to the probabilities of survival in 10 years [2]. Composite measures aggregate information from multiple variables to represent certain concepts that are often difficult to quantify with single variables [1,2]. Indices, scales, and typologies are different forms of composite measures designed for specific contexts [3]. ...
Article
Full-text available
Background Composite measures are often used to represent certain concepts that cannot be measured with single variables and can be used as diagnoses, prognostic factors, or outcomes in clinical or health research. For example, frailty is a diagnosis confirmed based on the number of age-related symptoms and has been used to predict major health outcomes. However, undeclared assumptions and problems are prevalent among composite measures. Thus, we aim to propose a reporting guide and an appraisal tool for identifying these assumptions and problems. Methods We developed this reporting and assessment tool based on evidence and the consensus of experts pioneering research on index mining and syndrome mining. We designed a development framework for composite measures and then tested and revised it based on several composite measures commonly used in medical research, such as frailty, body mass index (BMI), mental illness diagnoses, and innovative indices mined for mortality prediction. We extracted review questions and reporting items from various issues identified by the development framework. The expert panel reviewed the identified issues, considered other aspects that might have been neglected in previous studies, and reached a consensus on the questions to be used by the reporting and assessment tool. Results We selected 19 questions in seven domains for reporting or critical assessment. Each domain contains review questions for authors and readers to critically evaluate the interpretability and validity of composite measures; the domains cover candidate variable selection, variable inclusion and assumption declaration, data processing, weighting schemes, methods to aggregate information, composite measure interpretation and justification, and recommendations on use. Conclusions Across all seven domains, interpretability is central to composite measures. Variable inclusion and assumptions are important clues to show the connection between composite measures and their theories. This tool can help researchers and readers understand the appropriateness of composite measures by exploring various issues. We recommend using this Critical Hierarchical Appraisal and repOrting tool for composite measureS (CHAOS) along with other critical appraisal tools to evaluate study design or risk of bias.
... Frailty is a syndrome and can be diagnosed with composite criteria that consist of various frailty symptoms [1][2][3]. Frailty is often characterized by age-related symptoms, such as declines in physical and cognitive functioning. It has been considered significant for the prediction of major health outcomes, such as falls, surgical outcomes, and mortality [2,3]. ...
... Frailty is often characterized by age-related symptoms, such as declines in physical and cognitive functioning. It has been considered significant for the prediction of major health outcomes, such as falls, surgical outcomes, and mortality [2,3]. By aggregating information from multiple symptoms, frailty index scores can be assigned to individuals [2,3]. ...
... It has been considered significant for the prediction of major health outcomes, such as falls, surgical outcomes, and mortality [2,3]. By aggregating information from multiple symptoms, frailty index scores can be assigned to individuals [2,3]. Frailty status can then be derived by applying theoretical thresholds to frailty index scores [2,3]. ...
Article
Full-text available
Background Frailty is associated with major health outcomes. However, the relationships between frailty and frailty symptoms have not been well studied. This study aims to show the associations between frailty and frailty symptoms. Methods The Health and Retirement Study (HRS) is an ongoing longitudinal biannual survey in the United States. Three of the most widely used frailty diagnoses, defined by the Functional Domains Model, the Burden Model, and the Biologic Syndrome Model, were reproduced according to previous studies. The associations between frailty statuses and input symptoms were assessed using odds ratios and correlation coefficients. Results The sample sizes, mean ages, and frailty prevalence matched those reported in previous studies. Frailty statuses were weakly correlated with each other (coefficients = 0.19 to 0.38, p < 0.001 for all). There were 49 input symptoms identified by these three models. Frailty statuses defined by the three models were not significantly correlated with one or two symptoms defined by the same models (p > 0.05 for all). One to six symptoms defined by the other two models were not significantly correlated with each of the three frailty statuses (p > 0.05 for all). Frailty statuses were significantly correlated with their own bias variables (p < 0.05 for all). Conclusion Frailty diagnoses lack significant correlations with some of their own frailty symptoms and with some of the frailty symptoms defined by the other two models. This finding raises questions such as whether frailty symptoms lacking significant correlations with frailty statuses could be included to diagnose frailty and whether frailty exists and causes frailty symptoms.
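The association checks described in the Methods (odds ratios and correlation coefficients between a binary frailty status and its binary input symptoms) can be sketched as below. The HRS variables are not reproduced; the data and the composite rule are toy stand-ins used only to show the analysis pattern.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in data: five binary symptoms and an illustrative composite status.
n = 5_000
symptoms = rng.random((n, 5)) < 0.2                 # symptom present = True
frailty = (symptoms.sum(axis=1) >= 2).astype(int)   # status = 1 if two or more symptoms

def odds_ratio(status, symptom):
    # Standard 2x2 cross-tabulation of status against symptom presence.
    a = np.sum((status == 1) & (symptom == 1))
    b = np.sum((status == 1) & (symptom == 0))
    c = np.sum((status == 0) & (symptom == 1))
    d = np.sum((status == 0) & (symptom == 0))
    return (a * d) / (b * c)

for j in range(symptoms.shape[1]):
    sym_j = symptoms[:, j].astype(int)
    or_j = odds_ratio(frailty, sym_j)
    r_j = np.corrcoef(frailty, sym_j)[0, 1]   # phi coefficient for two binary variables
    print(f"symptom {j}: OR={or_j:.2f}, r={r_j:.2f}")
```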
... How to construct an optimal FI for dementia risk prediction purposes remains unknown. One caveat is that the standard procedure lacks criteria for discarding deficits with little explained variance, which may reduce FI performance [14]. ...
Article
Full-text available
Frailty is a dementia risk factor commonly measured by a frailty index (FI). The standard procedure for creating an FI requires manually selecting health deficit items and lacks criteria for selection optimization. We hypothesized that refining the item selection using data-driven assessment improves sensitivity to cognitive status and future dementia conversion, and compared the predictive value of three FIs: a standard 93-item FI was created after selecting health deficit items according to standard criteria (FI_s) from the ADNI database. A refined FI (FI_r) was calculated by using a subset of items, identified using factor analysis of mixed data (FAMD)-based cluster analysis. We developed both FIs for the ADNI1 cohort (n = 819). We also calculated another standard FI (FI_c) developed by Canevelli and coworkers. Results were validated in an external sample by pooling ADNI2 and ADNI-GO cohorts (n = 815). Cluster analysis yielded two clusters of subjects, which significantly (p_FDR < .05) differed on 26 health items, which were used to compute FI_r. The data-driven subset of items included in FI_r covered a range of systems and included well-known frailty components, e.g., gait alterations and low energy. In prediction analyses, FI_r outperformed FI_s and FI_c in terms of baseline cognition and future dementia conversion in the training and validation cohorts. In conclusion, the data show that data-driven health deficit assessment improves an FI's prediction of current cognitive status and future dementia, and suggest that the standard FI procedure needs to be refined when used for dementia risk assessment purposes.
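A rough sense of the refinement procedure (cluster subjects, keep items that differ between clusters after FDR correction, recompute the FI from the retained items) is sketched below. This is a simplification under stated assumptions: plain k-means stands in for the FAMD-based clustering, the deficit matrix is synthetic, and no claim is made about matching the authors' pipeline.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.cluster import KMeans
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)

# Toy deficit matrix: rows are subjects, columns are health-deficit items coded 0/1.
n_subjects, n_items = 800, 93
deficits = (rng.random((n_subjects, n_items)) < rng.uniform(0.05, 0.4, size=n_items)).astype(float)

# Step 1: cluster subjects (the original work used FAMD-based clustering;
# plain k-means on the item matrix is the simplification used here).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(deficits)

# Step 2: keep items that differ between clusters after FDR correction.
pvals = np.array([
    mannwhitneyu(deficits[clusters == 0, j], deficits[clusters == 1, j]).pvalue
    for j in range(n_items)
])
keep = multipletests(pvals, alpha=0.05, method="fdr_bh")[0]

# Step 3: refined FI = mean of the retained deficits; the standard FI uses all items.
fi_standard = deficits.mean(axis=1)
fi_refined = deficits[:, keep].mean(axis=1) if keep.any() else fi_standard
print(f"items retained: {keep.sum()} of {n_items}")
```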
... Selected characteristics of the 8 included studies [32][33][34][35][36][37][38][39] are presented in Table 1. The number of participants in the individual studies varied from 909 to 7713, and their mean age varied from 69.4 to 81.1 years. ...
... The duration of follow-up for all-cause mortality of the included AUC estimates varied from 2 to 7 years. Most studies involved participants living in Europe (N=3) [33,34,37] or Australia (N=2) [38,39], and the remaining 3 studies involved participants living in the USA [32], China [35] or multiple diverse populations in Europe, North America and Australia [36]. ...
... According to the SIGN checklist, 3 reports were rated as having a 'high quality (++)' [33,36,39], 3 had 'acceptable quality (+)' [32,35,37] and 2 had a 'low-quality score (0)' [34,38]. The risk of bias chiefly reflected uncertainty about the response rates and loss to follow-up by levels of frailty (Additional file 1: Table S5). ...
Article
Full-text available
Background Current guidelines for healthcare of community-dwelling older people advocate screening for frailty to predict adverse health outcomes, but there is no consensus on the optimum instrument to use in such settings. The objective of this systematic review of population studies was to compare the ability of the frailty index (FI) and frailty phenotype (FP) instruments to predict all-cause mortality in older people. Methods Studies published before 27 July 2022 were identified using Ovid MEDLINE, Embase, Scopus, Web of Science and CINAHL databases. The eligibility criteria were population-based prospective studies of community-dwelling older adults (aged 65 years or older) and evaluation of both the FI and FP for prediction of all-cause mortality. The Scottish Intercollegiate Guidelines Network’s Methodology checklist was used to assess study quality. The areas under the receiver operating characteristic curves (AUC) were compared, and the proportions of included studies that achieved acceptable discriminatory power (AUC > 0.7) were calculated for each frailty instrument. The results were stratified by the use of continuous or categorical formats of each instrument. The review was reported in accordance with the PRISMA and SWiM guidelines. Results Among 8 studies (range: 909 to 7713 participants), the FI and FP had comparable predictive power for all-cause mortality. The AUC values ranged from 0.66 to 0.84 for FI continuous, 0.60 to 0.80 for FI categorical, 0.63 to 0.80 for FP continuous and 0.57 to 0.79 for FP categorical. The proportions of studies achieving acceptable discriminatory power were 75%, 50%, 63%, and 50%, respectively. The predictive ability of each frailty instrument was unaltered by the number of included items. Conclusions Despite differences in their content, both the FI and FP instruments had modest but comparable ability to predict all-cause mortality. The use of continuous rather than categorical formats in either instrument enhanced their ability to predict all-cause mortality.
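The point about continuous versus categorical formats can be illustrated with a small AUC comparison. The sketch below uses simulated data and illustrative cut-points, not the reviewed cohorts; it only shows how categorizing a continuous frailty score typically reduces its discriminatory power.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)

# Toy cohort: a continuous frailty index and simulated mortality whose
# probability rises with the index (all parameters are illustrative only).
n = 3_000
fi_continuous = np.clip(rng.normal(0.15, 0.08, n), 0, 1)
mortality = rng.random(n) < (0.05 + 0.8 * fi_continuous)

# Categorical format, e.g. non-frail (<0.1), pre-frail (0.1-0.25), frail (>=0.25).
fi_categorical = np.digitize(fi_continuous, [0.1, 0.25])

auc_cont = roc_auc_score(mortality, fi_continuous)
auc_cat = roc_auc_score(mortality, fi_categorical)
print(f"AUC continuous FI: {auc_cont:.2f}, AUC categorical FI: {auc_cat:.2f}")
```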
... Implications for the use of diagnostic criteria. Currently, the diagnoses of many conditions, such as mental illnesses 2,30 and frailty indices 1,9, are based on composite diagnostic criteria. Both mental illnesses and frailty indices use symptoms to confirm diagnoses 1,2. ...
... However, the simulations are not likely to match the complex multi-cause examples commonly seen in the real world. For example, the symptoms of frailty, a geriatric syndrome, can be linked to frailty and many other causes 1,6 ... due to the random assignment to different simulated populations. ...
Article
Full-text available
Symptoms have been used to diagnose conditions such as frailty and mental illnesses. However, the diagnostic accuracy of the numbers of symptoms has not been well studied. This study aims to use equations and simulations to demonstrate how the factors that determine symptom incidence influence symptoms’ diagnostic accuracy for disease diagnosis. Assuming a disease that causes symptoms and is correlated with another disease in 10,000 simulated subjects, 40 symptoms occurred based on 3 epidemiological measures: proportions diseased, baseline symptom incidence (among those not diseased), and risk ratios. Symptoms occurred with similar correlation coefficients. The sensitivities and specificities of single symptoms for disease diagnosis were expressed as equations using the three epidemiological measures and approximated using linear regression in simulated populations. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was the measure used to determine the diagnostic accuracy of multiple symptoms, derived by using 2 to 40 symptoms for disease diagnosis. For each AUC, the best pair of sensitivity and specificity, the one whose absolute difference from 1 was maximal, was chosen. The results showed that the sensitivities and specificities of single symptoms for disease diagnosis were fully explained by the three epidemiological measures in simulated subjects. The AUCs increased or decreased with more symptoms used for disease diagnosis, when the risk ratios were greater or less than 1, respectively. Based on the AUCs, when the risk ratios were similar to 1, symptoms did not provide diagnostic value. When risk ratios were greater or less than 1, maximal or minimal AUCs usually could be reached with fewer than 30 symptoms. The maximal AUCs and their best pairs of sensitivities and specificities could be well approximated with the three epidemiological measures and interaction terms, adjusted R-squared ≥ 0.69. However, the observed overall symptom correlations, overall symptom incidence, and numbers of symptoms explained a small fraction of the AUC variances, adjusted R-squared ≤ 0.03. In conclusion, the sensitivities and specificities of single symptoms for disease diagnosis can be explained fully by the at-risk incidence and by 1 minus the baseline incidence, respectively. The epidemiological measures and baseline symptom correlations can explain large fractions of the variances of the maximal AUCs and the best pairs of sensitivities and specificities. These findings are important for researchers who want to assess the diagnostic accuracy of composite diagnostic criteria.
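A minimal version of the described simulation is sketched below: a disease drives symptom occurrence, the sensitivity of a single symptom tracks the at-risk incidence, the specificity tracks 1 minus the baseline incidence, and the AUC of a symptom-count diagnosis grows with the number of symptoms when the risk ratio exceeds 1. All parameter values are illustrative assumptions, and symptoms are generated independently given disease status rather than with the authors' correlation structure.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)

# Illustrative values; the study varied these over several levels.
n, n_symptoms = 10_000, 40
p_diseased = 0.3
baseline_incidence = 0.1   # symptom incidence among the non-diseased
risk_ratio = 3.0           # symptom incidence is baseline * RR among the diseased

diseased = rng.random(n) < p_diseased
p_symptom = np.where(diseased, baseline_incidence * risk_ratio, baseline_incidence)
symptoms = rng.random((n, n_symptoms)) < p_symptom[:, None]

# Single-symptom accuracy: sensitivity ~ at-risk incidence, specificity ~ 1 - baseline.
sensitivity = symptoms[diseased, 0].mean()
specificity = 1 - symptoms[~diseased, 0].mean()
print(f"sensitivity ~ {sensitivity:.2f} (expected {baseline_incidence * risk_ratio:.2f})")
print(f"specificity ~ {specificity:.2f} (expected {1 - baseline_incidence:.2f})")

# Multiple symptoms: use the symptom count as the diagnostic score and compute the AUC.
for k in (2, 10, 40):
    auc = roc_auc_score(diseased, symptoms[:, :k].sum(axis=1))
    print(f"AUC with {k} symptoms: {auc:.2f}")
```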
... In addition, symptom-based diagnostic criteria are composite measures subject to problems that undermine their validity (9)(10)(11). The diagnoses of three common mental illnesses, dysthymic disorder, major depressive episodes (for the diagnosis of major depressive disorder or bipolar disorder according to the DSM, 4th edition, text revision [DSM-IV-TR]), and manic episodes (for the diagnosis of bipolar disorder), are, in fact, complicated mathematical equations that use data processing procedures that introduce biases into the diagnoses (9). ...
... In contrast, when symptoms are summed together as a diagnosis for outcome prediction, the regression coefficients of the input symptoms can in fact be represented by the coefficient of the diagnosis (44,45). This strategy assumes that the effect sizes of these items are the same for various outcomes (10). Imposing such assumptions on medical diagnoses is imposing restrictions on the relationships between symptoms, which can lead to indices or diagnoses that fail to predict major outcomes, particularly mortality, more accurately than their input symptoms or the biases generated by inadequate data processing (10,11). ...
... This strategy assumes that the effect sizes of these items are the same for various outcomes (10). Imposing such assumptions on medical diagnoses is imposing restrictions on the relationships between symptoms, which can lead to indices or diagnoses that fail to predict major outcomes, particularly mortality, more accurately than their input symptoms or the biases generated by inadequate data processing (10,11). ...
Article
Full-text available
Background Mental illness diagnostic criteria are made based on assumptions. This pilot study aims to assess the public’s perspectives on mental illness diagnoses and these assumptions. Methods An anonymous survey with 30 questions was made available online in 2021. Participants were recruited via social media, and no personal information was collected. Ten questions focused on participants’ perceptions regarding mental illness diagnoses, and 20 questions related to the assumptions of mental illness diagnoses. The participants’ perspectives on these assumptions held by professionals were assessed. Results Among 14 survey participants, 4 correctly answered the relationships of 6 symptom pairs (28.57%). Two participants could not correctly conduct the calculations involved in mood disorder diagnoses (14.29%). Eleven (78.57%) correctly indicated that 2 or more sets of criteria were available for single diagnoses of mental illnesses. Only 1 (7.14%) correctly answered that the associations between symptoms and diagnoses were supported by including symptoms in the diagnostic criteria of the diagnoses. Nine (64.29%) correctly answered that the diagnosis variances were not fully explained by their symptoms. The confidence of participants in the major depressive disorder diagnosis and the willingness to take medications for this diagnosis were the same (mean = 5.50, standard deviation [SD] = 2.31). However, the confidence of participants in the symptom-based diagnosis of non-solid brain tumor was significantly lower (mean = 1.62, SD = 2.33, p < 0.001). Conclusion Our study found that mental illness diagnoses are wrong from the perspectives of the public because our participants did not agree with all the assumptions professionals make about mental illness diagnoses. Only a minority of our participants obtained correct answers to the calculations involved in mental illness diagnoses. In the literature, neither patients nor the public have been engaged in formulating the diagnostic criteria of mental illnesses.
... The frailty index, which attempts to operationalize this increasing state of vulnerability by adding up a variety of pathological "deficits" across multiple biological and physiological systems [3], has been shown not only to fulfill theoretical assumptions on how damage within a complex biological network might accumulate [4], but also to reliably predict numerous age-related outcomes such as cardiovascular disease risk [5], depression [6], post-operative recovery [7] and death [8,9]. However, the frailty index has also been criticized with regard to the length of time required to implement, the vast number of deficits required to achieve a reliable score, the use of inherently biased self-reported data and how deficits are treated when deriving a score (i.e., unweighted and often dichotomously) [10]. ...
... The primary goal of the current study was to evaluate a series of conceptually diverse epigenetic clock measures with regard to their association to frailty and its change over 3 years. While the frailty index is an excellent predictor of adverse health outcomes in a variety of settings [5][6][7][8][9], it has also been criticized for being cumbersome and inherently biased [10]; hence, identifying standardized molecular measures that are indicative of its change, especially over relatively short intervals, would be of certain value. In our sample of the CLSA, the change in frailty over 3 years was normally distributed, on average increasing about 20% (i.e., 0.006) of what has been previously described as a clinically meaningful difference (i.e., 0.03) [34,35]. ...
Article
Full-text available
Background The trajectory of frailty in older adults is important to public health; therefore, markers that may help predict this and other important outcomes could be beneficial. Epigenetic clocks have been developed and are associated with various health-related outcomes and sociodemographic factors, but associations with frailty are poorly described. Further, it is uncertain whether newer generations of epigenetic clocks, trained on variables other than chronological age, would be more strongly associated with frailty than earlier developed clocks. Using data from the Canadian Longitudinal Study on Aging (CLSA), we tested the hypothesis that clocks trained on phenotypic markers of health or mortality (i.e., Dunedin PoAm, GrimAge, PhenoAge and Zhang in Nat Commun 8:14617, 2017) would best predict changes in a 76-item frailty index (FI) over a 3-year interval, as compared to clocks trained on chronological age (i.e., Hannum in Mol Cell 49:359–367, 2013, Horvath in Genome Biol 14:R115, 2013, Lin in Aging 8:394–401, 2016, and Yang Genome Biol 17:205, 2016). Results We show that in 1446 participants, phenotype/mortality-trained clocks outperformed age-trained clocks with regard to the association with baseline frailty (mean = 0.141, SD = 0.075), the greatest of which is GrimAge, where a 1-SD increase in ΔGrimAge (i.e., the difference from chronological age) was associated with a 0.020 increase in frailty (95% CI 0.016, 0.024), or ~ 27% relative to the SD in frailty. Only GrimAge and Hannum (Mol Cell 49:359–367, 2013) were significantly associated with change in frailty over time, where a 1-SD increase in ΔGrimAge and ΔHannum 2013 was associated with a 0.0030 (95% CI 0.0007, 0.0050) and 0.0028 (95% CI 0.0007, 0.0050) increase over 3 years, respectively, or ~ 7% relative to the SD in frailty change. Conclusion Both prevalence and change in frailty are associated with increased epigenetic age. However, not all clocks are equally sensitive to these outcomes and depend on their underlying relationship with chronological age, healthspan and lifespan. Certain clocks were significantly associated with relatively short-term changes in frailty, thereby supporting their utility in initiatives and interventions to promote healthy aging.
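The reported effect sizes (e.g., a 0.020 increase in baseline frailty per 1-SD increase in ΔGrimAge) come from regressing frailty on standardized epigenetic age acceleration. The sketch below shows that analysis pattern on synthetic data; the variable constructions and effect sizes are illustrative assumptions, not the CLSA data or the authors' models.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Toy data: the CLSA variables are not reproduced; effect sizes are illustrative.
n = 1_446
chron_age = rng.normal(70, 8, n)
grimage = chron_age + rng.normal(0, 4, n)    # epigenetic age estimate
delta_grimage = grimage - chron_age          # "age acceleration"
z = (delta_grimage - delta_grimage.mean()) / delta_grimage.std()
frailty_baseline = 0.14 + 0.020 * z + rng.normal(0, 0.07, n)
frailty_change = 0.006 + 0.003 * z + rng.normal(0, 0.03, n)

# Regress frailty (and its 3-year change) on the standardized age acceleration,
# so the slope is the change in frailty per 1-SD increase in delta GrimAge.
for name, outcome in [("baseline frailty", frailty_baseline), ("3-year change", frailty_change)]:
    fit = sm.OLS(outcome, sm.add_constant(z)).fit()
    coef = fit.params[1]
    lo, hi = fit.conf_int()[1]
    print(f"{name}: {coef:.4f} per SD of delta GrimAge (95% CI {lo:.4f}, {hi:.4f})")
```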
... In contrast, several of them could not be directly measured or quantified based on single variables. For example, frailty is theorized as a geriatric syndrome that can be defined by a composition of variables, from four to 70 depending on the theories used to support the definitions (1)(2)(3)(4)(5)(6). Frailty is often calculated on a continuous scale and dichotomized to derive frailty status (1-3, 5, 7-11). ...
... It has also been associated with a variety of outcomes, such as falls and mortality (1-5, 8, 10, 11). In addition to serving as proxy measures for health status, frailty itself has been used as an outcome of interventions (1)(2)(3)(8)(9)(10)(11)(12). One of the reasons is that frailty status has been linked to pathological changes, such as sarcopenia (13). ...
... One of the reasons is that frailty status has been linked to pathological changes, such as sarcopenia (13). Frailty status has been conceived as an opportunity to shift the aging trajectory and is actively used in various trials (1). ...
Article
Full-text available
Background: There are clinical trials using composite measures, indices, or scales as proxies for independent variables or outcomes. The interpretability of derived measures may not be satisfactory. Adopting indices of poor interpretability in clinical trials may lead to trial failure. This study aims to understand the impact of using indices of different interpretability in clinical trials. Methods: The interpretability of indices was categorized as fair-to-poor, good, and unknown. In the literature, frailty indices were considered of fair to poor interpretability. Body mass index (BMI) was highly interpretable. The other indices were of unknown interpretability. The trials were searched at clinicaltrials.gov on October 2, 2018. The use of indices as conditions/diseases or as other terms was searched. The trials were grouped as completed, terminated, active, and other status. We tabulated the frequencies of frailty, BMI, and other indices. Results: There were 263,928 clinical trials found, and 155,606 were completed or terminated. Among 2,115 trials adopting indices or composite measures as a condition or disease, 244 adopted frailty and 487 used BMI without frailty indices. Significantly higher proportions of trials of unknown status used indices as conditions/diseases or as other terms, compared to completed and terminated trials. The proportions of active trials using frailty indices were significantly higher than those of completed or terminated trials. Discussion: Clinical trial databases can be used to understand why trials may fail. Based on the findings, we suspect that using indices of poor interpretability may be associated with trial failure. Interpretability has not been conceived as an essential criterion for outcomes or proxy measures in trials. We will continue verifying the findings in other databases or data sources and apply this research method to improve clinical trial design. To prevent patients from experiencing trials likely to fail, we suggest further examining the interpretability of the indices used in trials.