Figure 4 - uploaded by Davide Giavarina
Distribution plot of the differences between measurements by methods A and B. The dotted line represents the normal distribution. The Shapiro-Wilk test did not reject normality (P = 0.814).

Source publication
Article
In a contemporary clinical laboratory it is very common to have to assess the agreement between two quantitative methods of measurement. The correct statistical approach to assess this degree of agreement is not obvious. Correlation and regression studies are frequently proposed. However, correlation studies the relationship between one variable an...

Context in source publication

Context 1
... bias of -27.2 units is represented by the gap between the X axis, corresponding to zero differences, and the parallel line to the X axis at -27.2 units. This negative bias seems to be due to measurements over 200 units, while for lower concentrations data are closer to each other. A negative trend seems to be evident along the graph, as better shown in Figure 3. Drawing a regression line of the differences could help in detecting a proportional difference (10-12). The visual examination of the plot allows us to evaluate the global agreement between the two measurements. In our example, we can summarize the lack of agreement by calculating the bias, estimated by the mean difference (d) and the standard deviation of the differences (s). We would expect most of the differences to lie between d - 2s and d + 2s, or more precisely, 95% of differences will be between d - 1.96s and d + 1.96s, if the differences are normally distributed (Gaussian). Normal distribution of the differences must always be verified, for example by drawing a histogram. If this is skewed or has very long tails the assumption of normality may not be valid. From the example of Table 1, the measurements of the two methods are not distributed normally, but on the other hand the differences do seem to be (Figure 4). Statistical tests should always be used to determine if the distribution is normal, since in some cases normality cannot be determined simply by observing the histogram plot. If any statistical software is available, a test for normal distribution (such as Shapiro-Wilk test (13), D'Agostino-Pearson test (14), Kolmogorov-Smirnov test (15)) can be done, for the hypothesis that the distribution of the observations in the sample is normal (if P < 0.05 then reject normality). If differences are not normally distributed, a logarithmic transformation of original data can be ...
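The calculation described in the excerpt above (bias as the mean difference d, the standard deviation of the differences s, and limits of agreement at d ± 1.96s) can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the function name `bland_altman`, the `Agreement` container, the sign convention (A minus B), and the sample values are all assumptions made for the example. It uses only the standard library and does not perform the normality check, which would need a statistics package.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Agreement:
    """Summary of a Bland-Altman analysis (hypothetical container)."""
    bias: float   # mean difference d
    sd: float     # sample standard deviation of the differences s
    lower: float  # d - z * s
    upper: float  # d + z * s


def bland_altman(a, b, z=1.96):
    """Return bias and limits of agreement for paired measurements.

    Differences are taken as a - b (an assumed sign convention).
    The limits are only meaningful as a ~95% range if the
    differences are approximately normally distributed.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples with at least 2 values")
    diffs = [x - y for x, y in zip(a, b)]
    d = mean(diffs)
    s = stdev(diffs)  # sample SD (n - 1 in the denominator)
    return Agreement(bias=d, sd=s, lower=d - z * s, upper=d + z * s)


# Illustrative, made-up paired measurements (not the article's Table 1).
method_a = [10.0, 20.0, 30.0, 40.0, 50.0]
method_b = [12.0, 19.0, 33.0, 41.0, 55.0]
result = bland_altman(method_a, method_b)
print(result)
```

With real data, the differences would first be checked for normality (for example with a Shapiro-Wilk test from a statistics package) before the limits are read as a 95% range.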

Citations

... To validate the measurements of FLAP and FMAP between the radiologic (CT) and the intraoperative bone surface, the Bland-Altman plot and its 95% limits of agreement (95% LOA) were evaluated [30,31]. The intra- and inter-observer reliability of the two orthopedic surgeons was retested 2 weeks after the first assessment, and the average values were used. ...
Article
The rotational alignment of the femoral component in total knee arthroplasty (TKA) is considered an important factor, but it is still difficult to assess intraoperatively. This study was conducted to identify anatomical parameters for femoral rotational alignment. A total of 204 patients who underwent primary TKA between 2015 and 2019 were enrolled. The femoral lateral (FLAP) and femoral medial anteroposterior (FMAP) lengths were measured as the widest lengths in the anteroposterior (AP) axis after distal femoral resection. The difference between FLAP and FMAP was defined as dFAP. The concordance correlation coefficient (CCC) was assessed for agreement between the cTEA-PCA and the value of femoral rotation using the linear regression analysis equation. HKA, FLAP, FMAP, and dFAP were significantly associated with femoral rotational alignment. The prediction equation combining the novel intraoperative anatomical references showed improved association with rotational alignment. If dFAP was 6.0 mm, the femoral rotation angle was calculated as 4.9° using this univariate regression equation. The CCC was 0.483, indicating moderate agreement. The dFAP showed an association with distal femoral rotational alignment. A 6 mm dFAP could be a reference for around 5° of femoral rotation. The equation developed in this study may be a reliable tool for intraoperative distal femoral rotational alignment.
... p < 0.001), showing an excellent agreement. Bland-Altman plots (Giavarina, 2015) in (Figure 6c) showed satisfying consistency between the two methods with a bias of −0.0052 (95% confidence interval: −1.577 to 1.567), which tended to make the differences between the two measurements easier to notice. Using a difference plot, a mild trend of differences was proportional to the measurement's magnitude. ...
Article
Extracellular vesicles (EVs) carry disease‐specific molecular profiles, demonstrating massive potential in biomarker discovery. In this study, we developed an integrated biochip platform, termed EVID‐biochip (EVs identification and detection biochip), which integrates in situ electrochemical protein detection with on‐chip antifouling‐immunomagnetic beads modified with CD81 antibodies and zwitterion molecules, enabling efficient isolation and detection of neuronal EVs. The capability of the EVID‐biochip to isolate common EVs and detect neuronal EVs associated with Parkinson's disease in human serum is successfully demonstrated, using the transmembrane protein L1‐cell adhesion molecule (L1CAM) as a target biomarker. The EVID‐biochip exhibited high efficiency and specificity for the detection of L1CAM with a sensitivity of 1 pg/mL. Based on the validation of 76 human serum samples, for the first time, this study discovered that the level of L1CAM/neuronal EV particles in serum could serve as a reliable indicator to distinguish Parkinson's disease from control groups with AUC = 0.973. EVID‐biochip represents a reliable and rapid liquid biopsy platform for the analysis of complex biofluids offering EVs isolation and detection in a single chip, requiring a small sample volume (300 µL) and an assay time of 1.5 h. This approach has the potential to advance the diagnosis and biomarker discovery of various neurological disorders and other diseases.
... To assess the reliability of the GAITWell ® system across two visits, the Intraclass Correlation Coefficient (ICC)2,1 was employed with a two-way random effects model. Interpretation of the ICC was as follows: poor (< 0.50), moderate (0.50 to 0.75), good (0.76 to 0.90), and excellent (> 0.90) [21]. The standard error of the mean (SEM) provides information about the repeatability of the measurement and was calculated using the pooled standard deviation between both visits and the ICC2,1: ...
Preprint
Background: Gait analysis systems offer invaluable insights for rehabilitation, yet their expense limits clinician access. We developed GAITWell®, a low-cost modular system for capturing spatiotemporal gait variables, and evaluated its measurement properties. Methods: The GAITWell® uses discrete binary sensors on interconnected boards to collect gait data, which is then analyzed using the DBSCAN algorithm on the Cartesian points from sensor outputs. Reliability was assessed using ICC2,1, standard error of the mean (SEM) and Bland-Altman plots. Validity was determined by comparing GAITWell® with the Qualisys Pro-Reflex system. Results: Thirty-eight healthy adults participated, with an average age of 33.2 years (SD 13.0). Correlations between GAITWell® and the Qualisys system ranged from moderate to very high for most gait variables, with the exception of stride length, which demonstrated a low but significant correlation (r = .360, p < .05). The ICC2,1 indicated moderate to good agreement for most gait variables. Stance and double support times, cadence, and base of support exhibited poor reliability, characterized by larger SEM and limits of agreement. Conclusions: Our findings indicate that the GAITWell® system is a promising tool for gait analysis. Future research will enhance sensor accuracy and refine algorithms to improve reliability.
... To assess the congruence between the DBC algorithm and the reference method (i-STAT), we used the Bland and Altman (B&A) plot, a well-established tool for examining agreement between two measurement techniques [21]. This analytical approach allows calculation of both the absolute and relative differences, as well as determination of the limits of agreement between the two methods [21][22][23]. The Bland and Altman method suggests that approximately 95% of the data points should fall within the range of ±2 standard deviations (2s) from the mean difference. ...
... The determination of acceptable limits should be based on a priori definition, rooted in clinical necessity, biological considerations, or other pertinent objectives. In essence, the appropriateness of the observed limits of agreement should be predicated on the specific context and goals of the investigation [22]. In this study, the B&A limits of agreement were compared with the acceptable clinical limits for parameters suggested by Ricos et al. and Westgard QC [24]. ...
... Correlation studies evaluate relationships between variables, not differences, thus aren't recommended for method comparability assessment. Bland and Altman introduced an approach assessing agreement by studying mean differences and constructing limits of agreement [21,22]. In this study, Bland-Altman plots revealed slightly higher values for Hb, K, and Na with the DBC algorithm compared to i-STAT, indicating systematic differences. ...
Article
The purpose of this work was to investigate the degree of agreement between two distinct approaches for measuring a set of blood values and to compare comfort levels reported by participants when utilizing these two disparate measurement methods. Radial arterial blood was collected for the comparator analysis using the Abbott i-STAT® POCT device. In contrast, the non-invasive proprietary DBC methodology is used to calculate sodium, potassium, chloride, ionized calcium, total carbon dioxide, pH, bicarbonate, and oxygen saturation using four input parameters (temperature, hemoglobin, pO2, and pCO2). Agreement between the measurement for a set of blood values obtained using i-STAT and DBC methodology was compared using intraclass correlation coefficients, Passing and Bablok regression analyses, and Bland Altman plots. A p-value of <0.05 was considered statistically significant. A total of 37 participants were included in this study. The mean age of the participants was 42.4 ± 13 years, most were male (65%), predominantly Caucasian/White (75%), and of Hispanic ethnicity (40%). The Intraclass Correlation Coefficients (ICC) analyses indicated agreement levels ranging from poor to moderate between i-STAT and the DBC’s algorithm for Hb, pCO2, HCO3, TCO2, and Na, and weak agreement for pO2, HSO2, pH, K, Ca, and Cl. The Passing and Bablok regression analyses demonstrated that values for Hb, pO2, pCO2, TCO2, Cl, and Na obtained from the i-STAT did not differ significantly from that of the DBC’s algorithm suggesting good agreement. The values for Hb, K, and Na measured by the DBC algorithm were slightly higher than those obtained by the i-STAT, indicating some systematic differences between these two methods on Bland Altman Plots. The non-invasive DBC methodology was found to be reliable and robust for most of the measured blood values compared to invasive POCT i-STAT device in healthy participants. 
These findings need further validation in larger samples and among individuals afflicted with various medical conditions.
... OUEP is an objective measure because it is an averaged time interval; thus, OUEP seems reliable and should be easier to compare between studies [17,19]. An interesting finding from our study is presented in Figure 3 and Figure 4, where bias grows simultaneously with increasing OUEP [34]. This indicates that agreement between measured and estimated OUEP may not be constant but varies with fitness level. ...
... Therefore, it is justified to derive the models tailored for particular populations (i.e. trained and untrained) [8,34]. ...
Preprint
Background: Endurance athletes (EA) are an emerging population of focus for cardiovascular health. The oxygen uptake efficiency plateau (OUEP) is the levelling-off period of ratio between oxygen uptake (VO2) and ventilation (VE). In the cohort of EA, we externally validated prediction models for OUEP and derived with internal validation a new equation. Methods: 140 EA underwent a medical assessment and maximal cycling cardiopulmonary exercise test. Participants were 55% male (N=77, age=21.4±4.8 years, BMI=22.6±1.7 kg·m−2, peak VO2=4.40±0.64 L·min−1) and 45% female (N=63, age=23.4±4.3 years, BMI=22.1±1.6 kg·m−2, peak VO2=3.21±0.48 L·min−1). OUEP was defined as the highest 90-second continuous value of the ratio between VO2 and VE. We used the multivariable stepwise linear regression to develop a new prediction equation for OUEP. Results: OUEP was 44.2±4.2 mL·L−1 and 41.0±4.8 mL·L−1 for males and females, respectively. In external validation, OUEP was comparable to directly measured and did not differ significantly. The prediction error for males was –0.42 mL·L−1 (0.94%, p=0.39), and for females was +0.33 mL·L−1 (0.81%, p=0.59). The developed new prediction equation was: 61.37–0.12·height (in cm) + 5.08 (for males). The developed model outperformed the previous and explained 12.9% of the variance (R= 0.377, R2= 0.129, RMSE= 4.39 mL·L−1). Conclusion: OUEP is a stable and transferable cardiorespiratory index. OUEP is minimally affected by fitness level. The predicted OUEP provided promising but limited accuracy among EA. The derived new model is tailored for EA. OUEP could be used to stratify the cardiorespiratory response to exercise and guide training.
... A potential reason for the discrepancies observed in the present pilot study may be the limited sample size (n = 11) of paired samples [34]. ...
Article
Relative energy deficiency in sport (RED-S) is a condition that arises from persistent low energy availability (LEA), which affects the hypothalamic–pituitary axis and results in alterations of several hormones in both male and female athletes. As frequent blood hormone status determinations using venipuncture are rare in sports practice, microsampling offers promising possibilities for preventing and assessing RED-S. Therefore, this study aimed at developing a liquid chromatography–high-resolution tandem mass spectrometry (LC-HRMS/MS) method for quantifying relevant steroids and thyroid hormones in 30 μL of capillary blood obtained using Mitra® devices with volumetric absorptive microsampling technology (VAMS®). The results of the study showed that all validation criteria were met, including a storage stability of more than 28 days in a frozen state (−18 °C) and 14 days at room temperature (20 °C). The validated assay provided precise (<12%) and accurate (<13%) results for all the target analytes. Furthermore, as a proof of concept, autonomously collected VAMS® samples from 50 female and male, healthy, active adults were analyzed. The sensitivity of all analytes was adequate to quantify the decreased hormone concentrations in the RED-S state, as all authentic samples could be measured accordingly. These findings suggest that self-collected VAMS® samples offer a practical opportunity for regular hormone measurements in athletes and can be used for early RED-S assessment and progress monitoring during RED-S recovery.
... Three lines on the plot indicate the mean of differences and the limits of agreement (calculated as the mean difference ± 2 × SD). Good agreement is indicated when the mean difference is close to zero and falls within the 95% limit of agreement [38]. Differences that were within the 95% limit of agreement showed that the two HRQoL measures could be used interchangeably. ...
... In the context of dystonia, research has generally shown that while pain, mobility, and social participation are significantly affected, patients may experience less difficulty in self-care. This is due to the nature of dystonia, which, depending on its type and severity, may not always severely limit an individual's ability to perform personal care tasks [38]. ...
Article
Background/Objectives: Dystonia is a neurological movement disorder characterized by involuntary muscle contractions that lead to abnormal movements and postures; it has a major impact on patients' health-related quality of life (HRQoL). The aim of this study was to examine the HRQoL of Romanian patients with dystonia using the EQ-5D-5L instrument. Methods: Responses to the EQ-5D-5L and the visual analogue scale (VAS) were collected alongside demographic and clinical characteristics. Health profiles were analyzed via the metrics of the EQ-5D-5L, severity levels, and age groups. Using Shannon's indexes, we calculated informativity both for patients' health profile as a whole and each individual dimension. Level sum scores (LSS) of the EQ-5D-5L were calculated and compared with scores from the EQ-5D-5L index and VAS. The HRQoL measures were analyzed through demographic and clinical characteristics. Descriptive statistics, Spearman correlation, and non-parametric tests (Mann-Whitney U or Kruskall-Wallis H) were used. The level of agreement between HRQoL measures was assessed using their intraclass correlation coefficient (ICC) and Bland-Altman plots. Results: A sample of 90 patients was used, around 75.6% of whom were female patients, and the mean age at the beginning of the survey was 58.7 years. The proportion of patients reporting "no problems" in all five dimensions was 10%. The highest frequency reported was "no problems" in self-care (66%), followed by "no problems" in mobility (41%). Shannon index and Shannon evenness index values showed higher informativity for pain/discomfort (2.07 and 0.89, respectively) and minimal informativity for self-care (1.59 and 0.68, respectively). The mean EQ-5D-5L index, LSS, and VAS scores were 0.74 (SD = 0.26), 0.70 (SD = 0.24), and 0.61 (SD = 0.21), respectively. The Spearman correlations between HRQoL measures were higher than 0.60. 
The agreement between the EQ-5D-5L index and LSS values was excellent (ICC = 0.970, 95% CI = 0.934-0.984); the agreement was poor-to-good between the EQ-5D-5L index and VAS scores (ICC = 0.683, 95% CI = 0.388-0.820), and moderate-to-good between the LSS and VAS scores (ICC = 0.789, 95% CI = 0.593-0.862). Conclusions: Our results support the utilization of the EQ-5D-5L instrument in assessing the HRQoL of dystonia patients, and empirical results suggest that the EQ-5D-5L index and LSS measure may be used interchangeably. The findings from this study highlight that HRQoL is complex in patients with dystonia, particularly across different age groups.
... and the precision of the measurements on the FDM-printed models were similar to those on plaster models 8 Zhang et al. 19 (2019) Nine markers were selected for superimposition with uniformity in both the arches DLP printers were found to be faster and more accurate than SLA printer EvoDent 50 μm (DLP) had the highest printing accuracy, while Form 2 (SLA) 100 μm showed the lowest were scanned digitally and the files obtained were superimposed on the source files for measurement of trueness. Precision was measured by superimposing the STL files of PolyJet and SLA replicas on themselves (multiple replicas were made) ...
Article
Objective To assess and compare the accuracy of 3D-printed digital models, orthognathic surgical splints, retainers or aligners, and implant surgical guides printed using various 3D printing technologies. Methods The search comprised prospective and retrospective studies related to the accuracy of 3D-printed models, splints, implant surgical guides, and retainers/aligners in patients undergoing orthodontic and orthognathic surgical procedures. The outcomes were assessed in terms of linear measurements, degree of fit, and positional deviations in comparison to conventional plaster models and surgical acrylic splints. The methodologic quality of the articles and the level of evidence were assessed using the QUADAS-2 tool. A meta-analysis was performed to compare the accuracy of models printed by different technologies to plaster models using the random-effects model. Results Twenty-one retrospective studies were included. Quality assessment of all included articles showed moderate risk of bias and no article was excluded. The systematic review showed that there were no significant differences between printed surgical splints, retainers/aligners compared, and the control group. Meta-analysis of six eligible studies showed that printed models had a general trend toward overestimation of linear measurements (mean difference = 0.22 mm; 95% confidence interval = – 0.09 to 0.36 mm, I ² = 85%). Measurements made on digital light processing-printed models were significantly different than those made on plaster models. Conclusion The mean difference of 0.22 mm in linear measurements made on 3D-printed models and plaster models was clinically acceptable. Among the various printing technologies, PolyJet and fused deposition modeling-printed models showed higher accuracy. The printed splints and retainers/aligners were as accurate and reliable as their respective gold standards. The findings should be interpreted with caution due to high level of heterogeneity. 
PROSPERO Registration Number: CRD42020175511
... Bland-Altman plots showed that the systematic bias tended to increase when tummy time exceeded 15 min/day. Because of limited sample sizes, systematic biases per birth status were not calculated (Giavarina, 2015). However, examining the mean absolute differences we found that parents of preterm infants overestimated tummy time by approximately 22 min/day (44.63% accuracy). ...
Article
Importance: Parent recall is the primary method for measuring positioning practices such as tummy time in infants. Concerns regarding the accuracy of parent recall have been raised in the literature. To date, no study has examined the agreement of tummy time recall measures with gold-standard methods. Objective: To assess the agreement between parental recall versus direct observation of tummy time in infants, and to explore the impact of prematurity on this relationship. Design: Cross-sectional observational study, spanning 1 yr. Setting: Participants’ homes Participants: Thirty-two infant–parent dyads (19 full-term, 13 preterm), with infants ages 3 to 6 mo and caregivers ages older than 18 yr. Outcome and Measures: Home-recorded videos of infant play across 3 days were used as a proxy for direct observation of tummy time and compared with a 12-item parent recall survey. Results: Parent recall had a significant moderate correlation (ρ = .54, p = .002) with direct observation in full-term infants but was not correlated (p = .23) with direct observation in preterm infants. On average, parents of preterm infants overestimated tummy time by 2.5 times per day compared with direct observation. Conclusions and Relevance: For full-term infants, parent recall measures of tummy time exhibit an acceptable level of agreement with direct observation and can be reliably used over shorter periods. Parents of preterm infants may display a bias in recalling tummy time, leading to overestimations. To accurately assess tummy time in this population, a combination of subjective and objective measures should be explored. Plain-Language Summary: Tummy time is an essential movement experience for infants, especially for preterm infants, who are at a higher risk for motor delays. The most common way to track tummy time is through parent reports, or recall, versus a practitioner directly observing tummy time in the home. 
Despite the widespread use of parent recall to track tummy time, no study has examined the accuracy of parent recall versus direct observation in the home. Accurately assessing tummy time is crucial for improving and supporting health outcomes for infants. This study found that prematurity may affect the accuracy of parent recall for assessing tummy time in young infants. The authors discuss the implications of this finding and provide suggestions to guide the selection of appropriate methods to measure tummy time in clinical practice and research studies.
... 41 However, while the correlation coefficient can determine the relationship between two variables, it cannot assess their differences. 42 Therefore, a Bland-Altman plot, a data plotting method used to evaluate the agreement between two clinical measurements, 33,43 was utilised to determine the variance in variables between the two lunging methods. Despite some variation in the mean differences of variables measured from the two methods, the total variance was close to zero, within the mean difference ± 1.96 SD. ...
Preprint
Background: Lunging is a training method that is performed in a round pen or on a lunge line. However, there is no consensus on applying lunging techniques for physical fitness training. Objectives: To investigate the effort intensity, autonomic responses, and method agreement in applying different lunging protocols to untrained horses. Study design: A non-randomised control trial. Methods: Sixteen untrained horses (aged 13.6 ± 6.3 years and weighing 358 ± 47.4 kg) were studied. Each horse was lunged with a similar programme on a lunge line and, subsequently, in a round pen at a two-day interval. The heart rate variability (HRV) and effort intensity, indicated as a percentage of maximum heart rate (%HRmax), were determined pre-lunging, during lunging at distinct gaits, and at 30-minute intervals for 120 minutes post-lunging. The correlation and method agreement between the two lunging methods were analysed with Pearson’s correlation coefficient and Bland–Altman plots, respectively. Results: The horses ran faster and covered longer distances during exercise on a lunge line than in a round pen. The effort intensity during cantering reached moderate levels (75.1 ± 2.4% HRmax) with occasional high-intensity levels (88.1 ± 1.3% HRmax) via both lunging methods. The HRV reached a minimum during cantering and returned to the baseline 120 minutes post-lunging. The HRV parameters (SDNN, RMSSD, LF, HF, SD1, and SD2) were strongly correlated ( r ≥ 0.97 and p < 0.001 for all) with a large correlation effect size (R > 0.85) and excellent agreement (average differences were within mean ± 1.96 SD) between the two lunging methods. Main limitations: The running speed and distance reported during lunging may not be entirely accurate due to the manual calculation required. Conclusions: Lunging can provoke optimal physiological responses in horses. The two tested lunging methods may be applied interchangeably for physical fitness training.