Table 2 - uploaded by Oke Gerke
Results from the linear mixed effects model (study 2)

Source publication
Article
Background: Quantitative measurement procedures need to be accurate and precise to justify their clinical use. Precision reflects the deviation of groups of measurements from one another, often expressed as proportions of agreement, standard errors of measurement, coefficients of variation, or the Bland-Altman plot. We suggest variance component analysis (...

Citations

... For clinical PET/CT imaging, several studies have assessed inter- and intraobserver variability and proposed methods to standardize image analysis [7][8][9][10]. Until now, there has not been any study on preclinical PET/CT imaging that includes a standardized image analysis. ...
Article
Purpose: Preclinical imaging, with translational potential, lacks a standardized method for defining volumes of interest (VOIs), impacting data reproducibility. The aim of this study was to determine the interobserver variability of VOI sizes and standardized uptake values (SUVmean and SUVmax) of different organs using the same [¹⁸F]FDG-PET and PET/CT datasets analyzed by multiple observers. In addition, the effect of a standardized analysis approach was evaluated. Procedures: In total, 12 observers (4 beginners and 8 experts) analyzed identical preclinical [¹⁸F]FDG-PET-only and PET/CT datasets according to their local default image analysis protocols for multiple organs. Furthermore, a standardized protocol was defined, including detailed information on the respective VOI size and position for multiple organs, and all observers reanalyzed the PET/CT datasets following this protocol. Results: Without standardization, significant differences in SUVmean and SUVmax were found among the observers. Coregistering CT images with PET images improved the comparability to a limited extent. The introduction of a standardized protocol that details the VOI size and position for multiple organs reduced interobserver variability and enhanced comparability. Conclusions: The protocol offered clear guidelines and was particularly beneficial for beginners, resulting in improved comparability of SUVmean and SUVmax values for various organs. The study suggested that incorporating an additional VOI template could further enhance the comparability of the findings in preclinical imaging analyses.
... We derived RCs for (a) Repeatability; the closeness of repeated measurements of the same patient made under similar conditions by the same reader (intrarater variability analysis), and (b) Reproducibility; the closeness of measurements of the same patient made by readers of varying experience in the same measurement setup (interrater variability analysis). The RCs were calculated as 2.77 times the estimated within-subject SD as derived from the mixed effect model [19]. ...
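The factor 2.77 quoted above is approximately 1.96 × √2, the multiplier that turns a within-subject SD into a limit for the absolute difference between two measurements of the same subject. The following is a minimal sketch of that arithmetic. It estimates the within-subject SD from duplicate readings rather than from a mixed effects model, which is a simplification and not the cited study's method; the function names and the data are hypothetical.

```python
import math


def within_subject_sd_from_pairs(pairs):
    """Within-subject SD from duplicate measurements: sqrt(sum(d^2) / (2n)).

    This simple estimator is a stand-in for the mixed-model estimate
    mentioned in the text (assumption made for illustration).
    """
    n = len(pairs)
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n))


def repeatability_coefficient(within_subject_sd):
    """RC = 1.96 * sqrt(2) * s_w, i.e. roughly 2.77 * s_w."""
    return 1.96 * math.sqrt(2.0) * within_subject_sd


# Hypothetical duplicate readings (in degrees) for four patients.
pairs = [(30.0, 31.0), (25.0, 24.0), (28.0, 28.0), (33.0, 35.0)]
s_w = within_subject_sd_from_pairs(pairs)
rc = repeatability_coefficient(s_w)
print(f"within-subject SD = {s_w:.3f}, RC = {rc:.3f}")
```

Under the usual normality assumption, about 95% of absolute differences between two repeated measurements of the same patient are expected to fall below the RC.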
Article
Hip dysplasia (HD) is a frequent cause of hip pain in skeletally mature patients and may lead to osteoarthritis (OA). An accurate and early diagnosis may postpone, reduce or even prevent the onset of OA and ultimately hip arthroplasty at a young age. The overall aim of this study was to assess the reliability of an algorithm designed to read pelvic anterior-posterior (AP) radiographs and to estimate the agreement between the algorithm and human readers for measuring (i) the lateral center edge angle of Wiberg (LCEA) and (ii) the acetabular index angle (AIA). The algorithm was based on deep-learning models developed using a modified U-net architecture and ResNet 34. The newly developed algorithm was found to be highly reliable when identifying the anatomical landmarks used for measuring LCEA and AIA in pelvic radiographs, thus offering highly consistent measurement outputs. The study showed that manual identification of the same landmarks by five specialist readers was subject to variance, and the level of agreement between the algorithm and human readers was consequently poor, with mean measured differences from 0.37 to 9.56° for right LCEA measurements. The algorithm displayed the highest agreement with the senior orthopedic surgeon. With further development, the algorithm may be a good alternative to humans when screening for HD.
... Firstly, descriptive statistics such as means and standard deviations were calculated. Then, differences in measurements between the two observers (inter-observer variability) and differences from the same observer at different time points (intra-observer variability) were compared via Bland-Altman plots [36,37] (Fig. 6). This allowed us to check for any discrepancy in the measurements caused by poor measurement quality or by an inadequate imaging procedure that would hinder the creation of a consistent and reliable dataset of these specific anatomical features. ...
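The quantities behind a Bland-Altman plot can be sketched as follows: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias ± 1.96 times the SD of those differences. This is an illustrative sketch only; the function name and the observer readings are hypothetical and are not taken from the cited study.

```python
import statistics


def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical readings (cm) of the same four scans by two observers.
observer_a = [1.0, 2.0, 3.0, 4.0]
observer_b = [1.1, 1.9, 3.2, 3.8]
bias, lower, upper = bland_altman_limits(observer_a, observer_b)
print(f"bias = {bias:.3f}, limits of agreement = [{lower:.3f}, {upper:.3f}]")
```

In the plot itself, each difference is drawn against the mean of the two readings, with horizontal lines at the bias and at both limits.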
Article
Background: Sheep (Ovis aries) have been widely used as animal models in many specialties of biomedical research. Their similarity to human brain anatomy in terms of brain size, skull features, and gyrification index gives the ovine large animal model better translational value than small animal models in neuroscience. Despite this evidence and the availability of advanced imaging techniques, morphometric brain studies are lacking. We herein present the morphometric ovine brain indexes and anatomical measures developed by two observers in a double-blinded study and validated via an intra- and inter-observer analysis. Results: For this retrospective study, T1-weighted magnetic resonance imaging (MRI) scans were performed at 1.5 T on 15 sheep under general anaesthesia. The animals were female Ovis aries aged 18-24 months. Two observers assessed the scans twice each. The statistical analysis of intra-observer and inter-observer agreement was obtained via the Bland-Altman plot and Spearman rank correlation test. The results are as follows (mean ± standard deviation): Indexes: Bifrontal 0.338 ± 0.032 cm; Bicaudate 0.080 ± 0.012 cm; Evans’ 0.218 ± 0.035 cm; Ventricular 0.241 ± 0.039 cm; Huckman 1.693 ± 0.174 cm; Cella Media 0.096 ± 0.037 cm; Third ventricle ratio 0.040 ± 0.007 cm. Anatomical measures: Fourth ventricle length 0.295 ± 0.073 cm; Fourth ventricle width 0.344 ± 0.074 cm; Left lateral ventricle 4.175 ± 0.275 cm; Right lateral ventricle 4.182 ± 0.269 cm; Frontal horn length 1.795 ± 0.303 cm; Interventricular foramen left 1.794 ± 0.301 cm; Interventricular foramen right 1.78 ± 0.317 cm. Conclusions: The present study provides baseline values of linear indexes of the ventricles in ovine models. The acquisition of these data contributes to filling the knowledge void on important anatomical and morphological features of the sheep brain.
... , where the VCs were estimated using a mixed-effects two-way ANOVA model with random patient effect and fixed observer effect [17]. Intra-observer reproducibility (for each observer separately) was estimated as ...
Article
An accurate, standardised, objective method to assess aesthetic outcome after breast surgery is lacking. In this methodological study, we investigated the intra- and inter-observer reproducibility of breast symmetry and volume assessed using three-dimensional surface imaging (3D-SI), evaluated the reproducibility depending on imaging posture, and proposed a new combined volume-shape-symmetry (VSS) parameter. Images were acquired using the VECTRA XT 3D imaging system and analysed by two observers using the VECTRA Analysis Module. Breast symmetry was measured through the root mean square distance. All women had undergone bilateral risk-reducing mastectomy and immediate breast reconstruction. The reproducibility and correlations of breast symmetry and volume measurements were compared using Bland–Altman plots and tested with Spearman’s rank correlation coefficient. 3D surface images of 58 women were analysed (348 symmetry measurements, 696 volume measurements). The intra-observer reproducibility of breast symmetry measurements was substantial–excellent, the inter-observer reproducibility was substantial, and the inter-posture reproducibility was substantial. For measurements of breast volumes, the intra-observer reproducibility was excellent, the inter-observer reproducibility was moderate–substantial, and the inter-posture reproducibility was substantial–excellent. The intra-observer reproducibility of VSS was excellent while the inter-observer reproducibility was substantial for both observers, independent of posture. There were no statistically strong correlations between breast symmetry and volume differences. The intra-observer reproducibility was found to be substantial–excellent for several 3D-SI measurements independent of imaging posture. However, the inter-observer reproducibility was lower than the intra-observer reproducibility, indicating that 3D-SI in its present form is not a great assessment for symmetry.
... Regarding inter-observer reliability, all results in the investigated sample showed excellent inter-observer reliability for the cast, 2D and 3D CBCT measurements, with high reproducibility (Tables 8, 9 & 10). The resultant very strong intra-observer and inter-observer agreement in the present study and in other studies (32,44) indicates the high precision and reproducibility of the CBCT digital model measurements, as agreement measures the closeness between readings and can be used to express the precision or reproducibility of the readings (48). ...
Article
Aim: This study was conducted to evaluate the accuracy and reliability of CBCT-obtained 3D and 2D images in dental arch space analysis. Material & Methods: A total of 12 maxillary and mandibular plaster models (6 for the maxilla and 6 for the mandible) and their corresponding CBCT scans from 6 patients were used. On each plaster model, 16 different measurements were selected and measured using a digital caliper and were considered the gold standard. The same measurements were taken on the corresponding 2D & 3D CBCT scans of the participating patients, acquired with a Planmeca Promax 3D MID CBCT machine. Data were then analyzed with OnDemand3D™ third-party software. Results: There was excellent to good agreement between the 2D & 3D CBCT-obtained direct digital model measurements and the real measurements in the whole study sample. The difference between 2D & 3D CBCT measurements and real measurements was statistically and clinically non-significant except in the required-space 3D CBCT measurement, but this difference was higher in the 3D CBCT measurements. There was an overall underestimation tendency in the mean 3D CBCT linear measurements for most of the study sample. Conclusions: CBCT allows us to determine mesiodistal sizes, intercanine distance (ICD) and intermolar distance (IMD), and required and available space reliably, accurately and reproducibly compared with measurements obtained using the plaster models, without clinically significant differences between CBCT measurements and real measurements.
... Finally, variance component analysis, based on mixed effects modelling, offers an opportunity to assess repeatability coefficients for the reassessment of (a) the same scan by the same rater, (b) the same scan by a different rater, and (c) a rescan by any rater [4]. We acknowledge reporting limitations of single-center, single PET/MRI or PET/CT system studies [1,2]. However, additional sources of discrepancy include type and make of scanner, tracer dose, time from tracer administration to acquisition, motion correction procedures, among others [5]. All sources must be considered to estimate the certainty of single and repeated measurement in the individual patient. ...
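One way to read the three scenarios above is that each repeatability coefficient is built from the variance components that can differ between the two measurements being compared. The sketch below makes that explicit. Which components enter each scenario (residual only; rater plus residual; scan plus rater plus residual) is an assumption made here for illustration, as are the function name and the variance values; none of them are taken from the cited work.

```python
import math


def rc_from_components(*variances):
    """RC = 1.96 * sqrt(2 * sum of the variance components that differ)."""
    return 1.96 * math.sqrt(2.0 * sum(variances))


# Hypothetical variance component estimates from a fitted mixed effects model.
var_residual = 0.04
var_rater = 0.01
var_scan = 0.05

rc_same_scan_same_rater = rc_from_components(var_residual)
rc_same_scan_other_rater = rc_from_components(var_rater, var_residual)
rc_rescan_any_rater = rc_from_components(var_scan, var_rater, var_residual)

print(f"(a) same scan, same rater:      {rc_same_scan_same_rater:.3f}")
print(f"(b) same scan, different rater: {rc_same_scan_other_rater:.3f}")
print(f"(c) rescan, any rater:          {rc_rescan_any_rater:.3f}")
```

Because each scenario adds a source of variation, the coefficients can only grow from (a) to (c) under this reading.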
... The total variance explained by acquisition and reading was low for tissue and transmitral inflow Doppler measures (15% and 1%, respectively), compared to 34% and 4% for LA strain measures, respectively.
Age (years): 65 (12); 58 (13)
Weight (kg): 89 (18); 79 (12)
Height (cm): 178 (7); 166 (7)
Body mass index (kg/m²): 28.0 (4.4); 28.6 (4.1) ...
Article
Objective: To investigate variability related to the image acquisition and reading process for echocardiographic measures of left ventricular (LV) diastolic function, and its influence on the classification of LV diastolic dysfunction (LVDD). Methods: Forty participants (19 women), mean age 62 (28-88) years, underwent echocardiographic examinations twice by different echocardiographers and blinded analyses by four readers in a cross-sectional design. Measurements included quantification of two- (2D) and three-dimensional (3D) recordings of the left atrium (LA) (maximal) volume (LAVmax) and spectral Doppler blood flow and tissue velocities for assessment of LV diastolic function. Variability and reproducibility measures were calculated using variance component analyses and Kappa statistics. Results: Image acquisition influenced variability more than image reading (mean 24% and 4% of variance, respectively), but variability from image reading was especially important for 2D LAVmax (16% of variance) compared to 4% for 3D LAVmax, which was reflected in better agreement for 3D measures. The variability of measures used in classification of LVDD had clinical significance, and agreement across the four raters in classification using current recommendations was only fair (Kappa 0.42), but the agreement improved when using 3D LAVmax (Kappa 0.58). Agreement and reliability measures were reported for all measures. Conclusion: Performing a new image acquisition influenced variability more than introducing a new image reader, but there were differences across the different measures. LAVmax by 3D is superior to 2D with respect to lower variability. The variability of diastolic measures influences the reliability of LVDD classification, and this should be taken into account in everyday clinical practice.
... Firstly, descriptive statistics such as means and standard deviations were calculated. Then, differences in measurements between the two observers (inter-observer variability) and differences from the same observer at different time points (intra-observer variability) were compared via Bland-Altman plots (7,8) (Fig. 6). This allowed us to check for any discrepancy in the measurements caused by poor measurement quality or by an inadequate imaging procedure that would hinder the creation of a consistent and reliable dataset of these specific anatomical features. ...
Preprint
Background: Sheep (Ovis aries) have been widely used as animal models in many specialties of biomedical research. Their similarity to human brain anatomy in terms of brain size, skull features, and gyrification index gives the ovine large animal model better translational value than small animal models in neuroscience. Despite this evidence and the availability of advanced imaging techniques, morphometric brain studies are lacking. We herein present the morphometric ovine brain indexes and anatomical measures developed by two observers in a double-blinded study and validated via an intra- and inter-observer analysis. Results: For this retrospective study, T1-weighted magnetic resonance imaging (MRI) scans were performed at 1.5 T on 15 sheep under general anaesthesia. The animals were female Ovis aries aged 18-24 months. Two observers assessed the scans twice each. The statistical analysis of intra-observer and inter-observer agreement was obtained via the Bland-Altman plot and Spearman rank correlation test. The results are as follows (mean ± standard deviation): Indexes: Bifrontal 0.338 ± 0.032 cm; Bicaudate 0.080 ± 0.012 cm; Evans’ 0.218 ± 0.035 cm; Ventricular 0.241 ± 0.039 cm; Huckman 1.693 ± 0.174 cm; Cella Media 0.096 ± 0.037 cm; Third ventricle ratio 0.040 ± 0.007 cm. Anatomical measures: Fourth ventricle length 0.295 ± 0.073 cm; Fourth ventricle width 0.344 ± 0.074 cm; Left lateral ventricle 4.175 ± 0.275 cm; Right lateral ventricle 4.182 ± 0.269 cm; Frontal horn length 1.795 ± 0.303 cm; Interventricular foramen left 1.794 ± 0.301 cm; Interventricular foramen right 1.78 ± 0.317 cm. Conclusions: The present study provides baseline values of linear indexes of the ventricles in ovine models. The acquisition of these data contributes to filling the knowledge void on important anatomical and morphological features of the sheep brain.
... Mean and standard deviations of these intrarater differences were 0.59 and 1.64, respectively. Repeatability coefficients were derived by means of a linear mixed effects model, with repetition as fixed and patient and rater as random factors [36]. The repeatability coefficient for a new rating of the same scan by the same rater equaled 2.77 times ...
Article
The Bland–Altman Limits of Agreement is a popular and widespread means of analyzing the agreement of two methods, instruments, or raters in quantitative outcomes. An agreement analysis could be reported as a stand-alone research article but it is more often conducted as a minor quality assurance project in a subgroup of patients, as a part of a larger diagnostic accuracy study, clinical trial, or epidemiological survey. Consequently, such an analysis is often limited to brief descriptions in the main report. Therefore, in several medical fields, it has been recommended to report specific items related to the Bland–Altman analysis. The present study aimed to identify the most comprehensive and appropriate list of items for such an analysis. Seven proposals were identified from a MEDLINE/PubMed search, three of which were derived by reviewing anesthesia journals. Broad consensus was seen for the a priori establishment of acceptability benchmarks, estimation of repeatability of measurements, description of the data structure, visual assessment of the normality and homogeneity assumption, and plotting and numerically reporting both bias and the Bland–Altman Limits of Agreement, including respective 95% confidence intervals. Abu-Arafeh et al. provided the most comprehensive and prudent list, identifying 13 key items for reporting (Br. J. Anaesth. 2016, 117, 569–575). An exemplification with interrater data from a local study accentuated the straightforwardness of transparent reporting of the Bland–Altman analysis. The 13 key items should be applied by researchers, journal editors, and reviewers in the future, to increase the quality of reporting Bland–Altman agreement analyses.
... Furthermore, it can be appropriate to include two or more readers, which leads to a two-factorial design and entails observer agreement assessment [18]. The choice of the accuracy measure depends on whether it is an early or a confirmatory diagnostic accuracy study. Early diagnostic accuracy trials may focus on overall estimates of diagnostic accuracy of tests on a continuous or ordinal scale, without defining a positivity threshold and considering sensitivity (true positive rate) and specificity (true negative rate) jointly. ...
Article
The aim of diagnostic accuracy studies is to evaluate how accurately a diagnostic test can distinguish diseased from nondiseased individuals. Depending on the research question, different study designs and accuracy measures are appropriate. As prior knowledge in the planning phase is often very limited, modifications of design aspects such as the sample size during the ongoing trial could increase the efficiency of diagnostic trials. In intervention studies, group sequential and adaptive designs are well established. Such designs are characterized by preplanned interim analyses, giving the opportunity to stop early for efficacy or futility or to modify elements of the study design. In contrast, in diagnostic accuracy studies, such flexible designs are less common, even though they are as important as in intervention studies. However, diagnostic accuracy studies have specific features, which may require adaptations of the statistical methods or may lead to specific advantages or limitations of sequential and adaptive designs. In this article, we summarize the current status of methodological research and applications of flexible designs in diagnostic accuracy research. Furthermore, we indicate and advocate future development of adaptive design methodology and its use in diagnostic accuracy trials from an interdisciplinary viewpoint. The term “interdisciplinary viewpoint” describes the collaboration of experts from academic and nonacademic research.