Table 2 - uploaded by Saroje Kumar Sarkar
Detection of multicollinearity based on collinearity statistics


Source publication
Article
Multicollinearity is a statistical phenomenon in which predictor variables in a logistic regression model are highly correlated. It is not uncommon when the model contains a large number of covariates. Multicollinearity has been the thousand-pound monster in statistical modeling. Taming this monster has proven to be one of the great challenges...

Context in source publication

Context 1
... SPSS output in Table 2 gives the collinearity statistics. In this table we observe high tolerances for the variables X1, X2, X5 and X6 but very low tolerances for the three design variables of X3 and X4. ...
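The tolerance reported in collinearity statistics is 1 - R^2_j, where R^2_j comes from regressing predictor X_j on all the other predictors; the VIF is its reciprocal. The sketch below is a minimal NumPy illustration of that definition, not the SPSS procedure itself. It assumes every predictor, including each dummy (design) variable of a categorical predictor, is a separate numeric column. The helper name `tolerances` and the toy data are hypothetical.

```python
import numpy as np


def tolerances(X: np.ndarray) -> np.ndarray:
    """Return 1 - R^2_j for each column j of X (hypothetical helper).

    R^2_j comes from an auxiliary OLS regression of column j on an
    intercept plus all remaining columns. Values near 0 flag collinearity.
    """
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary design matrix: intercept column plus the other predictors.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        tol[j] = ss_res / ss_tot  # equals 1 - R^2_j
    return tol


# Toy data: u and v are orthogonal; w is an exact linear combination of them.
u = np.array([1.0, -1.0, 1.0, -1.0])
v = np.array([1.0, 1.0, -1.0, -1.0])
w = u + 0.5 * v
print(tolerances(np.column_stack([u, v])))     # both close to 1
print(tolerances(np.column_stack([u, v, w])))  # all close to 0
```

A common rule of thumb treats tolerances below 0.1 (VIF above 10) as a warning sign, although the excerpts below use stricter cut-offs.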

Similar publications

Article
Multiscale integration of gene transcriptomic and neuroimaging data is becoming a widely used approach for exploring the molecular underpinnings of large‐scale brain organization in health and disease. Proper statistical evaluation of determined associations between imaging‐based phenotypic and transcriptomic data is key in these explorations, in p...

Citations

... Independent variables were either binary or categorical; thus, no outliers were removed from the dataset. Model prediction was assessed using 5-fold cross-validation to provide the area under the receiver operating characteristic curve [36]. In multivariable models, clustered (robust) standard errors were used to account for household clustering of outcomes due to the sampling of all individuals within a household [37]. ...
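The two evaluation ingredients named above, a 5-fold split and the ROC AUC, can be sketched in plain Python. This is an illustration under stated assumptions, not the study's code: the helpers `kfold_indices` and `auc` are hypothetical, the folds are simple shuffled splits, and the AUC uses the Mann-Whitney formulation (the probability that a random positive case outscores a random negative case).

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Counts, over all (positive, negative) pairs, how often the positive
    case scores higher; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


folds = kfold_indices(n=10, k=5)
print([len(f) for f in folds])                   # [2, 2, 2, 2, 2]
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In each of the five rounds, the model would be fitted on four folds and scored on the held-out fold. Whether the per-fold AUCs are averaged or the held-out predictions pooled is not stated in the excerpt, so either choice is an assumption.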
Article
Background In the first year of roll-out, vaccination for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) prevented almost 20 million deaths from coronavirus disease 2019 (COVID-19). Yet, little is known about the factors influencing access to vaccination at the individual level within poor rural settings of low-income countries. The aim of this study was to examine determinants of vaccine receipt in rural India. Methods A census of a rural village in Tamil Nadu was undertaken from June 2021 to September 2022. We surveyed 775 participants from 262 households. Household-level data on socioeconomic status (SES), water, sanitation, and hygiene practices, and individual-level demographic information, travel history, and biomedical data, including anthropometry, vital signs, and comorbidities, were collected. Logistic regression models with 5-fold cross-validation were used to identify the biomedical, demographic, and socioeconomic determinants of vaccine receipt and the timing of receipt within the first 30 days of eligibility. Vaccine-ineligible participants were excluded, leaving 659 eligible participants. There were 650 eligible participants with complete biomedical, demographic, and socioeconomic data. Results Of the 650 individuals, 68.0% and 34.0% had received one and two vaccine doses, respectively. Participants whose households owned a permanent account number (PAN) card or a ration card were 2.15 (95% CI:1.32–3.52) and 3.02 (95% CI:1.72–5.29) times more likely, respectively, to receive at least one vaccine dose than participants from households owning no such cards. Participants employed as housewives or self-employed non-agricultural workers were 65% (95% CI:0.19–0.67) and 59% (95% CI:0.22–0.76) less likely, respectively, to receive at least one vaccine dose than salaried workers. Household PAN card ownership, occupation and age were linked to the timing of vaccine receipt. 
Participants aged ≤18 and 45–60 years were 17.74 (95% CI:5.07–62.03) and 5.51 (95% CI:2.74–11.10) times more likely, respectively, to receive a vaccine within 30 days of eligibility than 19–44-year-olds. Biomedical factors, including BMI, vital signs, comorbidities, and COVID-19-specific symptoms, were not consistently associated with vaccine receipt or timing of receipt. No evidence was found that travel history, contact with COVID-19 cases, or hospital admissions influenced vaccine receipt or timing of receipt. Conclusion SES-related factors were more strongly linked to vaccine receipt than the biomedical factors targeted by vaccine policies. Future research should explore whether government interventions including vaccine mandates, barriers to vaccine access, or peer influence linked to workplace or targeted vaccine promotion campaigns underpin these findings.
... The presence of multicollinearity was tested using the variance inflation factor (VIF) score of each variable in the model. Variables with VIF scores greater than 2.5 are deemed unacceptable and should be removed from the model (Midi et al., 2010). As shown in Table 10, the VIF scores of all the independent variables were well below this threshold, confirming that the analysis did not suffer from a multicollinearity problem. ...
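A VIF screen with the 2.5 cut-off quoted above can be sketched in NumPy using the identity that VIF_j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which matches 1 / (1 - R^2_j). The helpers `vif` and `flag_high_vif` and the toy columns are hypothetical; the sketch assumes no predictor is an exact linear combination of the others, since the correlation matrix must be invertible.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X as the diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))


def flag_high_vif(X: np.ndarray, names, cutoff: float = 2.5):
    """Return the names whose VIF exceeds the cut-off (2.5, as in the excerpt)."""
    return [name for name, v in zip(names, vif(X)) if v > cutoff]


a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
c = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # weakly correlated with a
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0])     # almost a copy of a
print(vif(np.column_stack([a, c])))                        # about [1.094, 1.094]
print(flag_high_vif(np.column_stack([a, d]), ["a", "d"]))  # ['a', 'd'] (VIF = 49)
```

With two predictors the identity reduces to VIF = 1 / (1 - r^2); for a and d, r^2 = 48/49, which gives the VIF of 49.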
Article
Purpose This paper explores the effect of individual information technology culture archetypes on the perceived ease of use and perceived usefulness of e-banking customers. Design/methodology/approach A multi-stage approach was used. First, a cluster analysis was performed (based on a survey of 360 Algerian bank customers). Second, a multiple regression analysis was conducted to test the hypotheses. Findings The cluster analysis reveals five IT cultural groups for e-banking customers: dangerous, dodgers, compliant dodgers, disenchanted and addicted customers. A mapping of these archetypes is then proposed and tested. The multiple regression analysis shows that the dangerous IT culture archetype exhibits the highest levels of perceived ease of use and perceived usefulness beliefs, while the dodgers show the lowest. Research limitations/implications This study is limited in that it relies on a relatively small convenience sample from Northwest Algeria. Furthermore, enriching the model with other antecedents could be of use. However, it clarifies whether the same IT culture archetypes can be found in different contexts and shows that the list of IT cultural archetypes is not exhaustive. Practical implications The study contributes to the existing knowledge on e-banking adoption in developing countries and provides Algerian banks with some crucial elements. Originality/value This paper is one of the first to investigate the impact of IT culture archetypes on e-banking adoption. It (1) identified five IT culture archetypes, (2) proposed a mapping of these archetypes, (3) reinforced the use of the spinning top model and (4) went further by applying it in a new context (developing country) and industry (banking).
... Collinearity tolerance was used for categorical variables and VIF for continuous variables to measure how much the beta coefficients were affected by the presence of other independent variables in the model. In other words, these indicators measure the collinearity between variables (Midi et al., 2010). Based on those indicators, there were no collinearity problems. ...
Article
This research examines food delivery couriers’ preferred employment status and factors explaining their opinions. Previous studies have used qualitative research methods and are unable to explain couriers’ general views on employment status. In this research, a survey of 1,539 Wolt couriers was carried out in Finland with logistic regression, cross-tabulation, and content analysis as analysis methods. The results show that 56% of the couriers wanted to work as self-employed and 25% as employed. The opinion was most strongly explained by valuing work-related freedom and flexibility, which were associated with the right to refuse delivery tasks offered and to choose the amount of work, working hours and delivery vehicle. The preference for self-employment was also increased by the duration of courier work, one’s own choice to work as a courier, and age. Freedom and flexibility are dependent on the sufficient availability of delivery tasks, posing challenges when the demand is low.
... Multicollinearity among the independent variables was tested using the Variance Inflation Factor (VIF) before their inclusion in the final regression model. A VIF of 5 has been recommended as the maximum acceptable level [43][44][45]. Adjusted incidence-rate ratios (aIRR), along with their corresponding 95% confidence intervals (CIs), were used to estimate the strength and direction of the association between the determinants and CIAF/CISAF. Variation between clusters was assessed by computing the intra-class correlation coefficient (ICC), the proportional change in variance (PCV) and the median incidence-rate ratio (MIRR). ...
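Two of the cluster-variation statistics named above have simple closed forms once the cluster-level (random-intercept) variances are known. The sketch below assumes a log-link model with normally distributed random intercepts and uses the usual definitions: PCV = (V_null - V_model) / V_null, and MIRR = exp(sqrt(2 * sigma^2) * z_0.75), the rate-ratio analogue of the median odds ratio. The excerpt does not give the authors' exact formulas, so treat these as assumptions; the ICC is left out because its definition for count models varies. The variance values are hypothetical.

```python
import math
from statistics import NormalDist


def pcv(var_null: float, var_model: float) -> float:
    """Proportional change in cluster-level variance from the empty to the adjusted model."""
    return (var_null - var_model) / var_null


def mirr(var_cluster: float) -> float:
    """Median incidence-rate ratio for a normal random intercept on the log scale."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1), about 0.6745
    return math.exp(math.sqrt(2.0 * var_cluster) * z75)


# Hypothetical cluster-level variances from an empty and an adjusted model.
print(pcv(0.40, 0.25))  # about 0.375
print(mirr(0.40))
```

A MIRR of 1 means the clusters do not differ in rates; larger values mean that, for two randomly chosen clusters, the median ratio of the higher to the lower rate is further from 1.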
Article
Undernutrition significantly contributes to failure to thrive in children under five, with those experiencing multiple forms of malnutrition facing the highest risks of morbidity and mortality. Conventional markers such as stunting, wasting, and underweight have received much attention but are insufficient to identify multiple types of malnutrition, prompting the development of the Composite Index of Anthropometric Failure (CIAF) and the Composite Index of Severe Anthropometric Failure (CISAF) as aggregate indicators. This study aimed to identify factors associated with CIAF and CISAF among Ethiopian children aged 0–59 months using data from the 2019 Ethiopia Mini Demographic and Health Survey. The study included a weighted sample of 5,259 children and used multilevel mixed-effects negative binomial regression modeling to identify determinants of CIAF and CISAF. The results showed higher incidence-rate ratios (IRR) of CIAF in male children (adjusted IRR = 1.27; 95% CI = 1.13–1.42), children aged 12–24 months (aIRR = 2.01, 95%CI: 1.63–2.48), and 24–59 months (aIRR = 2.36, 95%CI: 1.91–2.92), those from households with multiple under-five children (aIRR = 1.16, 95%CI: 1.01–1.33), poorer households (aIRR = 1.48; 95%CI: 1.02–2.15), and those who lived in houses with an earthen floor (aIRR = 1.37, 95%CI: 1.03–1.82). Similarly, the factors positively associated with CISAF among children aged 0–59 months were male children (aIRR = 1.47, 95% CI = 1.21–1.79), age group 6–11 months (aIRR = 2.30, 95%CI: 1.40–3.78), age group 12–24 months (aIRR = 3.76, 95%CI: 2.40–5.88), age group 25–59 months (aIRR = 4.23, 95%CI: 2.79–6.39), children from households living with two or more under-five children (aIRR = 1.27, 95%CI:1.01–1.59), and children from poorer households (aIRR = 1.93, 95% CI = 1.02–3.67). 
Children were more likely to suffer from multiple anthropometric failures if they were aged 6–23 months or 24–59 months, male, living in households with multiple under-five children, and living in households with poor environments. These findings underscore the need to employ a wide range of strategies to effectively intervene in multiple anthropometric failures in under-five children.
... One strongly correlated cluster lies between attributes A25 to A36 and A37 to A48. A correlation above 0.8 to 0.9 is alarming [34]. Thus, for each highly correlated pair (0.8 to 0.9), we keep only one attribute and discard the other. ...
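The pairwise pruning described above can be sketched as a greedy filter over the correlation matrix. This is a minimal sketch under assumptions: the threshold of 0.8 is taken from the lower end of the quoted range, and when a pair exceeds it the earlier-listed attribute is kept. The excerpt does not say which member of a pair was retained. The helper name `drop_correlated` and the toy columns are hypothetical.

```python
import numpy as np


def drop_correlated(X: np.ndarray, names, threshold: float = 0.8):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    R = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        # Compare column j only against columns that survived so far.
        if all(abs(R[j, i]) <= threshold for i in kept):
            kept.append(j)
    return [names[j] for j in kept]


a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = 2.0 * a + 1.0                                # perfectly correlated with a
c = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # |r| with a is about 0.29
print(drop_correlated(np.column_stack([a, b, c]), ["a", "b", "c"]))  # ['a', 'c']
```

Because the filter is greedy, the result depends on column order. Ordering the columns by prior importance is one way to control which member of each pair survives.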
Preprint
Traffic flow forecasting is a crucial first step in intelligent and proactive traffic management. Traffic flow parameters are volatile and uncertain, making traffic flow forecasting a difficult task if the appropriate forecasting model is not used. Additionally, the non-Euclidean data structure of traffic flow parameters is challenging to analyze from both spatial and temporal perspectives. State-of-the-art deep learning approaches use pure convolution, recurrent neural networks, and hybrid methods to achieve this objective efficiently. However, many of the approaches in the literature rely on complex architectures that can be difficult to train. This complexity also adds to the black-box nature of deep learning. This study introduces a novel deep learning architecture, referred to as the multigraph convolution neural network (MGCNN), for turning movement prediction at intersections. The proposed architecture combines a multigraph structure, built to model temporal variations in traffic data, with a spectral convolution operation to support modeling the spatial variations in traffic data over the graphs. The proposed model was tested using twenty days of flow and traffic control data collected from an arterial in downtown Chattanooga, TN, with ten signalized intersections. The model's ability to perform short-term predictions over 1, 2, 3, 4, and 5 minutes into the future was evaluated against four baseline state-of-the-art models. The results showed that our proposed model is superior to the other baseline models in predicting turning movements with a mean squared error (MSE) of 0.9
... Logistic Regression (LR) is a statistical method used to analyze the relationship between a binary dependent variable and one or more independent variables. It is commonly used for binary classification problems, where the dependent variable can take only two values, i.e., zero or one [42]. ...
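In a fitted logistic regression, the probability of the outcome coded 1 is the logistic function of a linear combination of the predictors, and a class label follows by thresholding that probability. The sketch below shows only the prediction step, with a 0.5 threshold assumed. The coefficients are hypothetical stand-ins for fitted values, and the helper names are illustrative.

```python
import math


def predict_proba(x, intercept, coefs):
    """P(y = 1 | x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(x, intercept, coefs, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, intercept, coefs) >= threshold else 0


# Hypothetical fitted coefficients for two predictors.
b0, b = -1.0, [0.75, 0.5]
print(predict_proba([1.0, 0.5], b0, b))  # z = 0.0, so 0.5
print(classify([1.0, 0.5], b0, b))       # 1
```

The 0.5 threshold is only a default. In imbalanced problems a different cut-off is often chosen to trade sensitivity against specificity.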
Article
Thyroid disease classification plays a crucial role in early diagnosis and effective treatment of thyroid disorders. Machine learning (ML) techniques have demonstrated remarkable potential in this domain, offering accurate and efficient diagnostic tools. Most real-life datasets are imbalanced, which hampers the overall performance of the classifiers. Existing data-balancing techniques process the whole dataset at once, which sometimes causes overfitting or underfitting. Moreover, the complexity of some ML models, often referred to as “black boxes,” raises concerns about their interpretability and clinical applicability. This paper presents a comprehensive study focused on the analysis and interpretability of various ML models for classifying thyroid diseases. In our work, we first applied a new data-balancing mechanism using a clustering technique and then analyzed the performance of different ML algorithms. To address the interpretability challenge, we explored techniques for model explanation and feature importance analysis using eXplainable Artificial Intelligence (XAI) tools globally as well as locally. Finally, the XAI results were validated by domain experts. Experimental results have shown that our proposed mechanism is efficient in diagnosing thyroid disease and can explain the models effectively. The findings can contribute to bridging the gap between adopting advanced ML techniques and the clinical requirements of transparency and accountability in diagnostic decision-making.
... Regression results are shown in Table 7. The tolerance for each factor is >0.1 and the VIF is <10 [37], indicating the absence of multicollinearity among the independent variables. Furthermore, residuals for both models exhibit a normal distribution. ...
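The excerpt does not name the test used to judge residual normality. One standard option is the Jarque–Bera statistic, JB = n/6 * (S^2 + (K - 3)^2 / 4), built from the sample skewness S and kurtosis K. Under normality, JB is approximately chi-square with 2 degrees of freedom in large samples, and the 5% critical value is about 5.991. The sketch below swaps in this test as an illustration only; the helper name and the simulated data are hypothetical.

```python
import numpy as np


def jarque_bera(resid) -> float:
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), using population (biased) moments."""
    r = np.asarray(resid, dtype=float)
    n = r.size
    c = r - r.mean()
    m2 = np.mean(c ** 2)
    skew = np.mean(c ** 3) / m2 ** 1.5
    kurt = np.mean(c ** 4) / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# 95th percentile of the chi-square distribution with 2 degrees of freedom.
CRITICAL_5PCT = 5.991

rng = np.random.default_rng(0)
skewed = rng.exponential(size=2000)         # clearly non-normal "residuals"
print(jarque_bera(skewed) > CRITICAL_5PCT)  # True: normality rejected
```

Because the chi-square approximation is asymptotic, small samples are often checked with other tests or with a Q-Q plot of the residuals.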
Article
A robust level of self-awareness and self-acceptance is crucial for flight cadets. In this study, a total of 106 flight cadets from various grades and flight training sites were assessed using the self-awareness and self-acceptance scales. The scales were optimized through item analysis, reliability, and validity assessments. The finalized scales demonstrated an acceptable level of reliability and validity. Upon analyzing the collected data, it was observed that the overall self-awareness and -acceptance levels among the evaluated pilot students fell within the normal range. However, identifying positive symptoms directly proved challenging. The tested flight cadets exhibited moderate symptoms across each factor, with instances of severe symptoms in academic self-awareness. Notably, flight cadets trained abroad exhibited a lower level of self-awareness and -acceptance compared to those trained in China. No comparable difference was found between grades. Regression analysis revealed that physical and emotional self-awareness dimensions accounted for 62% of the variations in the psychological dimension, while passive self-acceptance explained 72% of the changes in active self-acceptance. Finally, in view of the issues found in the research, corresponding management measures and recommendations are presented to enhance the self-awareness and -acceptance levels of flight cadets.
... The study tested for multicollinearity problems among study variables using the Pearson correlation matrix approach. Conducting a multicollinearity test before the regression analysis is imperative because multicollinearity produces inaccurate variances and unstable coefficients and p-values, so that minor changes in the data can cause substantial changes in the coefficients (Midi et al., 2010; Vatcheva & Lee, 2016). The study also applied Pesaran's CIPS panel unit root test to check whether the variables are stationary. ...
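The instability described above can be shown with a small constructed example: two predictors that differ by at most 0.01, and a change of at most 0.01 in each response value. This is a toy illustration, not the study's data. The perturbation is chosen to lie along the direction x1 - x2, which the near-collinear design can barely identify. Both regressions fit exactly, so the coefficients jump from (0, 1, 1) to (0, 2, 0).

```python
import numpy as np

# Toy data: x2 differs from x1 by at most 0.01, so the two are almost collinear.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = x1 + np.array([0.01, -0.01, 0.01, -0.01, 0.0])
X = np.column_stack([np.ones_like(x1), x1, x2])

y = x1 + x2                  # exact coefficients: intercept 0, b1 = 1, b2 = 1
y_perturbed = y + (x1 - x2)  # shifts each y by at most 0.01 (equals 2 * x1)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
b_pert, *_ = np.linalg.lstsq(X, y_perturbed, rcond=None)

print(np.corrcoef(x1, x2)[0, 1])  # very close to 1
print(b)       # approximately [0, 1, 1]
print(b_pert)  # approximately [0, 2, 0]
```

The sum b1 + b2 stays at 2 in both fits. Only the split between the two collinear predictors is unstable, which is why individual coefficients and their p-values become unreliable.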
Article
The availability of bank loans is a vital component in determining the investment and spending patterns that influence economic growth. This article examines the threshold effect of loan growth on non-performing loans (NPLs) in the Zimbabwean banking industry during dollarization. The study employed panel threshold regression models developed by Seo et al. (2019) and Kremer et al. (2013) on a panel of thirteen banks from 2009 to 2017. The study revealed that locally owned banks held a higher percentage of NPLs (12.7%) than foreign-owned banks (6.1%) during the period under study. The study also documents a loan growth threshold level of 38%. On average, the industry lends excessively, as demonstrated by the 48% loan growth rate. This rate is driven mainly by local banks, which lend above the threshold more than foreign banks do. The study observed that, both below and above the threshold, loan growth exerts a negative and significant effect on NPLs. Based on the results, it is recommended that banks devise strategies to maintain a steady loan growth rate, enhance profitability, and effectively monitor liquidity risk exposure. The findings provide insights into reviewing bank credit policies and prudential guidelines.
... Diagnostic analysis and sensitivity analysis were performed. Multicollinearity was checked using Variance Inflation Factor (VIF) and a cut-off of 2.5 was used (Midi et al., 2010). Pregibon's dbeta values were used to identify influential values or outliers (Pregibon, 1981). ...
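Pregibon's dbeta measures how much the fitted coefficients would change if an observation were deleted: dbeta_j = r_j^2 * h_j / (1 - h_j)^2, where r_j is the Pearson residual and h_j the leverage from the weighted hat matrix. The NumPy sketch below fits the logistic model by iteratively reweighted least squares and then computes dbeta per observation. This is an assumption, because some packages compute it per covariate pattern instead. The helper names and toy data are hypothetical.

```python
import numpy as np


def fit_logistic_irls(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by iteratively reweighted least squares (Newton-Raphson)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: (X' W X)^{-1} X' (y - p), with W = diag(w).
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def pregibon_dbeta(X, y, beta):
    """Return (dbeta, leverage) for each observation.

    dbeta_j = r_j^2 * h_j / (1 - h_j)^2, where r_j is the Pearson residual
    and h_j the diagonal of W^{1/2} X (X' W X)^{-1} X' W^{1/2}.
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    Xw = np.sqrt(w)[:, None] * X
    h = np.einsum("ij,ij->i", Xw @ np.linalg.inv(Xw.T @ Xw), Xw)
    r = (y - p) / np.sqrt(w)
    return r ** 2 * h / (1.0 - h) ** 2, h


# Toy data (hypothetical): one predictor plus an intercept, no perfect separation.
x = np.arange(1.0, 9.0)
y = np.array([0, 0, 1, 0, 1, 0, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])
beta = fit_logistic_irls(X, y)
dbeta, h = pregibon_dbeta(X, y, beta)
print(np.round(dbeta, 3))  # larger values flag more influential observations
```

The leverages always sum to the number of model parameters (2 here), which is a quick sanity check on the hat-matrix computation.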
Article
Aim The Interactive Screening Program (ISP) is an anonymous screening and dialogue platform used in workplaces to encourage mental health help-seeking. This study examined utilization of ISP among law enforcement workplaces and assessed how motivational interviewing techniques were associated with various help-seeking outcomes. Method This retrospective study used secondary ISP screening and dialogue data collected from 2013 to 2019 at four law enforcement workplaces or unions (N = 691). Independent variables include counselors’ use of motivational interviewing techniques in their dialogue such as asking questions and showing empathy in their response. Help-seeking outcomes include requesting a referral, making a commitment to counseling services, decreased ambivalence about mental health services, and increased willingness to seek future services. Results Two-thirds of participants screened within the high distress level of ISP. Among them, 53% responded to the counselor’s initial email and 50% of those who responded requested a referral for future services. Binary logistic regression models showed that counselors’ use of confrontation in the dialogue was associated with improved willingness to seek services among ISP users (OR = 2.88, 95% CI = 1.24, 6.64). Further, ISP users who accessed ISP through their workplace peer support program, as compared to their employee assistance program (EAP), were more likely to show decreased ambivalence about seeking future services over time (OR = 0.28, 95% CI = 0.09, 0.80). Conclusion This study demonstrates that the anonymous ISP program can successfully engage employees with high distress levels, including employees with suicidal ideation. Results highlight the importance of customizing ISP counselors’ responses to be responsive to law enforcement employees.
... If the correlation between theoretically independent predictors is high enough, reliable model fitting and interpretation of results may be impossible. Such models behave erratically in response to small changes in the data or in the procedure used to build the predictive models [39,40]. We used random predictors generated from normal distributions and correlated with age, which was already included in the reference model. ...
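The excerpt does not say how the random predictors were correlated with age. One standard construction mixes standardized age with independent normal noise: z = rho * a + sqrt(1 - rho^2) * e. Making the noise exactly orthogonal to age (centering it and removing its projection on age) gives a sample correlation of exactly rho. This is a sketch under that assumption. The helper name and the ages are hypothetical, and z is only approximately normal when age itself is not normally distributed.

```python
import numpy as np


def correlated_predictor(age: np.ndarray, rho: float, seed: int = 0) -> np.ndarray:
    """Draw a predictor whose sample correlation with `age` is exactly rho."""
    rng = np.random.default_rng(seed)
    a = (age - age.mean()) / age.std()
    e = rng.standard_normal(age.size)
    # Center the noise and remove the part explained by age, so e is orthogonal to a.
    e = e - e.mean()
    e = e - (e @ a) / (a @ a) * a
    e = e / e.std()
    return rho * a + np.sqrt(1.0 - rho ** 2) * e


age = np.array([34.0, 45.0, 52.0, 61.0, 29.0, 70.0, 48.0, 55.0])  # hypothetical ages
z = correlated_predictor(age, rho=0.3)
print(round(np.corrcoef(z, age)[0, 1], 6))  # 0.3
```

Dropping the orthogonalization step gives a population correlation of rho with sample correlations that scatter around it. That variant may be closer to "random" predictors but is less controlled.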
Article
Binary classification methods encompass various algorithms to categorize data points into two distinct classes. Binary prediction, in contrast, estimates the likelihood of a binary event occurring. We introduce a novel graphical and quantitative approach, the U-smile method, for assessing prediction improvement stratified by binary outcome class. The U-smile method utilizes a smile-like plot and novel coefficients to measure the relative and absolute change in prediction compared with the reference method. The likelihood-ratio test was used to assess the significance of the change in prediction. Logistic regression models using the Heart Disease dataset and generated random variables were employed to validate the U-smile method. The receiver operating characteristic (ROC) curve was used to compare the results of the U-smile method. The likelihood-ratio test demonstrated that the proposed coefficients consistently generated smile-shaped U-smile plots for the most informative predictors. The U-smile plot proved more effective than the ROC curve in comparing the effects of adding new predictors to the reference method. It effectively highlighted differences in model performance for both non-events and events. Visual analysis of the U-smile plots provided an immediate impression of the usefulness of different predictors at a glance. The U-smile method can guide the selection of the most valuable predictors. It can also be helpful in applications beyond prediction.