Article
PDF Available

Applied Logistic Regression

Authors:

Abstract

Introduction to the Logistic Regression Model; Multiple Logistic Regression; Interpretation of the Fitted Logistic Regression Model; Model-Building Strategies and Methods for Logistic Regression; Assessing the Fit of the Model; Application of Logistic Regression with Different Sampling Models; Logistic Regression for Matched Case-Control Studies; Special Topics; References; Index.
... Finally, we tested the utility of NODDI and tensor metrics in predicting cognitive outcomes. We used the general rule that an AUC of 0.5 suggests no ability to predict, while an AUC of 0.7 to 0.8 indicates acceptable prediction [43]. NODDI metrics, notably the NDI of the hippocampal cingulum, proved to be superior predictors of clinical outcomes compared with demographic variables such as age, gender, and education. ...
Preprint
Full-text available
INTRODUCTION: Diffusion tensor imaging has been used to assess white matter (WM) changes in the early stages of Alzheimer's disease (AD). However, the tensor model is necessarily limited by its assumptions. Neurite Orientation Dispersion and Density Imaging (NODDI) can offer insights into microstructural features of WM change. We assessed whether NODDI detects AD-related changes in medial temporal lobe WM more sensitively than traditional tensor metrics. METHODS: Standard diffusion and NODDI metrics were calculated for medial temporal WM tracts from 199 older adults drawn from ADNI3 who also received PET to measure pathology and neuropsychological testing. RESULTS: NODDI measures in medial temporal tracts were more strongly correlated with cognitive performance and pathology than standard measures. The combination of NODDI and standard metrics exhibited the strongest prediction of cognitive performance in random forest analyses. CONCLUSIONS: NODDI metrics offer additional insights into the contributions of WM degeneration to cognitive outcomes in the aging brain.
... Logistic regression is a form of generalized linear regression. For a regression or classification problem, a cost function is first specified, the optimal model parameters are then solved for iteratively by an optimization method, and the fitted model's effectiveness is finally confirmed on test data [9]. ...
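A minimal sketch of this specify-the-cost / optimize / test workflow, assuming scikit-learn and synthetic data purely for illustration (nothing below comes from the cited study):

```python
# Illustrative sketch: logistic regression workflow on synthetic data.
# The cost is the (regularized) log loss, minimized iteratively by the solver,
# and the fitted model is then checked on a held-out test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # iterative optimization

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("test log loss:", log_loss(y_test, model.predict_proba(X_test)))
```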
Article
Full-text available
In the wine industry, red wine is a common fruit wine made from grapes. As the number of people who enjoy red wine grows, the industry is receiving more and more attention, and so is the quality of red wine. To better evaluate red wine quality, an evaluation model is established based on 200 collected red wine samples and the corresponding data on 11 indices. First, the dimensionality of the index data is reduced using principal component analysis and factor analysis. Then, k-means clustering, the sum-of-squared-deviations method, and the class-average algorithm are used to perform cluster analysis on the data processed by principal component analysis or factor analysis. Next, logistic regression analysis is used to test the accuracy of the data classification. Finally, Fisher's discriminant method is used to perform discriminant analysis on the data and establish a model. A score function is obtained and used to calculate the quality score of each group of wines, and corresponding suggestions are given according to the quality evaluation results.
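A rough, self-contained sketch of such a pipeline (dimension reduction, clustering, a logistic-regression check, and Fisher discriminant scoring) using scikit-learn; the synthetic data and all parameter choices below are stand-ins, not the wine dataset or the paper's settings:

```python
# Rough sketch of the described pipeline on synthetic stand-in data (not the wine dataset).
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, _ = make_blobs(n_samples=200, n_features=11, centers=3, random_state=0)

# 1) Reduce the 11 indices to a few principal components.
X_pc = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(X))

# 2) Cluster the reduced data into quality groups.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pc)

# 3) Check how well the grouping can be recovered by logistic regression.
acc = LogisticRegression(max_iter=1000).fit(X_pc, labels).score(X_pc, labels)
print("logistic-regression check accuracy:", acc)

# 4) Fisher discriminant analysis yields score (discriminant) functions for samples.
lda = LinearDiscriminantAnalysis().fit(X_pc, labels)
print("discriminant scores of first sample:", lda.decision_function(X_pc[:1]))
```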
... Logistic regression [5]: Logistic regression is a generalized linear regression analysis model. It is a supervised learning method in machine learning, mainly used to solve binary and multi-class classification problems. ...
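For the multi-class case mentioned here, a minimal illustration (assuming scikit-learn and its built-in iris data, which are not part of the cited work):

```python
# Minimal illustration: the same logistic-regression estimator handles
# binary labels and, as here, a three-class problem.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                     # three classes
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))                             # predicted class labels
print(clf.predict_proba(X[:3]).round(3))              # class probabilities summing to 1
```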
Article
Full-text available
This paper uses logistic regression, SVM, and decision trees from machine learning to analyse 67 data items from question C of the 2022 GCSE Cup National Student Mathematical Modelling Competition. The data were analysed using systematic clustering, the clustering coefficients were examined, and the number of clusters K was further determined by the "elbow" method, yielding a clear classification pattern among the glasses; a grid search method was then used to classify them. The results show that newly excavated glass artefacts are classified by their PbO content: if the lead oxide (PbO) content is below 5.46%, they are considered high-potassium glasses, and otherwise lead-barium glasses. The silica content was further used as a boundary to divide the high-potassium glass into two subclasses, and the lead-barium glasses were divided into three subclasses using the lead oxide and silica content as boundaries. Using a series of models and algorithms, the classification patterns of the different types of glass and their subclasses were clarified, and the results were tested for reasonableness and sensitivity. Such a model can be used to classify newly excavated glass artefacts and can also be adapted for the identification and analysis of other ancient artefacts.
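A compact sketch of the two generic steps named above, choosing K from the "elbow" of the within-cluster sum of squares and then tuning a classifier by grid search; the synthetic data, the SVM, and the parameter grid are illustrative assumptions, not the paper's setup:

```python
# Sketch of the two generic steps: elbow method for K, then grid search for a classifier.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_blobs(n_samples=67, centers=2, n_features=5, random_state=0)

# Elbow method: inspect the within-cluster sum of squares (inertia) as K grows;
# the bend ("elbow") suggests the number of clusters.
for k in range(1, 7):
    print(k, KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_)

# Grid search over SVM hyperparameters for the final classification step.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
grid.fit(X, y)
print("best params:", grid.best_params_, "cv accuracy:", grid.best_score_)
```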
... Logistic regression, or the logit model, refers to regression analyses for the (usually multiple) modelling of the distribution of dependent binary variables. The goal of this method is to determine the best-fitting model for describing the relationship between a dependent outcome variable and one or more independent explanatory variables [21]. Because the outcome variable "smoke spread confined to the unit of use" is binary, this type of modelling was chosen for this characteristic. ...
Article
On-site fire inspection by German fire brigades – fire tests in situ. The German fire brigades have been systematically recording significant fires in buildings since 2016. Part of this recording is checking whether protection objectives are violated. On behalf of the expert committee for preventive fire and hazard protection of the German fire brigades (Fachausschuss Vorbeugender Brand- und Gefahrenschutz, FA VB/G), the Munich Fire Brigade (Branddirektion München) evaluates these data in cooperation with the Technical University of Munich. This article presents the method and the first findings from these statistics. First trends show a considerably high number of smoke-spread events, the direct effect of fire-fighting operations, and the effect of organizational fire protection measures.
... In logistic regression, the dependent variable is considered to be binary. Logistic regression is widely used and is also applied, for example, within neural networks (Cucchiara 2012; Karlaftis and Vlahogianni 2011). ...
Book
Full-text available
For Germany alone, it is expected that services and products based on the use of artificial intelligence (AI) will generate revenues of 488 billion euros in 2025 - this would represent 13 percent of Germany’s gross domestic product. In important application sectors, the explainability of decisions made by AI is a prerequisite for acceptance by users, for approval and certification procedures, or for compliance with the transparency obligations required by the GDPR. The explainability of AI products is therefore one of the most important market success factors, at least in the European context.

The core of AI-based applications - by which we essentially mean machine learning applications here - is always the underlying AI models. These can be divided into two classes: white-box and black-box models. White-box models, such as decision trees based on comprehensible input variables, allow the basic comprehension of their algorithmic relationships. They are thus self-explanatory with respect to their mechanisms of action and the decisions they make. In the case of black-box models such as neural networks, it is usually no longer possible to understand the inner workings of the model due to their interconnectedness and multi-layered structure. However, at least for the explanation of individual decisions (local explainability), additional explanatory tools can be used in order to subsequently increase comprehensibility. Depending on the specific requirements, AI developers can fall back on established explanation tools, e.g. LIME, SHAP, Integrated Gradients, LRP, DeepLift or GradCAM, which, however, require expert knowledge. For mere users of AI, only a few good tools exist so far that provide intuitively understandable decision explanations (saliency maps, counterfactual explanations, prototypes or surrogate models).

The participants in the survey conducted as part of this study use popular representatives of white-box models (statistical/probabilistic models, decision trees) and black-box models (neural networks) to roughly the same extent today. In the future, however, according to the survey, a greater use of black-box models is expected, especially neural networks. This means that the importance of explanatory strategies will continue to increase in the future, while they are already an essential component of many AI applications today. The importance of explainability varies greatly depending on the industry. It is considered by far the most important in the healthcare sector, followed by the financial sector, the manufacturing sector, the construction industry and the process industry.

Four use cases were analyzed in more detail through in-depth interviews with proven experts. The use cases comprise image analysis of histological tissue sections as well as text analysis of doctors' letters from the health care domain, machine condition monitoring in manufacturing, and AI-supported process control in the process industry. Among these, model explanations that make the model-internal mechanisms of action comprehensible (global explainability) are only indispensable for the process control case as a strict approval requirement. In the other use cases, local explainability is sufficient as a minimum requirement. Global explainability, however, plays a key role in the acceptance of AI-supported products in the considered use cases related to manufacturing industries.
Furthermore, the use case analyses show that the selection of a suitable explanation strategy depends on the target groups, the data types used and the AI model used. The study analyzes the advantages and disadvantages of the established tools along these criteria and offers a corresponding decision support. Since white-box models are self-explanatory in terms of model action mechanisms and individual decisions, they should be preferred, whenever possible, for all applications that place high demands on comprehensibility, especially if they perform similarly well, or at least sufficiently well, compared to black-box models. It can be assumed that with the increasing use of AI in business, the need for reliable and intuitive explanation strategies will also increase significantly in the future. In order to meet this demand, the following technical and non-technical challenges currently need to be overcome:
· New and further development of suitable "hybrid" approaches that combine data- and knowledge-driven approaches, or white- and black-box modelling approaches respectively.
· Consideration of aspects from behavioural and cognitive science - such as the measurability of the quality of an explanation from the user's point of view, automated adaptations of explanations to users, and explainability of holistic AI systems - in order to improve explainable AI systems.
· Definition of application and risk classes from which the basic necessity of an explanation for given use cases can be derived.
· Definition of uniform requirements for the explainability of AI and thus the creation of clear regulatory specifications and approval guidelines corresponding to the application and risk classes.
· Creation of approval and (re)certification frameworks for systems continuously learning during operational deployment.
· Provision and implementation of comprehensive education and training programs for examiners and inspectors to verify the explainability of AI.
... There is no consensus on a cutoff for AUC values. Previous publications have suggested that an AUC between 0.7 and 0.8 is acceptable and greater than 0.8 is excellent 26,27, while the National Center on Response to Intervention's Technical Standard rates AUC values between 0.75 and 0.85 as 'partially convincing' and below 0.75 as 'unconvincing' 28. On the other hand, it has been recommended that no fixed value be enforced; instead, AUC values should be used to compare predictors within a single domain rather than against a strict cutoff [29][30][31][32]. ...
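A small sketch of the comparative use of AUC recommended here, scoring two simulated candidate predictors of the same outcome against each other rather than against a fixed cutoff (all numbers below are invented for illustration):

```python
# Sketch: compare two candidate predictors by AUC on the same outcome,
# rather than judging either against a fixed cutoff.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)                  # simulated binary outcome
pred_a = y + rng.normal(scale=1.0, size=500)      # more informative predictor
pred_b = y + rng.normal(scale=2.5, size=500)      # noisier predictor

auc_a, auc_b = roc_auc_score(y, pred_a), roc_auc_score(y, pred_b)
print(f"AUC A = {auc_a:.2f}, AUC B = {auc_b:.2f} -> prefer the higher within this domain")
```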
Preprint
Precision medicine is advancing patient care for complex human diseases. Discovery of biomarkers to diagnose specific subtypes within a heterogeneous diseased population is a key step towards realizing the benefits of precision medicine. However, popular statistical methods for evaluating candidate biomarkers, fold change and AUC, were designed for homogeneous data, and we evaluate their performance here. In general, these metrics overlook nearly ‘ideal’ biomarkers when they represent less than half of the diseased population. We introduce a new metric to address this shortfall and run a series of trials comprising simulated and biological data.
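The shortfall described, that a nearly ideal biomarker for a disease subtype earns only a modest overall AUC, can be illustrated with a toy simulation; this is not the authors' analysis, just an assumed example:

```python
# Toy simulation (not the paper's method): a biomarker elevated in only 30% of cases
# separates that subgroup almost perfectly yet earns only a modest overall AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
controls = rng.normal(0, 1, 1000)
subtype = rng.normal(5, 1, 300)          # the subgroup the marker truly flags
other_cases = rng.normal(0, 1, 700)      # diseased samples the marker does not flag

scores = np.concatenate([controls, subtype, other_cases])
labels = np.concatenate([np.zeros(1000), np.ones(1000)])
print("overall AUC:", round(roc_auc_score(labels, scores), 2))   # roughly 0.65
```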
... Logistic regression is a statistical model which, in its basic form, uses a logistic function to construct a classifier for a binary dependent variable from a set of (influential) independent factors [21]. From a mathematical point of view, a binary model has a dependent variable with two possible values, e.g., forward/backward, which can be labelled "0" and "1", and the corresponding probabilities of the labels lie in [0, 1]. ...
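In symbols (the standard textbook form, not quoted from the cited snippet), the basic binary logistic model maps a linear combination of the factors to a probability in [0, 1]:

```latex
P(Y = 1 \mid x) \;=\; \pi(x) \;=\;
\frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}}
     {1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}},
\qquad
\operatorname{logit}\pi(x) \;=\; \ln\!\frac{\pi(x)}{1 - \pi(x)}
\;=\; \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p .
```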
Article
Full-text available
Background: Ubiquitylation is an important post-translational modification of proteins that not only plays a central role in cellular coding, but is also closely associated with the development of a variety of diseases. The specific selection of substrates by the E3 ligase is the key step in ubiquitylation. As various high-throughput analytical techniques continue to be applied to the study of ubiquitylation, a large amount of ubiquitylation site data and records of E3-substrate interactions continue to be generated. Biomedical literature is an important vehicle for information on E3-substrate interactions in ubiquitylation and related new discoveries, as well as an important channel for researchers to obtain such up-to-date data. The continuous explosion of ubiquitylation-related literature poses a great challenge to researchers in acquiring and analyzing the information. Therefore, automatic annotation of these E3-substrate interaction sentences from the available literature is urgently needed. Results: In this research, we proposed a model based on representation and attention-mechanism-based deep learning methods to automatically annotate E3-substrate interaction sentences in biomedical literature. Focusing on sentences that mention an E3 protein, we applied several natural language processing methods and a Long Short-Term Memory (LSTM)-based deep learning classifier to train the model. Experimental results proved the effectiveness of our proposed model, and the proposed attention-mechanism deep learning method outperforms other statistical machine learning methods. We also created a manual corpus of E3-substrate interaction sentences, in which the E3 proteins and substrate proteins are labeled, in order to construct our model. The corpus and model proposed by our research are a useful and valuable resource for the advancement of ubiquitylation-related research. Conclusion: Having the entire manual corpus of E3-substrate interaction sentences readily available in electronic form will greatly facilitate subsequent text mining and machine learning analyses. Automatic annotation of ubiquitylation sentences stating E3 ligase-substrate interactions benefits significantly from semantic representation and deep learning. The model enables rapid information access and can assist in further screening of key ubiquitylation ligase substrates for in-depth studies.
Article
Full-text available
Background and aims: Obesity is an independent risk factor for cardiovascular disease development. Here, we aimed to examine and compare the predictive values of three novel obesity indices, lipid accumulation product (LAP), visceral adiposity index (VAI), and triglyceride-glucose (TyG) index, for cardiovascular subclinical organ damage. Methods: A total of 1,773 healthy individuals from the Hanzhong Adolescent Hypertension Study cohort were enrolled. Anthropometric, biochemical, urinary albumin-to-creatinine ratio (uACR), brachial-ankle pulse wave velocity (baPWV), and Cornell voltage-duration product data were collected. Furthermore, the potential risk factors for subclinical organ damage were investigated, with particular emphasis on examining the predictive value of the LAP, VAI, and TyG index for detecting subclinical organ damage. Results: LAP, VAI, and TyG index exhibited a significant positive association with baPWV and uACR. However, only LAP and VAI were found to have a positive correlation with Cornell product. While the three indices did not show an association with electrocardiographic left ventricular hypertrophy, higher values of LAP and TyG index were significantly associated with an increased risk of arterial stiffness and albuminuria. Furthermore, after dividing the population into quartiles, the fourth quartiles of LAP and TyG index showed a significant association with arterial stiffness and albuminuria when compared with the first quartiles, in both unadjusted and fully adjusted models. Additionally, the concordance index (C-index) values for LAP, VAI, and TyG index were reasonably high for arterial stiffness (0.856, 0.856, and 0.857, respectively) and albuminuria (0.739, 0.737, and 0.746, respectively). Lastly, the analyses of continuous net reclassification improvement (NRI) and integrated discrimination improvement (IDI) demonstrated that the TyG index exhibited significantly higher predictive values for arterial stiffness and albuminuria compared with LAP and VAI. Conclusion: LAP, VAI, and, especially, TyG index demonstrated utility in screening cardiovascular subclinical organ damage among Chinese adults in this community-based sample. These indices have the potential to function as markers for early detection of cardiovascular disease in otherwise healthy individuals.
Article
Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes or degrade its activity, which leads to failure of intracellular processes. Tyrosine nitration (NT) is a modification that degrades a protein's activity and thereby initiates and propagates various diseases, including neurodegenerative, cardiovascular, and autoimmune diseases and carcinogenesis. Identification of NT modification supports the development of novel therapies and drug discovery for the associated diseases. Identification of NT modification in biochemical labs is expensive, time-consuming, and error-prone. To supplement this process, several computational approaches have been proposed. However, these approaches fail to precisely identify NT modification, owing to the extraction of irrelevant, redundant, and less discriminative features from protein sequences. This paper presents the NTpred framework, which is competent in extracting comprehensive features from raw protein sequences using four different sequence encoders. To reap the benefits of different encoders, it generates four additional feature spaces by fusing different combinations of the individual encodings. Furthermore, it eradicates irrelevant and redundant features from the eight different feature spaces through a Recursive Feature Elimination process. Selected features of the four individual encodings and the four feature-fusion vectors are used to train eight different Gradient Boosted Tree classifiers. The probability scores from the trained classifiers are utilized to generate a new probabilistic feature space, which is used to train a Logistic Regression classifier. On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross-validation and independent test evaluation, with a combined improvement of 13.7% in MCC and 20.1% in AUC. Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor with a combined improvement of 5.3% in MCC and 1.0% in AUC. NTpred is publicly available for further experimentation and predictive use at: https://sds_genetic_analysis.opendfki.de/PredNTS/.
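The last stage described, turning the probability scores of the boosted-tree classifiers into features for a logistic regression, is a stacking scheme. A schematic sketch of that idea (illustrative only, with invented synthetic data and two stand-in feature "encodings"; it is not the NTpred implementation):

```python
# Schematic stacking sketch: out-of-fold probabilities from gradient-boosted trees
# become the features of a final logistic-regression classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=400, n_features=40, random_state=0)
views = [X[:, :20], X[:, 20:]]        # stand-ins for two different sequence encodings

# One boosted-tree model per feature space; collect its out-of-fold P(class 1).
prob_features = np.column_stack([
    cross_val_predict(GradientBoostingClassifier(random_state=0), V, y,
                      cv=5, method="predict_proba")[:, 1]
    for V in views
])

meta = LogisticRegression().fit(prob_features, y)     # final probabilistic feature space
print("meta-classifier accuracy:", meta.score(prob_features, y))
```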
Article
Full-text available
Objective: To enable interactive visualization of phenome-wide association studies (PheWAS) on electronic health records (EHR). Materials and Methods: Current PheWAS technologies require familiarity with command-line interfaces and lack end-to-end data visualizations. pyPheWAS Explorer allows users to examine group variables, test assumptions, design PheWAS models, and evaluate results in a streamlined graphical interface. Results: A cohort of attention deficit hyperactivity disorder (ADHD) subjects and matched non-ADHD controls is examined. pyPheWAS Explorer is used to build a PheWAS model including sex and deprivation index as covariates, and the Explorer’s result visualization for this model reveals known ADHD comorbidities. Discussion: pyPheWAS Explorer may be used to rapidly investigate potentially novel EHR associations. Broader applications include deployment for clinical experts and preliminary exploration tools for institutional EHR repositories. Conclusion: pyPheWAS Explorer provides a seamless graphical interface for designing, executing, and analyzing PheWAS experiments, emphasizing exploratory analysis of regression types and covariate selection.
Article
Full-text available
Estimation of the covariance structure of spatial processes is a fundamental prerequisite for problems of spatial interpolation and the design of monitoring networks. We introduce a nonparametric approach to global estimation of the spatial covariance structure of a random function Z(x, t) observed repeatedly at times ti (i = 1, …, T) at a finite number of sampling stations xi (i = 1, 2, …, N) in the plane. Our analyses assume temporal stationarity but do not assume spatial stationarity (or isotropy). We analyze the spatial dispersions var(Z(xi, t) − Z(xj, t)) as a natural metric for the spatial covariance structure and model these as a general smooth function of the geographic coordinates of station pairs (xi, xj). The model is constructed in two steps. First, using nonmetric multidimensional scaling (MDS) we compute a two-dimensional representation of the sampling stations for which a monotone function of interpoint distances δij approximates the spatial dispersions. MDS transforms the problem into one for which the covariance structure, expressed in terms of spatial dispersions, is stationary and isotropic. Second, we compute thin-plate splines to provide smooth mappings of the geographic representation of the sampling stations into their MDS representation. The composition of this mapping f and a monotone function g derived from MDS yields a nonparametric estimator of var(Z(xa, t) − Z(xb, t)) for any two geographic locations xa and xb (monitored or not) of the form g(|f(xa) − f(xb)|). By restricting the monotone function g to a class of conditionally nonpositive definite variogram functions, we ensure that the resulting nonparametric model corresponds to a nonnegative definite covariance model. We use biorthogonal grids, introduced by Bookstein in the field of morphometrics, to depict the thin-plate spline mappings that embody the nature of the anisotropy and nonstationarity in the sample covariance matrix. An analysis of mesoscale variability in solar radiation monitored in southwestern British Columbia demonstrates this methodology.
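A compressed sketch of the two-step construction described above, assuming standard scientific-Python tools (scikit-learn's non-metric MDS and isotonic regression, SciPy's thin-plate-spline RBF interpolator) and placeholder data; it illustrates the idea of estimating g(|f(xa) − f(xb)|) but omits the variogram-class constraint on g and is not the authors' implementation:

```python
# Sketch only: nonparametric spatial-dispersion model var(Z(xa,t) - Z(xb,t)) ~ g(|f(xa) - f(xb)|).
import numpy as np
from sklearn.manifold import MDS
from sklearn.isotonic import IsotonicRegression
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
N, T = 20, 100
coords = rng.uniform(0, 100, size=(N, 2))   # geographic coordinates of the N stations
Z = rng.normal(size=(T, N))                 # placeholder for the repeated observations

# Spatial dispersions d_ij = var(Z(x_i, t) - Z(x_j, t)) serve as the dissimilarity matrix.
disp = (Z[:, :, None] - Z[:, None, :]).var(axis=0)

# Step 1: non-metric MDS finds a 2-D configuration whose inter-point distances,
# through some monotone function, approximate the dispersions.
config = MDS(n_components=2, metric=False, dissimilarity="precomputed",
             random_state=0).fit_transform(disp)
mds_dist = np.linalg.norm(config[:, None, :] - config[None, :, :], axis=-1)

# Monotone link g fitted by isotonic regression (variogram-class constraint omitted here).
iu = np.triu_indices(N, k=1)
g = IsotonicRegression(out_of_bounds="clip").fit(mds_dist[iu], disp[iu])

# Step 2: thin-plate-spline mapping f from geographic space into the MDS plane.
f = RBFInterpolator(coords, config, kernel="thin_plate_spline")

def dispersion(xa, xb):
    """Estimated var(Z(xa,t) - Z(xb,t)) = g(|f(xa) - f(xb)|) for arbitrary locations."""
    fa, fb = f(np.atleast_2d(xa))[0], f(np.atleast_2d(xb))[0]
    return float(g.predict([np.linalg.norm(fa - fb)])[0])

print(dispersion(coords[0], [50.0, 50.0]))   # works for monitored or unmonitored sites
```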
Book
Introduction; Multivariate Normal Distribution; Wishart Distribution; Hotelling's T2 Distribution; Multivariate Beta Distributions; Rao's Distribution; Multivariate Skewness and Kurtosis.
Article
There are a number of possible designs for case-control studies. The simplest uses two separate simple random samples, but an actual study may use more complex sampling procedures. Typically, stratification is used to control for the effects of one or more risk factors in which we are interested. It has been shown (Anderson, 1972, Biometrika 59, 19-35; Prentice and Pyke, 1979, Biometrika 66, 403-411) that the unconditional logistic regression estimators apply under stratified sampling, so long as the logistic model includes a term for each stratum. We consider the case-control problem with stratified samples and assume a logistic model that does not include terms for strata, i.e., for fixed covariates the (prospective) probability of disease does not depend on stratum. We assume knowledge of the proportion sampled in each stratum as well as the total number in the stratum. We use this knowledge to obtain the maximum likelihood estimators for all parameters in the logistic model including those for variables completely associated with strata. The approach may also be applied to obtain estimators under probability sampling.
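Schematically (standard notation, assumed rather than quoted from the abstract), the two models contrasted here differ only in whether each stratum s receives its own intercept:

```latex
\text{with stratum terms (Anderson; Prentice and Pyke):}\quad
\operatorname{logit} P(D = 1 \mid x,\ \text{stratum } s) \;=\; \alpha_s + \beta^{\mathsf{T}} x,
\qquad
\text{without stratum terms (this paper):}\quad
\operatorname{logit} P(D = 1 \mid x) \;=\; \alpha + \beta^{\mathsf{T}} x .
```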