Figure 6 - uploaded by Alaa Tharwat
The calculated decision boundaries for three different classes where their covariance matrices were equal but arbitrary (our example in Section 3.2) (see online version for colours) 

Source publication
Article
The aim of this paper is to collect in one place the basic background needed to understand the discriminant analysis (DA) classifier, so that readers at all levels can gain a better understanding of DA and learn how to apply this classifier in different applications. This paper starts with basic mathematical definitions of the DA step...

Contexts in source publication

Context 1
... code for this experiment is introduced in the Appendix. Three different classes, denoted by ω₁, ω₂ and ω₃, are shown in Figure 6. Each class consists of four samples, and each sample is represented by two features, x₁ and x₂, as shown in Table 2. The mean of each class, the mean-centred data, and the covariance matrices are summarised in Table 2. ...
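The quantities summarised in Table 2 (class means, mean-centred data and covariance matrices) can be computed as in the minimal NumPy sketch below. The sample values are hypothetical, chosen only so that the three classes share one non-diagonal ("equal but arbitrary") covariance matrix; they are not the values of Table 2. Dividing by n − 1 is also an assumption, since the excerpt does not state which normalisation the paper uses.

```python
import numpy as np

# Hypothetical data: three classes, four samples each, two features (x1, x2).
# Illustrative values only; they are NOT the values from Table 2.
classes = {
    "w1": np.array([[2.0, 5.0], [3.0, 5.0], [3.0, 6.0], [4.0, 6.0]]),
    "w2": np.array([[6.0, 1.0], [7.0, 1.0], [7.0, 2.0], [8.0, 2.0]]),
    "w3": np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]),
}

def class_statistics(X):
    """Return the class mean, the mean-centred data and the covariance matrix."""
    mu = X.mean(axis=0)                  # mean of each feature
    D = X - mu                           # mean-centred data
    sigma = D.T @ D / (X.shape[0] - 1)   # sample covariance (n - 1 divisor assumed)
    return mu, D, sigma

stats = {name: class_statistics(X) for name, X in classes.items()}
for name, (mu, D, sigma) in stats.items():
    print(name, mu, sigma, sep="\n")
```

Because every hypothetical class is a shifted copy of the same four-point pattern, all three covariance matrices come out identical, which mirrors the equal-covariance setting of the example.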
Context 2
... discriminant functions were then calculated, and their values are as follows. The decision boundary between each pair of classes was then calculated as follows: ... (35). Figure 6 shows graphically the original data of this example, which consists of three classes, together with the decision boundaries between all classes, the region of each class, and the distributions of all classes. Moreover, Figure 7 shows the decision functions in three-dimensional space. ...
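When all classes share one covariance matrix Σ, the textbook linear discriminant function is g_i(x) = μ_iᵀΣ⁻¹x − ½μ_iᵀΣ⁻¹μ_i + ln P(ω_i), and the boundary between ω_i and ω_j is the line g_i(x) = g_j(x). The sketch below assumes this standard form (the paper's own equations may use different notation), a pooled covariance estimate, priors proportional to class sizes, and hypothetical three-class data that are not the values of Table 2. The helper names are illustrative.

```python
import numpy as np
from itertools import combinations

# Hypothetical three-class, two-feature data (not the paper's Table 2 values).
classes = {
    "w1": np.array([[2.0, 5.0], [3.0, 5.0], [3.0, 6.0], [4.0, 6.0]]),
    "w2": np.array([[6.0, 1.0], [7.0, 1.0], [7.0, 2.0], [8.0, 2.0]]),
    "w3": np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]),
}
names = list(classes)
means = {k: X.mean(axis=0) for k, X in classes.items()}
n_total = sum(len(X) for X in classes.values())
priors = {k: len(X) / n_total for k, X in classes.items()}

# Pooled (shared) covariance: LDA assumes every class has the same covariance.
scatter = sum((X - means[k]).T @ (X - means[k]) for k, X in classes.items())
sigma = scatter / (n_total - len(classes))
sigma_inv = np.linalg.inv(sigma)

def linear_coeffs(k):
    """Weights w_k and bias b_k of g_k(x) = w_k . x + b_k."""
    w = sigma_inv @ means[k]
    b = -0.5 * means[k] @ sigma_inv @ means[k] + np.log(priors[k])
    return w, b

coeffs = {k: linear_coeffs(k) for k in names}

def classify(x):
    """Assign x to the class with the largest discriminant value."""
    scores = {k: w @ x + b for k, (w, b) in coeffs.items()}
    return max(scores, key=scores.get)

# Boundary between classes i and j: (w_i - w_j) . x + (b_i - b_j) = 0
boundaries = {
    (i, j): (coeffs[i][0] - coeffs[j][0], coeffs[i][1] - coeffs[j][1])
    for i, j in combinations(names, 2)
}
```

With equal priors, each pairwise boundary passes through the midpoint of the two class means. This gives a quick sanity check on the computed coefficients.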

Similar publications

Conference Paper
This research study proposes a compatible encoder-enabled video generating method. The encoder-enabled method adds an inference mechanism for enhancing the ability of Generative Adversarial Networks (GAN) based video generators. The proposed video generating method is called Encoding GAN3 (EncGAN3) and decomposes the video into two streams represen...

Citations

... We also explored other classifiers with non-linear boundaries: (1) the decision tree classifier with max-depth k, denoted DT-k (and DT for a decision tree with unlimited depth); (2) the k-nearest neighbor classifier, denoted k-NN; (3) an SVM with non-linear kernel k, denoted SVM-k; and (4) a quadratic discriminant analysis (QDA) classifier [54]. We show their G-Mean and accuracy in light blue and orange, respectively. ...
... In finance, it underpins models such as the Markowitz portfolio selection [28,30] and the recently developed flow trading model in market design [6]. Machine learning applications of QP include support vector machines (SVM) [21], Elastic Net [38], and Quadratic Discriminant Analysis (QDA) [34]. Additionally, QP is a fundamental component in many optimization algorithms used in nonlinear programming, such as sequential quadratic programming (SQP) [4], where it helps approximate nonlinear objectives with quadratic functions. ...
Preprint
Convex quadratic programming (QP) is an essential class of optimization problems with broad applications across various fields. Traditional QP solvers, typically based on simplex or barrier methods, face significant scalability challenges. In response to these limitations, recent research has shifted towards matrix-free first-order methods to enhance scalability in QP. Among these, the restarted accelerated primal-dual hybrid gradient (rAPDHG) method, proposed by H. Lu (2023), has gained notable attention due to its linear convergence rate to an optimal solution and its straightforward implementation on Graphics Processing Units (GPUs). Building on this framework, this paper introduces a restarted primal-dual hybrid conjugate gradient (PDHCG) method, which incorporates conjugate gradient (CG) techniques to solve the primal subproblems inexactly. We demonstrate that PDHCG maintains a linear convergence rate with an improved convergence constant and is also straightforward to implement on GPUs. Extensive numerical experiments show that, compared to rAPDHG, our method can significantly reduce the number of iterations required to achieve the desired accuracy and offers a substantial performance improvement on large-scale problems. These findings highlight the significant potential of our proposed PDHCG method to boost both the efficiency and scalability of solving complex QP problems.
... Discriminant analysis was originally introduced by R. Fisher and is widely used for classification and prediction problems (134). The method belongs to the supervised machine learning techniques, and a more detailed description can be found in the literature (137). Discriminant analysis assumes that the group distributions used for classification follow a Gaussian distribution. ...
... For a more precise description of the calculations, we refer the reader to appropriate literature sources (137,138). ...
... The red distribution denotes performances of world-class athletes, the green distribution represents performances of international-level athletes, and light blue denotes performances of national-level athletes. Vertical lines denote the boundaries between the classes (world class versus international, international versus national), calculated by means of quadratic discriminant analysis (QDAn) (137). ...
Thesis
The true prevalence of doping use in elite sports is unknown, but studies provide estimates ranging from 14% to 39%. According to WADA statistics, only approximately 2% of the doping samples collected annually are reported to contain banned substances. This figure remains relatively stable, despite the gradual increase in both the number of tests conducted and the development of the analytical methods. On this basis, a significant discrepancy still exists between the estimated prevalence of doping and the number of confirmed positive doping cases. This fact underscores the need for investment in anti-doping research that can enhance the effectiveness of the anti-doping system. One of the significant developments in anti-doping in recent decades was the introduction of the Athlete Biological Passport (ABP). The ABP relies on Bayesian statistics to establish personalised ranges for blood parameters and tracks an athlete longitudinally. Since doping triggers biological changes, its use can be indirectly detected by monitoring the biological parameters of the ABP over time. This approach has proven to be considerably more sensitive in detecting doping use and has extended the window for detecting the use of banned substances. Additionally, this longitudinal approach has enhanced the efficiency of targeted testing using conventional methods. Given that the primary goal of doping is to enhance an athlete’s performance, it also seems reasonable to implement systematic monitoring of an athlete’s individual competition results. This could help to identify any unusual or disproportionate performances and, if necessary, initiate anti-doping actions. The concept of profiling individual performances to make more informed decisions about doping testing was originally proposed by Schumacher and Pottgiesser.
With this approach, athletes with suspicious performances could be identified, as doping not only alters their biological passport parameters, but also ultimately leads to an improvement in their performance. This dissertation is dedicated to the topic of performance profiling for anti-doping needs and beyond. In this work, the principles of performance profiling for anti-doping purposes are discussed, and research that provides insight into the potential of this approach in the future is presented. This research establishes the evidence-based foundation for further development of the performance profiling for anti-doping needs.
... Also, quadratic discriminant analysis (QDA) has been adopted by Alimardani et al. [37]. While QDA and LDA function similarly, QDA separates the two populations without assuming a common covariance matrix, i.e., each class may have its own covariance matrix [66]. Erguzel et al. [45] used an ANN. ...
... Thirteen different classification models were trained on the whole dataset using stratified 10-fold cross-validation: Gradient Boosting Classifier (gbc) [27], Light Gradient Boosting Machine (lgb) [28], Random Forest Classifier (rf) [29], Extra Trees Classifier (et) [30], K Neighbors Classifier (knn) [31], Extreme Gradient Boosting (xgb) [32], Linear Discriminant Analysis (lda) [33], Ada Boost Classifier (ada) [34], Decision Tree Classifier (dt) [35], Naive Bayes (nb) [36], Quadratic Discriminant Analysis (qda) [37], Logistic Regression (lr) [38], and Dummy Classifier (dum) [39]. The Dummy Classifier does not learn anything from the training data, but it is particularly useful as a baseline for assessing the performance of more complex models and for understanding the difficulty of the classification task. ...
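Stratified k-fold cross-validation splits the data into k folds so that each fold keeps roughly the class proportions of the whole dataset. Each fold then serves once as the test set while the remaining folds form the training set. The NumPy sketch below illustrates the idea. The function name and the round-robin assignment of samples to folds are our own choices. In practice a library routine such as scikit-learn's `StratifiedKFold(n_splits=10)` would normally be used instead.

```python
import numpy as np

def stratified_kfold_indices(y, n_splits=10, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified k-fold cross-validation.

    The samples of each class are shuffled and dealt round-robin into the
    folds, so every fold keeps roughly the overall class proportions.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

# Usage: 50 samples of class 0 and 30 of class 1 -> 5 and 3 per test fold.
y = np.array([0] * 50 + [1] * 30)
for train_idx, test_idx in stratified_kfold_indices(y, n_splits=10):
    pass  # fit a classifier on train_idx and evaluate it on test_idx here
```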
Article
Respiratory malignancies, encompassing cancers affecting the lungs, the trachea, and the bronchi, pose a significant and dynamic public health challenge. Given that air pollution stands as a significant contributor to the onset of these ailments, discerning the most detrimental agents becomes imperative for crafting policies aimed at mitigating exposure. This study advocates for the utilization of explainable artificial intelligence (XAI) methodologies, leveraging remote sensing data, to ascertain the primary influencers on the prediction of standard mortality rates (SMRs) attributable to respiratory cancer across Italian provinces, utilizing both environmental and socioeconomic data. By scrutinizing thirteen distinct machine learning algorithms, we endeavor to pinpoint the most accurate model for categorizing Italian provinces as either above or below the national average SMR value for respiratory cancer. Furthermore, employing XAI techniques, we delineate the salient factors crucial in predicting the two classes of SMR. Through our machine learning scrutiny, we illuminate the environmental and socioeconomic factors pertinent to mortality in this disease category, thereby offering a roadmap for prioritizing interventions aimed at mitigating risk factors.
... Unlike QDA, LDA assumes equal covariance matrices for all classes. In addition, LDA has a linear decision surface, while QDA has a nonlinear (quadratic) decision surface [15,16]. The SVM calculates hyperplanes to discriminate between classes, either linearly or, when combined with a kernel function, non-linearly [17]. ...
... Unlike LDA, QDA and SVM are both classifiers that are not based on a linear decision surface. QDA is based on a quadratic function (a curved surface), while SVM is based on one of several kernel functions (e.g., polynomial or radial) [15,17]. Another difference between SVM and LDA/QDA in multi-class applications is that LDA/QDA computation focuses on classes that are very different, while SVM computation focuses on classes that are close together [25]. ...
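Under the Gaussian assumption, QDA scores a sample x against class k with g_k(x) = −½ ln|Σ_k| − ½(x − μ_k)ᵀΣ_k⁻¹(x − μ_k) + ln P(ω_k). Because every class keeps its own Σ_k, the quadratic term in x does not cancel between classes, which is why the decision surface is curved rather than linear. The sketch below implements this rule on synthetic two-class data. The function names, sample sizes and distribution parameters are illustrative assumptions.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate a mean, a covariance matrix and a prior for every class."""
    params = {}
    for label in np.unique(y):
        Xk = X[y == label]
        mu = Xk.mean(axis=0)
        sigma = np.cov(Xk, rowvar=False)      # per-class covariance (the QDA step)
        params[label] = (mu, sigma, len(Xk) / len(X))
    return params

def qda_score(x, mu, sigma, prior):
    """g_k(x) = -1/2 ln|S_k| - 1/2 (x - mu_k)^T S_k^{-1} (x - mu_k) + ln P_k."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # squared Mahalanobis distance
    return -0.5 * logdet - 0.5 * maha + np.log(prior)

def predict_qda(params, x):
    """Assign x to the class with the largest quadratic discriminant value."""
    return max(params, key=lambda k: qda_score(x, *params[k]))

# Synthetic data: two Gaussian classes with different covariance matrices.
rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], size=200)
X1 = rng.multivariate_normal([3, 3], [[4, 1.5], [1.5, 1]], size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)
params = fit_qda(X, y)
```

If every Σ_k is replaced by one pooled matrix, the xᵀΣ⁻¹x term becomes identical for all classes and drops out of the comparison, which recovers the linear LDA rule.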
Article
(1) Background: The authenticity of eggs in relation to the housing system of laying hens is susceptible to food fraud due to the potential for egg mislabeling. (2) Methods: A total of 4188 egg yolks, obtained from four different breeds of laying hens housed in colony cage, barn, free-range, and organic systems, were analyzed using 1H NMR spectroscopy. The data of the resulting 1H NMR spectra were used for different machine learning methods to build classification models for the four housing systems. (3) Results: The comparison of the seven computed models showed that the support vector machine (SVM) model gave the best results with a cross-validation accuracy of 98.5%. The test of classification models with eggs from supermarkets showed that only a maximum of 62.8% of samples were classified according to the housing system labeled on the eggs. (4) Conclusion: The classification models developed in this study included the largest sample size compared to the literature. The SVM model is most suitable for evaluating 1H NMR data in terms of the hen housing system. The test with supermarket samples showed that more authentic samples to analyze influencing factors such as breed, feeding, and housing changes are required.
... In this experiment, we compared classification performance using the following well-known classification methods: Quadratic Discriminant Analysis (QDA) (Tharwat 2016); Kernel Support Vector Machine (k-SVM) (Guenther and Schonlau 2016) with an RBF kernel function; Neural Networks (NN) (Gurney 2018) with a single hidden layer whose number of hidden nodes is half the total number of variables; Random Forest (RF) (Breiman 2001) with 100 trees; and eXtreme Gradient Boosting (XGBoost) (Chen and Guestrin 2016) with 100 boosting iterations. These are applied as classification models to the d-dimensional training data. The hyper-parameters of the classification methods are set to the same values so that each dimension reduction method is compared under identical conditions. ...
Article
This study proposes a new linear dimension reduction technique called Maximizing Adjusted Covariance (MAC), which is suitable for supervised classification. The new approach is to adjust the covariance matrix between input and target variables using the within-class sum of squares, thereby promoting class separation after linear dimension reduction. MAC has a low computational cost and can complement existing linear dimensionality reduction techniques for classification. In this study, the classification performance by MAC was compared with those of the existing linear dimension reduction methods using 44 datasets. In most of the classification models used in the experiment, the MAC dimension reduction method showed better classification accuracy and F1 score than other linear dimension reduction methods.
... Also, the EXT and XGB algorithms have high computational costs because they add randomness to the model while growing the trees. QDA, a statistical method, models the distribution of the features in each class using a Gaussian distribution with its own mean and covariance matrix [27]. As a result, its computation becomes intensive, because a separate covariance matrix must be estimated and inverted for each class, which requires more parameters to be estimated, especially for financial datasets with many features. ...
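The extra cost of QDA can be made concrete by counting the Gaussian parameters each model estimates. Both models need K class means of length d. LDA then needs one shared symmetric d × d covariance (d(d + 1)/2 free entries), while QDA needs K of them, each of which must also be inverted. The helper below is a hypothetical illustration that ignores the K − 1 prior probabilities.

```python
def n_gaussian_params(n_classes: int, n_features: int, shared_cov: bool) -> int:
    """Number of mean and covariance entries a Gaussian DA model must estimate."""
    d = n_features
    cov_entries = d * (d + 1) // 2          # free entries of a symmetric d x d matrix
    means = n_classes * d                   # one mean vector per class
    covs = cov_entries if shared_cov else n_classes * cov_entries
    return means + covs

lda = n_gaussian_params(2, 100, shared_cov=True)    # 200 + 5050 = 5250
qda = n_gaussian_params(2, 100, shared_cov=False)   # 200 + 2 * 5050 = 10300
```

The covariance term grows quadratically with the number of features and linearly with the number of classes, so QDA's extra cost is most visible on wide datasets.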
... The STP framework utilizes a diverse set of primary classifiers, each possessing distinct attributes and capabilities, to guarantee a thorough examination of financial data. A review of these methods is as follows [7], [15], [27], [32]. ...
... Discriminant analysis (DA) is one of the simplest supervised classifiers used in clustering and classification problems. There are two main types of DA classifiers: linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) (Tharwat, 2016). When the number of variables in the data set is much higher than the number of samples in each class, regularized discriminant functions are preferred (Tharwat, 2016; Wu et al., 1996). ...
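One common way to regularise discriminant functions when features outnumber samples is to shrink each class covariance towards the pooled covariance and then towards a scaled identity matrix, in the spirit of Friedman's regularized discriminant analysis. The sketch below is a simplified version of that idea. The parameter names `lam` and `gamma`, their values and the synthetic data are assumptions, and this is not necessarily the scheme used in the cited works.

```python
import numpy as np

def regularized_covariance(sigma_k, sigma_pooled, lam, gamma):
    """Shrink a class covariance towards the pooled covariance (lam), then
    towards a scaled identity matrix (gamma); a simplified RDA-style rule."""
    d = sigma_k.shape[0]
    s = (1 - lam) * sigma_k + lam * sigma_pooled                   # towards LDA
    return (1 - gamma) * s + gamma * (np.trace(s) / d) * np.eye(d)  # towards identity

# Synthetic example with far more features (20) than samples per class (5).
rng = np.random.default_rng(1)
Xa = rng.normal(size=(5, 20))
Xb = rng.normal(size=(5, 20))
sigma_a = np.cov(Xa, rowvar=False)               # rank <= 4, hence singular
sigma_b = np.cov(Xb, rowvar=False)
sigma_pooled = (4 * sigma_a + 4 * sigma_b) / 8   # rank <= 8, still singular
sigma_a_reg = regularized_covariance(sigma_a, sigma_pooled, lam=0.5, gamma=0.1)
```

The raw class covariance cannot be inverted here, whereas the regularised matrix is positive definite and can be used in the discriminant function. scikit-learn exposes related controls, for example the `shrinkage` parameter of `LinearDiscriminantAnalysis` and `reg_param` of `QuadraticDiscriminantAnalysis`.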
Article
Sea level rise and storm surges drive coastal forest retreat and salt marsh expansion. Both salinization and flooding control ecological zonation and ecosystem transition in coastal areas. Hydrological variables, if coupled with ecological surveys, can explain the different stages of coastal forest retreat and marsh encroachment. In this research, long‐term data of a host of hydrological variables collected along transects from marsh to inner forest were analyzed. Linear discriminant analysis (LDA) was used to identify the primary hydrological variables responsible for the forest‐marsh gradient and their seasonal patterns. Soil water content (WC) and groundwater electrical conductivity (EC) were found to be the main variables responsible for the hydrological differences among the sites. Higher values of WC and EC were found in the low‐forest area near the salt marsh, with hydrological differences between forest levels reflected in ecological community structure. In particular, some sites were characterized by high EC while others by high WC values, suggesting significant spatial variations within hundreds of meters. The forested area, relatively flat in elevation, was characterized by limited hydraulic gradients and consequently limited lateral discharges. These characteristics made the role of groundwater level negligible in driving the hydrological clustering. Seasonal LDA data suggest that the sites are hydrologically different during winter (higher distance among clusters of variables) and similar during summer (low distance among clusters). In the study area, higher rainfall occurs during summer, decreasing groundwater EC in areas characterized by low canopy cover (dying forest). Rainfall moved low forest sites closer to the pristine high forest in the LDA analysis. During storm surge events, the distance between clusters decreased, indicating uniform salinization and flooding across the forest.
Therefore, we conclude that ecological zonation in a coastal forest is reflected in seasonal hydrological differences in the absence of storm surges. Storm surges do not produce contrasting hydrological conditions and might not be responsible for ecological differences in the short‐term. On the contrary, differences in hydrological recovery are responsible for forest zonation. An additional analysis carried out using a binary Marsh‐Healthy forest LDA classifier indicates when each site switches from a forest hydrological state to a salt‐marsh hydrological state. Our results are useful for long‐term predictions of the ecological evolution of the forest–salt marsh ecotone.
... Here, feature vectors are assumed to follow a Gaussian distribution. Linear discriminant analysis presumes a linear relationship and equal covariance matrices for all classes, whereas quadratic discriminant analysis presumes a non-linear relationship and a separate covariance matrix for each class to define decision boundaries (Tharwat, 2016). Both types of models were deployed in this investigation. ...
Article
Geotechnical characterisation of spoil piles has traditionally relied on the expertise of field specialists, which can be both hazardous and time-consuming. Although unmanned aerial vehicles (UAVs) show promise as a remote sensing tool in various applications, accurately segmenting and classifying very high-resolution remote sensing images of heterogeneous terrains, such as mining spoil piles with irregular morphologies, presents significant challenges. The proposed method adopts a robust approach that combines morphology-based segmentation with spectral, textural, structural, and statistical feature extraction techniques to overcome the difficulties associated with spoil pile characterisation. Additionally, it incorporates minimum redundancy maximum relevance (mRMR) based feature selection and machine learning-based classification. This automated characterisation will serve as a proactive tool for dump stability assessment, providing crucial data for improved stability models and contributing to a greener and more responsible mining industry.