Figure 1 illustrates the notions of nodes, terminal nodes and depth.

Source publication
Article
We consider the problem of predicting a categorical variable based on groups of inputs. Some methods have already been proposed to build classification rules based on groups of variables (e.g. group lasso for logistic regression). However, to our knowledge, no tree-based approach has been proposed to tackle this issue. Here, we propose the Tree...

Contexts in source publication

Context 1
... In other words, T_h is the deepest subtree of T_max whose terminal nodes have a depth less than or equal to h. For example, Table 1 gives the terminal nodes for the sequence of subtrees of the tree displayed in Figure 1. ...
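As a side illustration (not the authors' code), the terminal nodes of each T_h can be read off by truncating the tree at depth h: every node of depth h, plus every terminal node of smaller depth, becomes a leaf of T_h. A minimal Python sketch with a hypothetical dict-based tree (the node labels mimic, but are not taken from, Figure 1):

```python
# Minimal sketch (not the authors' implementation): terminal nodes of the
# depth-h subtree T_h, i.e. the tree truncated at depth h.
# A node is assumed to be a dict: {"name": str, "children": [nodes]}.

def terminal_nodes(node, h, depth=0):
    """Return the leaves of T_h: nodes at depth h, and leaves above depth h."""
    if depth == h or not node["children"]:
        return [node["name"]]
    leaves = []
    for child in node["children"]:
        leaves.extend(terminal_nodes(child, h, depth + 1))
    return leaves

# Hypothetical tree (labels t1, t2, ... are assumptions for illustration):
tree = {"name": "t1", "children": [
    {"name": "t2", "children": [
        {"name": "t4", "children": []},
        {"name": "t5", "children": []}]},
    {"name": "t3", "children": []}]}

print(terminal_nodes(tree, 1))  # ['t2', 't3']
print(terminal_nodes(tree, 2))  # ['t4', 't5', 't3']
```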
Context 2
... t_8, t_9, t_5, t_10, t_11, t_7. Table 1: Terminal nodes for the subtrees in Figure 1. ...
Context 3
... TPLDA yields a less complex partition of the input space without loss of accuracy (Figure 2). The associated trees are displayed in Appendix B (see Figure 13). ...
Context 4
... As far as the importance score is concerned, the use of a penalized Gini criterion seems to improve its ability to identify the truly relevant group (Table 5 and Figures 10 and 11). Nonetheless, the large noisy group still appears among the five most important groups, even when a penalty function is added to the Gini impurity criterion. ...
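The excerpt does not give the penalty's exact form. In a hedged sketch, the Gini decrease of a split can be divided by an increasing function of the size of the splitting group, so that large groups must earn proportionally larger decreases to be selected; the choice pen(d) = sqrt(d) below is a placeholder assumption, not necessarily the paper's penalty:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def penalized_gini_decrease(y, y_left, y_right, group_size,
                            pen=lambda d: np.sqrt(d)):
    """Decrease in Gini impurity for a split, divided by a penalty on the
    size of the group of variables used to split. pen(d) = sqrt(d) is a
    placeholder assumption, not necessarily the paper's choice."""
    n, nl, nr = len(y), len(y_left), len(y_right)
    decrease = gini(y) - (nl / n) * gini(y_left) - (nr / n) * gini(y_right)
    return decrease / pen(group_size)

# Example with hypothetical labels: a pure split on a group of 4 variables.
y = np.array([0, 0, 1, 1]); yl = np.array([0, 0]); yr = np.array([1, 1])
print(penalized_gini_decrease(y, yl, yr, group_size=4))  # 0.25 = 0.5 / sqrt(4)
```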
Context 5
... points out that the group importance score might not be reliable for estimating the importance of large noisy groups when some noisy groups are very large compared to the truly relevant groups. This lack of robustness is also highlighted by the high variability of the estimated importance of the large noisy group (Figures 10 and 11). So, when group sizes vary greatly, the estimated importance scores of the largest groups should be interpreted with caution. ...
Context 6
... The importance score gives information about the relative importance of each group and also allows groups of genes to be selected. Figure 12 displays the measure of importance of each group on the leukemia data. The measure of importance highlights four groups: the first, the third, the sixth and the fifteenth. ...
Context 7
... Additional figures illustrating the TPLDA method on a simple example. Figure 13 displays the two trees built by CART and TPLDA in the simple example used to illustrate TPLDA in Section 2.4. As mentioned previously, in this example the TPLDA tree is much simpler than the CART tree. ...
Context 8
... in the first split, the FDA used to split the entire data space overfits the training set. This can be seen in Figure 14: the training misclassification error for TLDA decreases much faster and falls below the Bayes error from the first split, while the test misclassification error for TLDA remains stable. Consequently, after applying the pruning procedure, which removes the less informative nodes, the final TLDA tree is trivial in at least 25% of the simulations (Table 9). ...
Context 9
... The results are given in Table 10. Figure 15 displays the group selection frequencies of CART and CARTD. Overall, the two pruning methods lead to similar CART trees and thus to similar classification rules. ...
Context 10
... figures about the simulation studies are displayed in this subsection. Figure 16 displays the predictive performances of TPLDA, CART and GL in the first three experiments. Figure 17 shows the distribution of the importance score for each group in the first three experiments. ...
Context 11
... Figure 16 displays the predictive performances of TPLDA, CART and GL in the first three experiments. Figure 17 shows the distribution of the importance score for each group in the first three experiments. Figures 10 and 11 display the distribution of the importance score for each group in the fourth and fifth experiments, according to the penalty function. ...
Context 12
... Figure 17 shows the distribution of the importance score for each group in the first three experiments. Figures 10 and 11 display the distribution of the importance score for each group in the fourth and fifth experiments, according to the penalty function. ...

Similar publications

Article
Background This article addresses the automatic classification of reconstructed neurons through their morphological features. The purpose was to extend the capabilities of the L-Measure software. Methods New morphological features were developed, based on modifications of the conventional Sholl analysis. The lengths of the compartments, as well as...

Citations

... The variable selection feature of tree methods is particularly well suited to the framework of multivariate functional data and allows predicting the response through more complex but still interpretable relationships. It represents, in some way, a generalization of the finite-dimensional setting presented in Poterie et al. (2019) to the case of multivariate functional data. The procedure consists in splitting a node of the tree by successively selecting an optimal discriminant score (according to some impurity measure) among discriminant scores obtained from MFPLS regression models with different subsets of predictors. ...
... These groups of variables define the candidates used to split the node of the tree (the score is calculated with only the variables in the candidate group). Inspired by the methodology of Poterie et al. (2019), our algorithm is composed of two main steps. In a nutshell, with the help of the MFPLS methodology, the first step provides the results of the splitting according to the candidate groups G_1, ...
... Select the optimal split according to the group G*, which maximizes the decrease of the impurity function Q (see Poterie et al. (2019) for more details), ...
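Taken together, these fragments describe a two-step split: fit one discriminant score per candidate group of variables, then keep the group whose induced split most reduces node impurity. A minimal sketch of the selection step, using the Gini index as Q and scikit-learn's plain LDA as a stand-in for the penalized/MFPLS discriminant scores (the function names and data layout are assumptions, not the authors' code):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def gini(y):
    """Gini impurity Q(t) of the labels in a node."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_group_split(X, y, groups):
    """For each candidate group of columns, fit a discriminant score and
    split on its predicted class; keep the group G* maximizing the impurity
    decrease Delta Q = Q(t) - p_L * Q(t_L) - p_R * Q(t_R).
    LDA is a stand-in for the paper's penalized/MFPLS scores."""
    best_group, best_dq = None, -np.inf
    for g, cols in enumerate(groups):
        score = LinearDiscriminantAnalysis().fit(X[:, cols], y)
        left = score.predict(X[:, cols]) == score.classes_[0]
        if left.all() or not left.any():
            continue  # degenerate split, skip
        dq = gini(y) - left.mean() * gini(y[left]) \
                     - (1 - left.mean()) * gini(y[~left])
        if dq > best_dq:
            best_group, best_dq = g, dq
    return best_group, best_dq  # index of G* and its Delta Q
```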
Article
Classification (supervised-learning) of multivariate functional data is considered when the elements of the random functional vector of interest are defined on different domains. In this setting, PLS classification and tree PLS-based methods for multivariate functional data are presented. From a computational point of view, we show that the PLS components of the regression with multivariate functional data can be obtained using only the PLS methodology with univariate functional data. This offers an alternative way to present the PLS algorithm for multivariate functional data. Numerical simulation and real data applications highlight the performance of the proposed methods.
... To address these limitations and explore more sophisticated solutions, researchers have introduced novel tree-based approaches that offer enhanced accuracy, interpretability, and efficiency. One such method is Tree Penalized Linear Discriminant Analysis (TPLDA), which builds a classification rule based on groups of variables, rendering the resulting trees more readily understandable and computationally less demanding (Poterie, Dupuy, Monbet, & Rouviere, 2019). ...
... One possible approach involves using a similarity formula to identify the attribute that exhibits the highest degree of similarity, which then becomes the splitting node (Zaim, Ramdani, & Haddi, 2018). Another method utilizes a Penalized Linear Discriminant Analysis (PLDA), which is based on clusters of variables, to construct a classification rule (Poterie, Dupuy, Monbet, & Rouviere, 2019). In certain scenarios, decision trees employ distinct splitting mechanisms, such as categorizing and uncategorizing child nodes, to enhance performance (Zeng & Chen, 2019). ...
Article
Machine learning, an integral component of Artificial Intelligence (AI), empowers systems to autonomously enhance their performance through experiential learning. This paper presents a comprehensive overview of the Classification Tree Algorithm's pivotal role in the realm of machine learning. This algorithm simplifies the process of categorizing new instances into predefined classes, leveraging their unique attributes. It has firmly established itself as a cornerstone within the broader landscape of classification techniques. This paper delves into the multifaceted concepts, terminologies, principles, and ideas that orbit the Classification Tree Algorithm. It sheds light on the algorithm's essence, providing readers with a clearer and more profound understanding of its inner workings. By synthesizing a plethora of existing research, this endeavor contributes to the enrichment of the discourse surrounding classification tree algorithms. In summary, the Classification Tree Algorithm plays a fundamental role in machine learning, facilitating data classification, and empowering decision-making across domains. Its adaptability, alongside emerging variations and innovative techniques, ensures its continued relevance in the ever-evolving landscape of artificial intelligence and data analysis.
... The variable selection feature of tree methods is particularly well suited to the framework of multivariate functional data and allows predicting the response through more complex but still interpretable relationships. It represents, in some way, a generalization of the finite-dimensional setting presented in Poterie et al. (2019) to the case of multivariate functional data. The procedure consists in splitting a node of the tree by successively selecting an optimal discriminant score (according to some impurity measure) among discriminant scores obtained from MFPLS regression models with different subsets of predictors. ...
... Select the optimal split according to the group G*, which maximizes the decrease of impurity ∆Q (see Poterie et al. (2019) for more details), ...
... In order to avoid overfitting, a pruning method can be employed. Here, we use the same technique as in Poterie et al. (2019), i.e., the optimal depth of the decision tree (m*) is estimated using a validation set. ...
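The pruning step quoted here amounts to choosing the depth with the lowest validation error. A minimal sketch, with scikit-learn's DecisionTreeClassifier standing in for the actual group-based tree (an assumption; the cited works grow their own trees):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def optimal_depth(X_train, y_train, X_val, y_val, max_depth=10):
    """Estimate the optimal depth m* on a validation set: grow trees of
    increasing depth and keep the depth minimizing validation error."""
    errors = []
    for m in range(1, max_depth + 1):
        tree = DecisionTreeClassifier(max_depth=m).fit(X_train, y_train)
        errors.append(np.mean(tree.predict(X_val) != y_val))
    return int(np.argmin(errors)) + 1  # m*
```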
Preprint
Classification (supervised-learning) of multivariate functional data is considered when the elements of the random functional vector of interest are defined on different domains. In this setting, PLS classification and tree PLS-based methods for multivariate functional data are presented. From a computational point of view, we show that the PLS components of the regression with multivariate functional data can be obtained using only the PLS methodology with univariate functional data. This offers an alternative way to present the PLS algorithm for multivariate functional data. Numerical simulation and real data applications highlight the performance of the proposed methods.
... Inspired by the methodology of Poterie et al. (2019), our algorithm is composed of two main steps. In a nutshell, with the help of partial least squares, the first step provides potential splits according to the groups, and the second one selects the best splitting candidate using the Gini criterion. ...
... The strategy used to prevent over-fitting is the same as in Poterie et al. (2019). The motivation is to find the optimal depth of our model. ...
Preprint
Classification of multivariate functional data is explored in this paper, particularly for functional data defined on different domains. Using the partial least squares (PLS) regression, we propose two classification methods. The first one uses the equivalence between linear discriminant analysis and linear regression. The second is a decision tree based on the first technique. Moreover, we prove that multivariate PLS components can be estimated using univariate PLS components. This offers an alternative way to calculate PLS for multivariate functional data. Finite sample studies on simulated data and real data applications show that our algorithms are competitive with linear discriminant on principal components scores and black-boxes models.
... The former divides the tree structure into further branches while the latter is associated with specific classes/clusters. Recent studies also demonstrate the good performance of DT-based approaches when dealing with correlated input variables [93], which is crucial when analyzing systems with uncertainties [94]. ...
Article
This paper proposes an approach that combines reduced-order models with machine learning in order to create physics-informed digital twins to predict high-dimensional output quantities of interest, such as neutron flux and power distributions in nuclear reactor cores. The digital twin is designed to solve forward problems given input parameters, as well as to solve inverse problems given some extra measurements. Offline, we use reduced-order modeling, namely, the proper orthogonal decomposition (POD) to assemble physics-based computational models that are accurate enough for fast predictive digital twin. The machine learning techniques, namely, k-nearest-neighbors (KNN) and decision trees (DT) are used to formulate the input-parameter-dependent coefficients of the reduced basis, whereafter the high-fidelity fields are able to be reconstructed. Online, we use the real time input parameters to rapidly reconstruct the neutron field in the core based on the adapted physics-based digital twin. The effectiveness of the framework is illustrated through a real engineering problem in nuclear reactor physics - reactor core simulation in the life cycle of HPR1000 governed by the two-group neutron diffusion equations affected by input parameters, i.e., burnup, control rod inserting step, power level and temperature of the coolant, which shows potential applications for on-line monitoring purpose.
Article
Although interaction effects can be exploited to improve predictions and allow for valuable insights into covariate interplay, they are given limited attention in analysis. Interaction forests are a variant of random forests for categorical, continuous, and survival outcomes that explicitly models quantitative and qualitative interaction effects in bivariable splits performed by the trees constituting the forests. The new effect importance measure (EIM) associated with interaction forests allows for ranking of covariate pairs with respect to their interaction effects' importance to prediction. Using EIM, separate importance value lists for univariable effects, quantitative interaction effects, and qualitative interaction effects are obtained. In the spirit of interpretable machine learning, the bivariable split types of interaction forests target easily interpretable and communicable interaction effects. To learn about the nature of the interplay between covariates identified as interacting, it is convenient to visualise their estimated bivariable influence. Functions that perform this task are provided in the R package diversityForest, which implements interaction forests. In a large-scale empirical study using 220 data sets, interaction forests tended to deliver better predictions than conventional random forests and competing random forest variants that use multivariable splitting. In a simulation study, EIM delivered considerably better rankings for the relevant quantitative and qualitative interaction effects than competing approaches. These results indicate that interaction forests are suitable tools for the challenging task of identifying and making use of easily interpretable and communicable interaction effects in predictive modelling.
Article
Increasing a personal debt burden implies greater financial vulnerability and threats for macroeconomic stability. It also generates a risk of the households over-indebtedness. The assessment of over-indebtedness is conducted with the use of various objective and subjective measures based on the micro-level data. The aim of the study is to investigate over-indebted households in Poland using a unique dataset obtained from the CATI survey. We discuss and compare the usefulness of various over-indebtedness measures across different socio-economic characteristics. Due to the differences in over-indebtedness across single measures, we perform a more complex assessment using a mix of indicators. As an alternative to other commonly criticised over-indebtedness measures, we apply the “below the poverty line” (BPL) measure. In order to obtain the profile of over-indebted households, we use classification and regression tree analysis as an alternative to logit or probit models. We find that DSTI (“debt service to income”) ratio underestimates the extent of over-indebtedness in vulnerable groups of households in comparison with the BPL. We highlight the necessity to use different measures depending on the adopted definition of over-indebtedness. A psychological burden of debts is particularly strong among older and poorly educated respondents. We also find that the age structure of over-indebted households in Poland differs from this structure in countries with a broader access to consumer credits. Our results can be used to enrich the methods of assessing the household over-indebtedness.