Figure - uploaded by Llibertat Tusell
Number of records and generations included in the different types of training (trn) and testing sets (tst).

Source publication
Article
This research assessed the ability of a Support Vector Machine (SVM) regression model to predict pig crossbred (CB) performance from various sources of phenotypic and genotypic information for improving crossbreeding performance at reduced genotyping cost. Data consisted of average daily gain (ADG) and residual feed intake (RFI) records and genotyp...

Context in source publication

Context 1
... were assigned to a generation from their pedigree information using the pedigree R package (Coster, 2013). Table 2 shows the number of records and the number of generations available in the training and testing sets. Note that because the data were split by generation, only a single prediction per scenario was obtained (i.e., no cross-validation was performed). ...
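A generation-based split like the one described above can be sketched as follows. This is a minimal illustration under an assumed rule (animals with no known parents are generation 0; every other animal is one generation more than its most recent parent); it is not the code of the pedigree R package, whose exact rule may differ, and the function names and toy pedigree are hypothetical. The youngest generation forms the single test set, which is why only one prediction per scenario results.

```python
from functools import lru_cache


def assign_generations(pedigree):
    """Return {animal: generation} for a pedigree given as {animal: (sire, dam)}.

    Assumed rule: animals whose parents are unknown (None or not in the
    pedigree) are generation 0; otherwise 1 + the larger parental generation.
    """
    @lru_cache(maxsize=None)
    def gen(animal):
        sire, dam = pedigree.get(animal, (None, None))
        parent_gens = [gen(p) for p in (sire, dam) if p is not None and p in pedigree]
        return 1 + max(parent_gens) if parent_gens else 0

    return {animal: gen(animal) for animal in pedigree}


def split_by_generation(generations, n_test_generations=1):
    """Youngest generation(s) go to the test set, all older animals to training."""
    cutoff = max(generations.values()) - n_test_generations + 1
    train = sorted(a for a, g in generations.items() if g < cutoff)
    test = sorted(a for a, g in generations.items() if g >= cutoff)
    return train, test


# Hypothetical toy pedigree: animal -> (sire, dam)
pedigree = {
    "S1": (None, None), "D1": (None, None),
    "A": ("S1", "D1"), "B": ("S1", "D1"),
    "C": ("A", "B"), "E": ("A", None),
}
gens = assign_generations(pedigree)
train, test = split_by_generation(gens)
print(gens)         # {'S1': 0, 'D1': 0, 'A': 1, 'B': 1, 'C': 2, 'E': 2}
print(train, test)  # ['A', 'B', 'D1', 'S1'] ['C', 'E']
```

For deep real pedigrees, an iterative version would avoid Python's recursion limit; the recursive form is used here only for brevity.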

Citations

... To optimize the selection of pseudo-quantitative trait nucleotides, the kinship-adjusted multiple-loci method was employed to incorporate cross-validation, multiple regression, grid search, and bisection algorithms (Yin et al., 2020). Support vector machine regression combined different sources of phenotypic and genotypic data to evaluate pig crossbred performance (Tusell et al., 2020). Moreover, for pig reproduction traits, support vector regression, kernel ridge regression, random forest, and AdaBoost.R2 were all found to outperform GBLUP, single-step GBLUP, and Bayesian methods (Wang et al., 2022). ...
Article
Genetic improvement of complex traits in animal and plant breeding depends on the efficient and accurate estimation of breeding values. Deep learning methods have been shown not to be superior to traditional genomic selection (GS) methods, partly because of the degradation problem (i.e., as model depth increases, the performance of the deeper model deteriorates). Since the deep learning method residual network (ResNet) is designed to solve gradient degradation, we examined its performance and the factors related to its prediction accuracy in GS. Here we compared the prediction accuracy of conventional genomic best linear unbiased prediction, Bayesian methods (BayesA, BayesB, BayesC, and Bayesian Lasso), and two deep learning methods, convolutional neural network and ResNet, on three datasets (wheat, simulated, and real pig data). ResNet outperformed the other methods in both Pearson's correlation coefficient (PCC) and mean squared error (MSE) on the wheat and simulated data. For the pig backfat depth trait, ResNet still had the lowest MSE, whereas Bayesian Lasso had the highest PCC. We further clustered the pig data into four groups, and on one separate group ResNet had the highest prediction accuracy (in both PCC and MSE). Transfer learning was adopted and enhanced the performance of both the convolutional neural network and ResNet. Taken together, our findings indicate that ResNet could improve GS prediction accuracy, potentially depending on factors such as the genetic architecture of complex traits, data volume, and heterogeneity.
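The two accuracy measures above can rank methods differently, as in the pig backfat result where one method had the lowest MSE and another the highest PCC. The sketch below uses the standard definitions in plain Python; the toy vectors are hypothetical. A prediction that is perfectly correlated with the observations but shifted by a constant gets a PCC of 1 and still a nonzero MSE.

```python
import math


def pearson_r(y_obs, y_pred):
    """Pearson's correlation coefficient between observed and predicted values."""
    n = len(y_obs)
    mx = sum(y_obs) / n
    my = sum(y_pred) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(y_obs, y_pred))
    sxx = sum((x - mx) ** 2 for x in y_obs)
    syy = sum((y - my) ** 2 for y in y_pred)
    return sxy / math.sqrt(sxx * syy)


def mse(y_obs, y_pred):
    """Mean squared error of prediction."""
    return sum((x - y) ** 2 for x, y in zip(y_obs, y_pred)) / len(y_obs)


obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated, but biased by +1
print(pearson_r(obs, pred))  # 1.0
print(mse(obs, pred))        # 1.0
```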
... Of these, 23,070 SNPs are used routinely by Topigs Norsvin and constitute the raw SNP data in the current study. Based on PB animals (n = 4,014) from five different PBs (see details in the following section), the minimum call rate was 0.997 and the minimum minor allele frequency (MAF) was 0.045, well within the thresholds used by Tusell et al. (2020) of 0.9 for call rate and 0.01 for MAF. ...
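The call-rate and MAF thresholds mentioned above can be applied as a simple per-SNP filter. This is a minimal sketch, assuming genotypes coded as 0/1/2 allele counts with `None` for a missing call; the function name and toy data are hypothetical, and the defaults mirror the 0.9 call-rate and 0.01 MAF limits quoted in the text.

```python
def snp_qc(genotypes, min_call_rate=0.90, min_maf=0.01):
    """Keep SNPs passing call-rate and minor-allele-frequency thresholds.

    genotypes: dict snp_id -> list of allele counts (0, 1, 2) or None if missing.
    Returns dict snp_id -> (call_rate, maf) for the SNPs that pass.
    """
    kept = {}
    for snp, calls in genotypes.items():
        called = [g for g in calls if g is not None]
        if not called:
            continue  # no successful calls at all
        call_rate = len(called) / len(calls)
        p = sum(called) / (2 * len(called))  # frequency of the counted allele
        maf = min(p, 1 - p)
        if call_rate >= min_call_rate and maf >= min_maf:
            kept[snp] = (call_rate, maf)
    return kept


# Hypothetical genotypes for 10 animals
genotypes = {
    "snp1": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],           # call rate 1.0, MAF 0.45
    "snp2": [0] * 10,                                  # monomorphic: MAF 0
    "snp3": [1, None, None, 2, 0, 1, 1, 0, 2, 1],      # call rate 0.8
}
kept = snp_qc(genotypes)
print(sorted(kept))  # ['snp1']
```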
Article
In pig production, the production animals are generally three- or four-way crossbreeds. Reliable information regarding the breed of origin of slaughtered pigs is useful, even a prerequisite, for a number of purposes, e.g., evaluating potential breed effects on carcass grading. Genetic data from slaughtered pigs can easily be extracted and used for crossbreed classification. In the current study, four classification methods, namely, random forest (RF), ADMIXTURE, partial least squares regression (PLSR), and partial least squares together with quadratic discriminant analysis (PLS-QDA) were evaluated on simulated (n = 7,500) genomic data of crossbreeds. The derivation of the theory behind PLS-QDA is a major part of the current study, whereas RF and ADMIXTURE are known and well-described in the literature. Classification success (CS) rate, square loss (SL), and Kullback–Leibler (KL) divergence loss for the simulated data were used to compare methods. Overall, PLS-QDA performed best with 99%/0.0018/0.002 (CS/SL/KL) vs. 97%/0.0084/0.051, 97%/0.0087/0.0623, and 17%/0.068/0.39 for PLSR, ADMIXTURE, and RF, respectively. PLS-QDA and ADMIXTURE, as the most relevant methods, were used on a real dataset (n = 1,013) from Norway where the two largest classes contained 532 and 192 (PLS-QDA), and 531 and 193 (ADMIXTURE) individuals, respectively. These two classes were expected to be dominating a priori. The Bayesian nature of PLS-QDA enables inclusion of desirable features such as a separate class “unknown breed combination” and informative priors for crossbreeds, making this a preferable method for the classification of breed combination in the industry.
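Square loss and Kullback–Leibler divergence compare a predicted breed-composition vector with the true one. The sketch below uses the textbook definitions for a single animal; the exact forms used in the study (log base, direction of the divergence, averaging over animals) are not given here, so they are assumptions, as are the function names and the toy composition vectors.

```python
import math


def square_loss(true_props, pred_props):
    """Sum of squared differences between true and predicted breed proportions."""
    return sum((t - p) ** 2 for t, p in zip(true_props, pred_props))


def kl_divergence(true_props, pred_props, eps=1e-12):
    """KL(true || pred) with natural logs.

    Terms where the true proportion is 0 contribute 0 by convention;
    eps guards against log(0) when a predicted proportion is 0.
    """
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(true_props, pred_props)
        if t > 0
    )


# Hypothetical two-way cross (50% breed 1, 50% breed 2, 0% breed 3)
q_true = [0.5, 0.5, 0.0]
q_pred = [0.4, 0.5, 0.1]
print(square_loss(q_true, q_pred))    # approx. 0.02
print(kl_divergence(q_true, q_pred))  # approx. 0.1116 (= 0.5 * ln 1.25)
```

Averaging these per-animal losses over all classified animals gives one summary number per method.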
... For the prediction of RFI, ML methods have been reported to be suitable when subsets of the most informative SNPs are used as predictor variables. Tusell et al. (2020) and Piles et al. (2021) predicted feed efficiency (FE) using different sources of phenotypic and genotypic information as well as different algorithms for SNP selection. In these studies, as in most genomic selection research, predictive models use only single phenotypes, building an independent model for each target variable and ignoring the relationships among them. ...
... However, to the best of our knowledge, no study has explored the benefits of multi-output regression methods to predict RFI from the genotype until now. In the literature, some studies on the prediction of RFI using ML algorithms have been presented (Piles et al., 2021; Tusell et al., 2020; Yao et al., 2016). Tusell ...
[Figure 6: Boxplots of three ranking metrics (Spearman correlation, zero-one loss, and rank distance loss) between observed and predicted daily feed intake (DFI), average daily gain (ADG), backfat thickness (BFT), and metabolic weight (MW), using the multi-output method with random forest and SNP subsets of different sizes as predictor variables.]
... et al. (2020) and Piles et al. (2021) predicted this trait in a single-output model from the genotype using different sources of information on a population of pigs. In these studies, the highest prediction performance of RFI, in terms of Spearman correlation, was 0.34 with SVR and 50 SNPs (Tusell et al., 2020). To improve this prediction performance, the benefits of multi-output and stacking methods were explored in the present research using the same population of pigs. ...
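Because selection decisions depend on how candidates are ranked, the Spearman correlation quoted above is computed on ranks rather than raw values. The sketch below implements it as Pearson's correlation between average ranks (tied values share the mean of their positions); the function names and toy values are illustrative only.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


observed = [10, 20, 30, 40]
predicted = [1.0, 3.0, 2.0, 4.0]  # two middle animals swapped
print(spearman(observed, predicted))  # 0.8
```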
Article
Feeding represents the largest economic cost in meat production; therefore, selection to improve traits related to feed efficiency is a goal in most livestock breeding programs. Residual feed intake (RFI), that is, the difference between the actual and the expected feed intake based on the animal's requirements, has been used as a selection criterion to improve feed efficiency since it was proposed by Koch et al. in 1963. In growing pigs, it is computed as the residual of the multiple regression model of daily feed intake (DFI) on average daily gain (ADG), backfat thickness (BFT), and metabolic body weight (MW). Recently, prediction using single-output machine learning algorithms with SNPs as predictor variables has been proposed for genomic selection in growing pigs, but, as in other species, the prediction quality achieved for RFI has generally been poor. However, it has been suggested that it could be improved through multi-output or stacking methods. For this purpose, four strategies were implemented to predict RFI. Two of them compute RFI indirectly from the predicted values of its components, obtained from (i) individual predictions (multiple single-output strategy) or (ii) simultaneous predictions (multi-output strategy). The other two predict RFI directly using (iii) the individual predictions of its components as predictor variables jointly with the genotype (stacking strategy), or (iv) only the genotypes as predictors of RFI (single-output strategy). The single-output strategy was considered the benchmark. This research aimed to test the hypotheses underlying the first three strategies using data recorded from 5828 growing pigs and 45,610 SNPs. For all strategies, two different learning methods were fitted: random forest (RF) and support vector regression (SVR). A nested cross-validation (CV) with an outer 10-fold CV and an inner threefold CV for hyperparameter tuning was implemented to test all strategies.
This scheme was repeated using, as predictor variables, subsets of increasing size (from 200 to 3000) of the most informative SNPs identified with RF. Results showed that the highest prediction performance was achieved with 1000 SNPs, although the stability of feature selection was poor (0.13 out of 1). For all SNP subsets, the benchmark showed the best prediction performance. Using RF as the learner and the 1000 most informative SNPs as predictors, the mean (SD) of the 10 values obtained in the test sets was 0.23 (0.04) for the Spearman correlation, 0.83 (0.04) for the zero-one loss, and 0.33 (0.03) for the rank distance loss. We conclude that information on the predicted components of RFI (DFI, ADG, MW, and BFT) does not improve the prediction of this trait relative to that obtained with the single-output strategy.
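RFI as defined above is the residual of a multiple regression of DFI on ADG, BFT, and MW. The NumPy sketch below computes it by ordinary least squares with an intercept; real analyses often add fixed effects such as batch or sex, which are omitted here as a simplifying assumption, and the simulated data and function name are hypothetical.

```python
import numpy as np


def residual_feed_intake(dfi, adg, bft, mw):
    """RFI = DFI minus its OLS prediction from ADG, BFT and MW (with intercept)."""
    X = np.column_stack([np.ones_like(adg), adg, bft, mw])
    beta, *_ = np.linalg.lstsq(X, dfi, rcond=None)
    return dfi - X @ beta


# Hypothetical simulated pigs
rng = np.random.default_rng(1)
n = 200
adg = rng.normal(0.9, 0.1, n)   # kg/day
bft = rng.normal(15.0, 2.0, n)  # mm
mw = rng.normal(30.0, 3.0, n)   # metabolic weight
dfi = 0.5 + 1.2 * adg + 0.02 * bft + 0.05 * mw + rng.normal(0.0, 0.1, n)

rfi = residual_feed_intake(dfi, adg, bft, mw)
print(rfi[:5])
```

By construction, the residuals average zero and are uncorrelated with ADG, BFT, and MW, so RFI captures feed intake not explained by production and maintenance requirements.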
... Thermal imaging is increasingly used in animal welfare monitoring to improve pig growth and health on farms [20]. By examining large datasets of genetic and phenotypic data, deep learning models can help farmers identify breeding pairs that are likely to produce high-quality offspring with desirable traits [21]. ...
Article
Thermal imaging is increasingly used in poultry, swine, and dairy animal husbandry to detect disease and distress. In intensive pig production systems, early detection of health and welfare issues is crucial for timely intervention. Using thermal imaging for pig treatment classification can improve animal welfare and promote sustainable pig production. In this paper, we present a depthwise separable inception subnetwork (DISubNet), a lightweight model for classifying four pig treatments. Based on the modified model architecture, we propose two DISubNet versions: DISubNetV1 and DISubNetV2. Our proposed models are compared to other deep learning models commonly employed for image classification. The thermal dataset captured by a forward-looking infrared (FLIR) camera is used to train these models. The experimental results demonstrate that the proposed models for thermal images of various pig treatments outperform other models. In addition, both proposed models achieve approximately 99.96–99.98% classification accuracy with fewer parameters.
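Depthwise separable convolutions are a common way to build lightweight networks with fewer parameters. The arithmetic below shows the generic saving, ignoring biases; the layer sizes (3 x 3 kernel, 64 input and 128 output channels) are hypothetical and are not taken from the DISubNet architecture, whose layer dimensions are not given here.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


std = conv_params(3, 64, 128)                 # 73728
sep = depthwise_separable_params(3, 64, 128)  # 8768
print(std, sep, round(sep / std, 4))          # 73728 8768 0.1189
```

The ratio equals 1/c_out + 1/k², so for 3 x 3 kernels the separable layer needs roughly an eighth to a ninth of the weights once the channel count is large.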
... [10] ML has also been gradually applied to the study of economic traits in pigs. Some studies have used ML to predict daily gain [11] and total number born [12] in pigs with high accuracy. However, because RNA sequencing is relatively costly and requires complex processing, ML also faces the problem of small sample sizes. ...
Article
Fat deposition in pigs is not only closely related to pig production efficiency and pork quality but also an ideal model for human obesity. Transcriptome sequencing is widely used to study fat deposition. However, due to small sample sizes, high false positive rates, and poor consistency of results from different studies, new strategies are urgently needed. Machine learning, a new analysis method, can effectively fit complex data and accurately identify samples and genes. In this study, 36 samples of adipose tissue, muscle tissue, and liver tissue were collected from Songliao black pigs and Landrace pigs, and the mRNA of all the samples was sequenced. In addition, we collected transcriptome data for 64 samples in the GEO database from four different sources. After standardization and imputation of missing values in the data set comprising 100 samples, traditional differential expression analysis was carried out, and different numbers of expressed genes were selected as features for the training model of eight machine learning methods. In the 1000 replications of fourfold cross validation with 100 samples, AdaBoost performed best, with an average prediction accuracy greater than 93% and the highest mean area under the curve in predicting the high- and low-fat content groups among the eight ML methods. According to their performance-based ranks inferred by AdaBoost, 12 genes related to fat deposition were identified; among them, FASN and APOD were specifically expressed in adipose tissue, and APOA1 was specifically expressed in the liver, which could be important candidate biomarkers affecting fat deposition.
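The area under the ROC curve used above to compare classifiers equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count one half). The sketch below computes it directly from that pairwise definition, which is adequate for a dataset of about 100 samples; the labels (1 = high-fat group) and scores are hypothetical.

```python
def auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) definition; labels are 0/1."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six samples (1 = high-fat group)
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(auc(labels, scores), 4))  # 0.8889
```

Averaging this value over the test folds of each replication gives a mean AUC per method.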
... The SVM method is a popular machine learning algorithm in genome-enabled prediction because it can handle potential nonlinearity between features and target traits in both animals and plants [17,30,31]. Previous studies have shown contrasting results regarding the predictive performance of SVM relative to linear models [17,29,32,33]. In this study, the predictive correlation of the SVM model ranked second for two traits, and the difference in performance from the GBLUP model was small for all traits (Figure 1). ...
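SVM regression handles nonlinearity through a kernel that measures similarity between animals' genotypes. The NumPy sketch below computes a Gaussian (RBF) kernel matrix from 0/1/2-coded genotypes; the RBF choice, the gamma value, and the toy genotypes are assumptions for illustration, since the kernels used in the cited studies are not stated here.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian kernel K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from rounding
    return np.exp(-gamma * sq_dists)


# Hypothetical genotypes: rows = animals, columns = SNPs (0/1/2 allele counts)
X = np.array([[0, 1, 2, 1],
              [0, 1, 2, 2],
              [2, 1, 0, 0]], dtype=float)
K = rbf_kernel(X, gamma=1.0 / X.shape[1])
print(np.round(K, 3))
```

Genetically similar animals (rows 0 and 1) get a kernel value near 1, distant ones a value near 0. A precomputed matrix of this kind can be passed to a kernel learner; for example, scikit-learn's `SVR` accepts `kernel="precomputed"`.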
... They reported that the prediction accuracy was very similar among methods. Meanwhile, Tusell et al. [33] showed that the SVM models could outperform the conventional GBLUP in predicting average residual feed intake and average daily gain crossbred performances from purebred sire genotypes. Among machine learning methods, only the boosting method XGB outperformed GBLUP for some traits (CWT and MS) in terms of predictive correlation, as shown in Figure 1. ...
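GBLUP, the linear benchmark in these comparisons, models genomic breeding values with a covariance proportional to a genomic relationship matrix G. A minimal sketch of VanRaden's (2008) first method follows, assuming 0/1/2 allele counts and allele frequencies estimated from the genotyped sample itself rather than a base population; the cited studies may have used a different construction, and the toy genotypes are hypothetical.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix G = Z Z' / (2 * sum p(1 - p)) (VanRaden method 1).

    M: (animals x SNPs) matrix of allele counts coded 0/1/2.
    """
    p = M.mean(axis=0) / 2.0  # frequency of the counted allele at each SNP
    Z = M - 2.0 * p           # centre each SNP column on its expected count
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


# Hypothetical genotypes: rows = animals, columns = SNPs
M = np.array([[0, 1, 2, 1],
              [0, 1, 2, 2],
              [2, 1, 0, 0]], dtype=float)
G = vanraden_g(M)
print(np.round(G, 3))
```

Because every column of Z is centred, the elements of G sum to zero; monomorphic SNPs contribute nothing to either the numerator or the denominator.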
Article
Hanwoo was originally raised for draft purposes, but the increase in local demand for red meat turned that purpose into full-scale meat-type cattle rearing; it is now considered one of the most economically important species and a vital food source for Koreans. The application of genomic selection in Hanwoo breeding programs in recent years was expected to lead to higher genetic progress. However, better statistical methods that can improve the genomic prediction accuracy are required. Hence, this study aimed to compare the predictive performance of three machine learning methods, namely, random forest (RF), extreme gradient boosting method (XGB), and support vector machine (SVM), when predicting the carcass weight (CWT), marbling score (MS), backfat thickness (BFT) and eye muscle area (EMA). Phenotypic and genotypic data (53,866 SNPs) from 7324 commercial Hanwoo cattle that were slaughtered at the age of around 30 months were used. The results showed that the boosting method XGB showed the highest predictive correlation for CWT and MS, followed by GBLUP, SVM, and RF. Meanwhile, the best predictive correlation for BFT and EMA was delivered by GBLUP, followed by SVM, RF, and XGB. Although XGB presented the highest predictive correlations for some traits, we did not find an advantage of XGB or any machine learning methods over GBLUP according to the mean squared error of prediction. Thus, we still recommend the use of GBLUP in the prediction of genomic breeding values for carcass traits in Hanwoo cattle.
... Finally, in an empirical study of pigs, Tusell et al. (2020) compared using 5,137 PB phenotypes and genotypes in the reference population with using the deregressed estimated breeding values (DEBV) and genotypes of 205 PB sires in the reference population. The DEBV of these sires were computed from the phenotypes of their 2,774 CB offspring. ...
... Finally, in another study of pigs, Tusell et al. (2020) compared a scenario where 3059 CB phenotypes were used to estimate DEBV of genotyped PB animals in the reference population, with the use of a reference population of 3998 CB animals. Their results showed that, using either an additive genomic prediction model or an SVM, genotyping CB animals resulted in a higher accuracy for both ADG (increase of 0.05-0.16), ...
... Their results showed that the accuracy of genomic prediction was 0.10 lower with a CB than with a PB reference population for a trait with an rpc of 0.96, and that the accuracies were equal for a trait with an rpc of 0.80. Finally, Tusell et al. (2020) studied the difference in accuracy between strategies that used a reference population of either 3209 PB or 3998 CB pigs. GEBV were estimated with either an additive genomic prediction model or a support vector machine (SVM), and accuracies were obtained with either random cross-validation or validation in the two youngest generations. ...
Article
Breeding programs aiming to improve the performance of crossbreds may benefit from genomic prediction of crossbred (CB) performance for purebred (PB) selection candidates. In this review, we compared genomic prediction strategies that differed in (1) the genomic prediction model used, or (2) the data used in the reference population. We found 27 unique studies, two of which used deterministic simulation, 11 used stochastic simulation, and 14 real data. Differences in accuracy and response to selection between strategies depended on i) the value of the purebred crossbred genetic correlation (rpc), ii) the genetic distance between the parental lines, iii) the size of PB and CB reference populations, and iv) the relatedness of these reference populations to the selection candidates. In studies where a PB reference population was used, the use of a dominance model yielded accuracies that were equal to or higher than those of additive models. When rpc was lower than ~0.8, and was caused mainly by GxE, it was beneficial to create a reference population of PB animals that are tested in a CB environment. In general, the benefit of collecting CB information increased with decreasing rpc. For a given rpc, the benefit of collecting CB information increased with increasing size of the reference populations. Collecting CB information was not beneficial when rpc was higher than ~0.9, especially when the reference populations were small. Collecting only phenotypes of CB animals may slightly improve accuracy and response to selection, but requires that the pedigree is known. It is therefore advisable to genotype these CB animals as well. Finally, considering the breed-origin of alleles allows for modelling breed-specific effects in the CB, but this did not always lead to higher accuracies. Our review shows that the differences in accuracy and response to selection between strategies depend on several factors. 
One of the most important factors is rpc, and we therefore recommend obtaining accurate estimates of rpc for all breeding goal traits. Furthermore, knowledge about the importance of the components of rpc (i.e., dominance, epistasis, and GxE) can help breeders decide which model to use and whether to collect data on animals in a CB environment. Future research should focus on developing a tool that predicts accuracy and response to selection from scenario-specific parameters.
Chapter
Swine production is important to meeting the needs of a growing human population that consumes greater quantities of animal-derived protein with increasing wealth in certain nations. Common management practices implemented throughout the swine production continuum impact the development of the native gastrointestinal microbial consortia. The swine gut microbial population impacts host animal physiology and health through a series of complex interactions catalyzed by early environmental exposure, the colonization of a complex gut microbial ecosystem, immune system development in which responses are orchestrated to promote whole-organism homeostasis, and, ultimately, the efficient production of high-quality pork. We discuss recent changes in our understanding of how direct-fed microbial supplementation benefits swine and pork producers. Further, we highlight how much remains unknown about how the native microbial population affects growth performance and how direct-fed microbial supplements can be used as tools to help swine reach their genetic potential for efficient production. Specifically, our challenge is to identify bacterial strains that elicit defined responses in the host, so that future direct-fed microbial (eubiotic) supplementation of swine diets can be tailored to provide the most benefit to the pig at various production stages or administered as a prescription to address the specific needs of a complex global swine production system.
Article
Stayability (STAY) is an economically significant binary trait. It measures both a cow's reproductive performance and her longevity. Thus, STAY is one of the most important female selection criteria in Nellore beef cattle breeding programs. "Success" for STAY is defined as the ability of a cow to remain in the herd up to 76 months of age and to have at least three calves. Despite its importance, STAY has not been investigated under a machine learning (ML) framework, which might intuitively capture linear and nonlinear relationships (e.g., non-additive effects) between a response variable and predictor variables. In this study, we compared different ML tools using a genome-enabled approach to classify daughters (non-genotyped animals with STAY records) of genotyped sires. In total, 44,626 STAY records from daughters of 559 bulls genotyped with the 777K SNP panel were available for this study. The genotype data were subdivided into three SNP sets based on the top-ranked effects on STAY: 1K-, 3K-, and 5K-SNP panels. The following ML algorithms were evaluated: AdaBoost (ADA), Naïve Bayes (NB), Decision Tree (DT), Deep Neural Network (DNN), k-Nearest Neighbors (NN), Multi-Layer Perceptron Neural Network (MLP), and Support Vector Machine (SVM). The analyses were performed using the free Scikit-learn library for the Python programming language. No relevant improvements in the learning process of the evaluated algorithms were observed when the number of SNPs in the genotype dataset was increased (i.e., 1K-, 3K-, or 5K-SNP panel). In short, NB outperformed the other algorithms considering, for example, the balanced accuracy (0.62 ± 0.01) and sensitivity (0.56 ± 0.02) metrics. In conclusion, the use of the 1K-SNP panel allowed efficient genomic classification, and the NB algorithm outperformed the other methods as indicated by various classification metrics.
To the best of our knowledge, this is the first study using ML and genome-enabled classification of STAY in beef cattle.
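Sensitivity and balanced accuracy, the metrics reported above, are useful when "success" and "failure" classes are imbalanced, because plain accuracy can look good simply by favouring the majority class. The sketch below uses the standard binary definitions; the labels are hypothetical (1 = STAY success). For binary labels, balanced accuracy defined this way matches scikit-learn's `balanced_accuracy_score`.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and balanced accuracy for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # recall of the positive class
    specificity = tn / (tn + fp)  # recall of the negative class
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical daughters: 4 successes, 6 failures
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
m = classification_metrics(y_true, y_pred)
print(m)  # sensitivity 0.5, specificity approx. 0.833, balanced accuracy approx. 0.667
```

Here plain accuracy would be 0.7, while balanced accuracy (about 0.667) reflects that only half of the successes were detected.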