All content in this area was uploaded by Shantanu Das on Aug 15, 2018
Biomolecule Reports- An International eNewsletter BR/09/17/03
Principal Component Analysis in Plant Breeding
Shantanu Das1,*, Soumitra Sankar Das2, Indrani Chakraborty3, Nabarun Roy4,
Mallar Kanti Nath5 and Debojit Sarma6
1, 6Department of Plant Breeding and Genetics, Assam Agricultural University, Jorhat
2Department of Agricultural Statistics, Uttar Banga Krishi Viswavidyalaya, Cooch Behar
3Department of Plant Breeding and Genetics, Chaudhary Charan Singh Haryana Agricultural
University, Hisar
4,5Department of Agricultural Biotechnology, Assam Agricultural University, Jorhat
*E-mail: shanubrdr.oryza@gmail.com
Hybridization followed by selection is one of the important approaches to plant
breeding. Selecting appropriate parents for crossing nurseries is essential to
enhance genetic recombination for potential yield increase (Islam, 2004). The study of
many morphological characters in germplasm is therefore important for assessing the
differences among populations as well as their breeding potential. Plant breeders often
measure a large number of variables, some of which may lack sufficient
discriminatory power for germplasm evaluation, characterization and management (Maji and
Shaibu, 2012). In such cases, principal component analysis (PCA) may be used to reveal
patterns and eliminate redundancy in data sets (Adams, 1995; Amy and Pritts, 1991), as
morphological and physiological variations routinely occur in crop species. Hotelling (1933)
indicated that PCA is an exploratory tool to identify unknown trends in a multidimensional
data set.
PCA, or canonical root analysis, is a multivariate statistical technique that
attempts to simplify and analyze the inter-relationships among a large set of variables in
terms of a relatively small set of variables or components, while losing as little as possible
of the information in the original data set. PCA reduces a relatively large set of data to a
smaller number of components by looking for groups of strongly inter-correlated variables,
and each component explains a percentage of the total variability. The first
principal component accounts for the largest share of the total variation in the population,
and each subsequent component accounts for progressively less. The criteria used by
Clifford and Stephenson (1975), and corroborated by Guei et al. (2005), suggest that the first
three principal components are often the most important in reflecting the variation patterns
among accessions, and the
Biomolecule Reports ISSN:2456-8759
Popular Article
Das et al., BR/09/17/03
characters associated with these components are more useful in differentiating the
accessions. PCA thus helps breeders focus genetic improvement on the traits that contribute
most to the variability, rather than on every character under study.
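The idea that each component explains a percentage of the total variability can be illustrated with a minimal Python sketch. It assumes a correlation PCA computed by eigendecomposition with NumPy; the function name `correlation_pca`, the simulated trait matrix and the trait labels in the comments are hypothetical and are not taken from any of the cited studies.

```python
import numpy as np

def correlation_pca(X):
    """PCA on the correlation matrix of X (rows = accessions, columns = traits)."""
    X = np.asarray(X, dtype=float)
    # Standardize each trait to mean 0 and unit variance (sample standard deviation).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the traits.
    R = np.corrcoef(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Percentage of the total variability explained by each component.
    percent = 100.0 * eigvals / eigvals.sum()
    # Component scores of each accession.
    scores = Z @ eigvecs
    return eigvals, eigvecs, percent, scores

# Hypothetical data: 30 accessions, 5 traits, three of which share a common factor.
rng = np.random.default_rng(0)
n = 30
common = rng.normal(size=n)
traits = np.column_stack([
    common + 0.3 * rng.normal(size=n),  # e.g. plant height (simulated)
    common + 0.3 * rng.normal(size=n),  # e.g. panicle length (simulated)
    common + 0.3 * rng.normal(size=n),  # e.g. grain yield (simulated)
    rng.normal(size=n),                 # e.g. days to flowering (simulated)
    rng.normal(size=n),                 # e.g. 1000-grain weight (simulated)
])

eigvals, loadings, percent, scores = correlation_pca(traits)
for i, (ev, pc) in enumerate(zip(eigvals, percent), start=1):
    print(f"PC{i}: eigenvalue = {ev:.2f}, variation explained = {pc:.1f}%")
```

In this sketch the columns of `loadings` show which traits are associated with each component. Traits with large absolute loadings on the first few components are the ones that contribute most to differentiating the accessions.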
As a data reduction technique, PCA aims to (1) extract the most important
information from the data table, (2) compress the size of the data set by keeping only the
important information, (3) simplify the description of the data set, and (4) analyze the
structure of the observations and the variables. Often, only the important information needs to
be extracted from a data matrix. In this case, the problem is to decide how many
components need to be considered. Several guidelines address this problem.
A first procedure is to plot the eigenvalues in order of size, to look for a point
in this graph (‘elbow’) at which the slope goes from ‘steep’ to ‘flat’, and to
keep only the components that come before the elbow. This procedure is called the scree or
elbow test (Cattell, 1966; Jolliffe, 2002). Another common practice is to keep only the
components whose eigenvalue is larger than the average. For a correlation PCA, the average
eigenvalue is 1, so this rule becomes the standard advice to ‘keep only the components whose
eigenvalues are larger than 1’ (Kaiser, 1961). However, this procedure
can lead to ignoring important information (O’Toole et al., 1993). Another approach
is to retain enough principal components to explain a chosen share of the total variance
(e.g. >80%) (Johnson and Wichern, 1992). In these ways, the most important components can
be extracted for interpreting the results.
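Two of the retention guidelines described above can be sketched in a few lines of Python. This is an illustration only: the function name `components_to_keep`, the 0.80 threshold default and the example eigenvalues are assumptions made for the sketch, not values from the cited studies. The scree test is a visual judgement, so it is left to inspection of the sorted eigenvalues rather than automated here.

```python
import numpy as np

def components_to_keep(eigvals, variance_target=0.80):
    """Return how many components each of two common rules would retain."""
    # Sort eigenvalues from largest to smallest.
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    # Rule 1: keep components whose eigenvalue exceeds the average eigenvalue
    # (for a correlation PCA the average is 1, which gives Kaiser's rule).
    n_average = int(np.sum(eigvals > eigvals.mean()))
    # Rule 2: keep the smallest number of components whose cumulative share
    # of the total variance exceeds the target (e.g. more than 80%).
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n_variance = int(np.searchsorted(cumulative, variance_target, side="right")) + 1
    n_variance = min(n_variance, len(eigvals))
    return n_average, n_variance, cumulative

# Hypothetical eigenvalues from a correlation PCA on five traits (they sum to 5).
example_eigvals = [2.6, 1.2, 0.7, 0.4, 0.1]
n_average, n_variance, cumulative = components_to_keep(example_eigvals)
print("Eigenvalue > average rule keeps:", n_average)
print("Cumulative variance rule keeps:", n_variance)
print("Cumulative share of variance:", np.round(cumulative, 2))
```

The two rules need not agree. With these assumed eigenvalues the eigenvalue rule retains fewer components than the cumulative variance rule, which reflects the caution noted above that the eigenvalue rule can discard useful information.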
Since PCA extracts the important components and highlights their
contributions to the total variability, it can serve as an important tool for speeding up
a breeding programme.
References:
Adams, M.W. (1995). An estimate of homogeneity in crop plants with special reference to
genetic vulnerability in dry season. Phaseolus vulgaris. Euphytica 26: 665-679.
Amy, E.L. and Pritts, M.P. (1991). Application of principal component analysis to
horticultural research. Hort Sci. 26(4): 334-338.
Cattell, R.B. (1966). The scree test for the number of factors. Multivariate Behav. Res. 1:
245-276.
Clifford, H.T. and Stephenson, W. (1975). An Introduction to Numerical Classification.
Academic Press, London. p. 229.
Guei, R.G., Sanni, K.A. and Fawole, A.F.J. (2005). Genetic diversity of rice (O. sativa L.).
Agron. Afr. 5: 17-28.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components.
J. Educ. Psychol. 24: 417-441.
Islam, M.R. (2004). Genetic diversity in irrigated rice. Pakistan J. Biol. Sci. 2: 226-229.
Johnson, R.A. and Wichern, D.W. (1992). Applied Multivariate Statistical Analysis.
Prentice-Hall, Inc.
Jolliffe, I.T. (2002). Principal Component Analysis. New York: Springer.
Kaiser, H.F. (1961). A note on Guttman’s lower bound for the number of common factors.
Br. J. Math. Stat. Psychol. 14: 1-2.
Maji, A.T. and Shaibu, A.A. (2012). Application of principal component analysis for rice
germplasm characterization and evaluation. J. Plant Breed. Crop Sci. 4(6): 87-93.
O’Toole, A.J., Abdi, H., Deffenbacher, K.A. and Valentin, D. (1993). A low dimensional
representation of faces in the higher dimensions of the space. J. Opt. Soc. Am. A 10:
405-411.