CSEIT1835254 | Received : 15 June 2018 | Accepted : 26 June 2018 | May-June-2018 [ (3)5 : 1093-1097 ]
International Journal of Scientific Research in Computer Science, Engineering and Information Technology
© 2018 IJSRCSEIT | Volume 3 | Issue 5 | ISSN : 2456-3307
Groundnut Crop Yield Prediction Using Machine Learning Techniques
Vinita Shah*1, Prachi Shah2
*1 Assistant Professor, Information Technology, G H Patel College of Engineering and Technology, V V Nagar, Gujarat, India
2 Assistant Professor, Department of Information Technology, BVM Engineering College, V V Nagar, Gujarat, India
ABSTRACT
Yield prediction is a very important agricultural problem. Any farmer is interested in knowing how much yield to expect. In the past, yield prediction relied on the farmer's experience with a particular field and crop. Based on previous data, crop yield can be predicted using machine learning techniques. Crop yield prediction is an important area of research, which helps in ensuring food security around the world. We analyzed the results of Multiple Linear Regression, Regression Tree, K-nearest Neighbor and Artificial Neural Network on groundnut data from the previous 8 years. Prediction was based on soil, environmental and abiotic attributes. The KNN algorithm gives better results than the other algorithms for groundnut crop yield prediction.
Keywords : Crop analysis; Yield prediction; K-means; K-NN; Multiple Linear regression
I. INTRODUCTION
Data mining is the process of extracting useful knowledge or information from large amounts of data. In the digital age, data mining is becoming an increasingly important tool for transforming data into information. When data mining techniques are applied to agricultural data, the practice is known as precision agriculture. The main aim of this work is to improve and substantiate the validity of yield prediction, which is useful for farmers.
Agricultural crop production depends on various factors such as biology, climate, economy and geography, and each factor affects agriculture differently. Researchers in previous years therefore used appropriate statistical methodologies. A large number of variables can affect agronomic traits such as yield.
Yield prediction is a very important agricultural problem. Any farmer is interested in knowing how much yield to expect. In the past, yield prediction relied on the farmer's experience with a particular field and crop. Suppose that data are available for some period in the past for which the corresponding yields have been recorded. In any data mining procedure, the training data are collected from this past period and used to learn how to predict future yields. Crop yield prediction is an important area of research, which helps in ensuring food security around the world.
In our research, we have considered the effects of
environmental (weather), biotic (pH, soil salinity)
and area of production as factors towards crop
production in Gujarat. Considering these factors as
Volume 3, Issue 5, May-June-2018 | http://ijsrcseit.com
Vinita Shah et al. Int J S Res CSE & IT. 2018 May-June; 3(5) : 1093-1097
datasets for various districts, we propose a new approach that divides the yearly dataset on a seasonal basis. We then applied clustering techniques and suitable classification techniques to obtain crop yield predictions.
The rest of the paper is organized as follows: Section II explains the motivation for the work. Section III presents the real dataset used for our work and its preprocessing. Section IV presents the methodology of the work. Section V gives the analysis of the results of the proposed approach. Section VI concludes and outlines future work.
II. MOTIVATION
Gujarat is characterized by hot semi-arid conditions, with 49% of its total geographical area under cultivation (9.6 million ha). The irrigated area is only 32% of the total cultivated area; the rest faces severe water shortage. The groundwater is of poor quality. The alluvial soils are alkaline/saline and deficient in nitrogen, phosphorus and zinc. The cropping intensity is low at 118%. The total food grain production of the state is 5.26 million tons. The important crops are groundnut, cotton, pearl millet, maize, sorghum, castor, gram and mustard.
A farmer must have a good understanding of the soil type, the biotic factors governing the soil and a thorough knowledge of traditional agricultural practices to obtain maximum crop yield. Such practices may include harrowing and ploughing, and the use of inputs such as fertilizers, insecticides and herbicides [7].
Fig. 1 shows that although Gujarat ranks first in area and production, Tamil Nadu ranks first in the country in productivity. Moreover, groundnut production in Gujarat depends on the September rains: if they fail, production and productivity fall. It is therefore important to find the relation between climate variables, rainfall and yield data.
Fig. 1. State-wise area, production and productivity of groundnut [14]
III. PREPROCESSING
A. Dataset
In this research, two types of datasets are used: weather data and yield data. The weather dataset was collected from the Department of Meteorological Centre, Anand Agriculture University (AAU), Anand, and the yield data were taken from 4 different yearbooks of the Directorate of Agriculture, Gujarat State, Gandhinagar, for four districts of Gujarat: Jamnagar, Junagadh, Rajkot and Amreli. A lot of pre-processing was required to handle missing values, noise and outliers.
We considered 20 different attributes for this research: rainfall, maximum and minimum temperature, vapour pressure, relative humidity, basic sunshine, evaporation pressure, soil temperature at different depths, wind speed and irrigated area for all districts, and cultivated area for crop yield considered according to the districts.
After the necessary formatting and pre-processing of the datasets, the final version of our data covers a total of 4 districts for the period 2006 to 2013.
B. Input Variables and Measures
From the large initial dataset, we selected a limited number (20) of important input variables that contribute most to agricultural production. All inputs were considered for the eight-year period from 2006 to 2013.
Two measures are used to evaluate the prediction models:
1. R Square [coefficient of determination] is simply the square of the sample correlation coefficient (i.e., r) between the outcomes and their predicted values. The coefficient of determination ranges from 0 to 1.
2. RMSE [Root Mean Square Error] is a frequently used measure of the difference between the values predicted by a model and the values actually observed from the environment that is being modeled.
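The two measures above can be sketched in a few lines of Python. This is a minimal illustration that follows the definitions given in the text, not the tool used in this work; the function names and the sample yield values are assumptions made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series; report 0
    return cov / math.sqrt(var_x * var_y)

def r_square(observed, predicted):
    """R Square as defined in the text: squared correlation of outcomes and predictions."""
    return pearson_r(observed, predicted) ** 2

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical yields in kg/ha, invented purely for illustration.
observed = [1000.0, 1200.0, 900.0, 1500.0]
predicted = [1100.0, 1150.0, 950.0, 1400.0]
print("R Square:", r_square(observed, predicted))
print("RMSE:", rmse(observed, predicted))
```

A higher R Square and a lower RMSE both indicate predictions that track the observed yields more closely.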
IV. METHODOLOGY
A. Proposed Flow
Existing papers use a small number of climate attributes. We instead use a larger set of attributes, including soil attributes that had not been considered in previous work. Also, instead of a yearly approach, we adopt a seasonal approach to get more accurate results.
For the seasonal approach, we divided the total dataset into two seasons, (1) Kharif and (2) Summer, as groundnut production is higher in these two seasons. For that purpose, we created a GUI in Java that divides the monthly datasets into season-wise data and combines them with the associated yield dataset of that particular season for all 4 districts. For the selected season, it generates the dataset from the averages of the climatic attributes, fetches the corresponding yield from the yield dataset, and generates a result for each year. In short, it combines the 2 datasets, divides them into the Kharif and Summer seasons, and displays the output for each district in the GUI.
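The season-wise aggregation described above can be sketched as follows. The original tool is a Java GUI; this Python sketch only illustrates the data flow. The month-to-season mapping, the record layout and the function names are all assumptions, since the paper does not list which months belong to each season.

```python
from collections import defaultdict

# ASSUMPTION: illustrative month-to-season mapping; the paper does not give the months.
SEASON_BY_MONTH = {6: "Kharif", 7: "Kharif", 8: "Kharif", 9: "Kharif", 10: "Kharif",
                   2: "Summer", 3: "Summer", 4: "Summer", 5: "Summer"}
KEY_FIELDS = ("district", "year", "month")

def seasonal_averages(monthly_rows):
    """Average every climate attribute per (district, year, season).

    monthly_rows: iterable of dicts with 'district', 'year', 'month'
    and any number of numeric attribute keys (hypothetical layout).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in monthly_rows:
        season = SEASON_BY_MONTH.get(row["month"])
        if season is None:  # month belongs to neither season: skip it
            continue
        key = (row["district"], row["year"], season)
        counts[key] += 1
        for name, value in row.items():
            if name not in KEY_FIELDS:
                sums[key][name] += value
    return {key: {name: total / counts[key] for name, total in attrs.items()}
            for key, attrs in sums.items()}

def attach_yield(season_table, yield_table):
    """Combine seasonal climate averages with the yield recorded for the same key."""
    merged = []
    for (district, year, season), attrs in season_table.items():
        key = (district, year, season)
        if key in yield_table:
            merged.append({"district": district, "year": year, "season": season,
                           **attrs, "yield": yield_table[key]})
    return merged
```

Each merged record then holds one district, year and season together with its averaged attributes and yield, which is the shape of row the prediction models consume.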
B. Individual effect of Attributes on yield
To find the effect of each individual attribute on yield, correlation analysis is performed on the data, and the values of r and R Square are generated. From these we can see which attributes affect yield the most.
The R Square values show that attributes with values above 0.5 are more strongly related to yield and are important for plant growth, so we have considered them.
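A minimal sketch of this screening step is shown below, assuming each attribute is available as a column of seasonal values aligned with the yield column. The function names and the sample numbers are hypothetical; only the 0.5 threshold on R Square comes from the text.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r between an attribute and the yield."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about yield
    return cov / math.sqrt(var_x * var_y)

def screen_attributes(columns, yields, threshold=0.5):
    """Keep attributes whose R Square with yield exceeds the threshold.

    Returns {attribute name: (r, R Square)} for the retained attributes.
    """
    selected = {}
    for name, values in columns.items():
        r = pearson_r(values, yields)
        if r * r > threshold:
            selected[name] = (r, r * r)
    return selected

# Hypothetical data, invented for illustration only.
columns = {"rainfall": [1.0, 2.0, 3.0, 4.0], "noise": [1.0, -1.0, 1.0, -1.0]}
yields = [10.0, 20.0, 30.0, 40.0]
print(screen_attributes(columns, yields))
```

In this toy example only `rainfall` is retained, because the R Square of `noise` with yield is 0.2.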
C. Combined effect of Attributes on yield
We considered 4 different alternatives (attribute groups) on which the accuracy measures of the different models are evaluated:
A. Environmental Attributes: Rainfall, Maximum Temperature, Minimum Temperature, Basic Sunshine, Relative Humidity (2 times), Vapor Pressure (2 times), Evaporation Pressure
B. Soil Attributes: Temperature at different depths of soil (6)
C. Abiotic Attributes: Water Content (2), Density (2), Wind Speed
D. Area Central Attributes: Total area from which the yield is produced and total production
The following classification/regression models were
used to obtain the crop yield prediction results:
a) Linear Regression: This is a statistical technique used to determine the strength of the relationship between one dependent variable and a series of other changing variables known as independent variables (regular attributes). If there are multiple input attributes, as in our research (rainfall, sunshine hours, humidity, pH, etc.), it is termed multiple linear regression. Linear regression models the relationship between a scalar variable and one or more explanatory variables by fitting a linear equation to the observed data [7].
b) k-NN: The k-nearest neighbor algorithm compares a given test example with the training examples that are most similar to it. Each example denotes a point in an n-dimensional space, so all of the training examples are stored in an n-dimensional pattern space. K is a positive integer, usually small. For our purpose, the basic k-NN algorithm was applied. It first finds the k examples from the training set that are closest to the unknown example, and then takes the most commonly occurring class among those k examples [7]. When the target is continuous, as with yield, the values of the k neighbors are typically averaged instead.
c) Neural Network: An artificial neural network (ANN) is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks, for instance those in our brains. In most cases an ANN is an adaptive system that modifies its structure based on external or internal information that flows through the network during the learning phase. The basic neural network model consists of three layers: the input layer, the hidden layer and the output layer [7].
d) Regression Tree: Regression trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (dependent) variable based on the values of several input (independent) variables. In regression trees the target variable is continuous and the tree is used to predict its value.
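The four model families above can be sketched compactly in plain Python. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the function names are hypothetical, k-NN uses Euclidean distance and averages neighbor values (the regression variant rather than the majority vote described in b), the network uses a tanh hidden layer and shows only the forward pass with given weights, and the regression tree is reduced to a single best split on one attribute.

```python
import math

def fit_linear_regression(rows, targets):
    """Multiple linear regression: solve the normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept term
    p = len(design[0])
    system = [[sum(r[i] * r[j] for r in design) for j in range(p)]
              + [sum(r[i] * y for r, y in zip(design, targets))]
              for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    # Raises ZeroDivisionError if the attributes are exactly collinear.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(system[idx][col]))
        system[col], system[pivot] = system[pivot], system[col]
        pivot_value = system[col][col]
        system[col] = [v / pivot_value for v in system[col]]
        for k in range(p):
            if k != col:
                factor = system[k][col]
                system[k] = [a - factor * b for a, b in zip(system[k], system[col])]
    return [system[i][p] for i in range(p)]  # [intercept, b1, ..., bn]

def predict_linear(coefficients, row):
    """Evaluate the fitted linear equation for one attribute row."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))

def knn_predict(train_rows, train_targets, query, k=3):
    """k-NN regression: mean target of the k training rows nearest to the query."""
    nearest = sorted((math.dist(row, query), y)
                     for row, y in zip(train_rows, train_targets))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def ann_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: input layer -> tanh hidden layer -> single linear output."""
    hidden = [math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
              for weights, bias in zip(hidden_weights, hidden_biases)]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

def best_split(values, targets):
    """Regression-tree building block: threshold on one attribute minimizing squared error.

    Returns (error, threshold, left_mean, right_mean), or None if no split exists.
    """
    def sse(ys):
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    best = None
    candidates = sorted(set(values))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, targets) if v <= threshold]
        right = [y for v, y in zip(values, targets) if v > threshold]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, threshold, sum(left) / len(left), sum(right) / len(right))
    return best
```

Because k-NN depends on distances, attributes measured on very different scales (for example rainfall in mm and temperature in degrees) are usually rescaled before the neighbors are computed. A full regression tree would apply `best_split` recursively to each side of the chosen threshold.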
V. RESULT ANALYSIS
In our research, we determined prediction results for the yield of the groundnut crop for the selected districts in Gujarat. The prediction results were obtained from the selected input attributes using appropriate classification and regression models.
The tool we used provides predictive algorithms and includes methods to explore, transform and clean data, partition data into training and validation sets, train one or more machine learning models, and evaluate and compare the performance of those models.
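The partitioning step can be sketched as follows. The tool used in this work is not named, so this is only an assumed equivalent: the function name, the 30% validation fraction and the fixed seed are illustrative choices, not values from the paper.

```python
import random

def partition(rows, validation_fraction=0.3, seed=0):
    """Shuffle the rows reproducibly and split them into training and validation sets."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded generator gives a repeatable split
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical record identifiers standing in for season-wise data rows.
training, validation = partition(range(10), validation_fraction=0.3, seed=1)
print(len(training), len(validation))
```

Each model is fitted on the training rows only, and its RMSE on the validation rows indicates how well it generalizes to unseen seasons.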
For prediction, four different algorithms are used:
1. Multiple Linear Regression
2. Regression Tree
3. K-nearest Neighbor
4. Artificial Neural Network
Fig. 2. Comparison of prediction algorithms
The RMSE values of the four methods are compared for all four alternatives, as shown in Fig. 2.
Fig. 3. Comparative analysis of the four prediction models with the four approaches
VI. CONCLUSION AND FUTURE WORK
Weather data plays a major role in crop yield. For groundnut crop yield prediction, the K-nearest neighbor model gives better results in training as well as in validation than the other three techniques, namely the Artificial Neural Network and the two regression techniques, multiple linear regression (MLR) and regression tree. In addition, using a larger dataset covering more previous years may affect the accuracy of the algorithms and could give better accuracy, that is, a lower RMSE value, in future work.
VII. REFERENCES
[1] Mucherino, Petraq Papajorgji, P. M. Pardalos, "A survey of data mining techniques applied to agriculture", Springer, 2009.
[2] S. S. Bhaskar, L. Arokiam, V. Arul Kumar, L. Jeyassimaan, "A Brief Survey of Data Mining Techniques to Agriculture Applications", Medwell Journals, 2010.
[3] Ramesh A. Medar, Vijay S. Rajpurohit, "A survey on Data Mining Techniques for Crop Yield Prediction", IJARCSMS, 2014.
[4] Hui Chen, Wei Wu, Hong-Bin Liu, "Assessing the relative importance of climate variables to rice yield variation using support vector machines", Springer, 2015.
[5] D. Ramesh, B. Vishnu Vardhan, "Analysis of Crop Yield Prediction Using Data Mining Techniques", IJRET, 2015.
[6] D. Ramesh, B. Vishnu Vardhan, "Crop Yield Prediction Using Weight Based Clustering Technique", IJCEA, 2015.
[7] A. T. M. Shakil Ahamed, Navid Tanzeem Mahmood, Nazmul Hossain, Mohammad Tanzir Kabir, Kallal Das, Faridur Rahman, Rashedur M. Rahman, "Applying Data Mining Techniques to Predict Annual Yield of Major Crops and Recommend Planting Different Crops in Different Districts in Bangladesh", IEEE, 2015.
[8] D. Ramesh, B. Vishnu Vardhan, O. Subhash Chander Goud, "Density Based Clustering Technique on Crop Yield Prediction", IJEEE, 2014.
[9] Mohammad Motiur Rahman, Naheena Haq, Rashedur M. Rahman, "Application of Data Mining Tools for Rice Yield Prediction on Clustered Regions of Bangladesh", IEEE, 2014.
[10] D. Ramesh, B. Vishnu Vardhan, "Data Mining Techniques and Applications to Agricultural Yield Data", IJARCCE, 2013.
[11] José R. Romero, Pablo F. Roncallo, Pavan C. Akkiraju, Ignacio Ponzoni, Viviana C. Echenique, Jessica A. Carballido, "Using classification algorithms for predicting durum wheat yield in the province of Buenos Aires", Elsevier, 2013.
[12] Yunous Vagh, Jitian Xiao, "A data mining perspective of dual effect of Rainfall and Temperature on Wheat Yield", ECU, 2012.
[13] F. P. Parekh, T. M. V. Suryanarayana, "Impact of Climatological Parameters on Yield of Wheat Using Neural Network Fitting", International Journal of Modern Engineering Research, 2012.
[14] Nong Ye, "Data Mining: Theories, Algorithms, and Examples", CRC Press, 2013.
[15] Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, Morgan Kaufmann Publishers.
[16] https://dag.gujarat.gov.in/images/directorofagriculture/pdf/Groundnut-Book.pdf