All content in this area was uploaded by Neelam Singh on Dec 13, 2020
JOURNAL OF CRITICAL REVIEWS
ISSN- 2394-5125 VOL 7, ISSUE 12, 2020
4603
CROP PREDICTION METHOD TO MAXIMIZE
CROP YIELD RATE USING MACHINE LEARNING
TECHNIQUE: A CASE STUDY FOR
UTTARAKHAND REGION
Neelam Singh1, Deeksha Pant2, Devesh Pratap Singh1, Bhasker Pant1
1 Department of Computer Science and Engineering, Graphic Era Deemed to be University,
566/6 Bell Road, Clement Town, Dehradun-248002 (India)
2 Department of Computer Application, Graphic Era Deemed to be University,
566/6 Bell Road, Clement Town, Dehradun-248002 (India)
E-mail: neelamjain.jain@gmail.com, deekshapant87@gmail.com, devesh.geu@gmail.com,
pantbhaskar2@gmail.com
Received: 14 March 2020 Revised and Accepted: 8 May 2020
ABSTRACT: Agriculture has always been the backbone of India. About 40% of India's population is involved in
some kind of agricultural activity. Soil is one of the most important factors in agriculture and farming, but
many other factors also come into play when the aim is to increase the net yield of crops. Many techniques
are used for this purpose. In this paper, we present a comparative analysis of various machine learning
algorithms and propose the best fit for our dataset. Since Uttarakhand is a state that is not widely covered in
research papers, we chose our study region from Uttarakhand and took Almora district for our
study. This paper presents a better and more progressive way to identify crops suitable for a particular soil
type.
Keywords- KNN algorithm, K-Star, Naïve Bayes, Random Forest.
I. INTRODUCTION
Farming is of great importance in a country like India because more than 40% of its population depends on it.
It is therefore very important to discover, introduce, and embrace methods that allow the crop yield to reach
its expected numbers and make farmers better informed about the land on which they work. The main goal
today is to achieve the maximum yield from a limited land resource. A crop selector can be applied to
maximize the crop yield even when conditions are unfavorable. An increase in the production rate of crops
plays a vital role in the economy of a nation.
It should be recognized that not only the seed selection system but also crop selection management affects
the yield rate of crops. A lot of research has been done on this issue, and the main goal described is to create a
model that is accurate and efficient for crop classification, soil classification, crop yield prediction, etc.
This paper is divided into sections. The first section is the related literature, which covers previous work
done with these machine learning algorithms. The next section is the methodology, covering the work done to
determine which algorithm is the best fit for our dataset, followed by the results and discussion.
II. RELATED LITERATURE
India is an agricultural country where a majority of the population relies on this occupation, so it is all the
more important to find ways to increase crop yield and to present better methods for selecting crops.
One paper proposed the Crop Selection Method (CSM) to solve the crop selection problem [1]. The method
is used to improve crop selection and thereby increase crop yield.
Another work in the same field is the prediction of rice crop yield by Bayesian networks [2]. Here a single
technique is applied to the dataset, and two methods are compared on the basis of measures such as MAE,
RMSE, RAE, RRSE, F1 score, etc.
Another work presents a comparative study of machine learning algorithms such as J48, Bayes Net, K-star,
and Random tree [3]. In this comparative study, Random tree proved to be the best-fit algorithm for
the dataset.
Many factors are involved in the production of crops, such as area, climatic conditions, type of soil, etc.
Different prediction models use various subsets of these factors for different crops. These prediction
models are studied in depth and their results are then evaluated. These models are broadly classified into two
types [4]:
1. Traditional statistics model- A traditional statistical model [5] builds a single predictive function that
covers the entire sample space, i.e. it generates a global model over the entire sample space.
2. Machine learning technique- This is a technology that emerged for knowledge mining, which is itself
hard to achieve statistically, by relating input and output variables [4]. In the previous model, the
structure of the data model is presumed, whereas in a machine learning model the data model need not be
presumed. This characteristic is useful for modeling complex nonlinear behavior
in crop yield prediction. Among these techniques there are still many advanced techniques that remain
unexplored in crop yield prediction. Many research papers have also presented comparative
studies of these techniques. The main objective of a comparative analysis is to find
the best suited technique for a particular dataset. Emerging advancements in technology
such as IoT, machine learning, etc. have made this task considerably easier [6].
In this research paper, our primary objective is to find the best suited technique among some well-known
techniques for increasing crop yield, which in turn will help to predict the next best fit crop for the field.
This kind of prediction will give the farmer a nearly accurate idea of the type of crop to grow in the
respective field. Different crops need different conditions to yield a better harvest; therefore the
dataset should be chosen with regard to the factors that affect a crop the most. In this research paper,
a training set is prepared and hypothetical test data is used for the prediction. Real data from sensors
embedded in the field can also be used in a real deployment.
KNN Algorithm-
K-NN is instance-based: it stores all earlier data samples and uses them to predict the target value for a new
input sample. It applies a distance function to compute the distance from the new input sample to every training
sample, and then the k nearest (smallest) distances are selected along with their corresponding target values.
The choice of k depends on the sensitivity of the dataset: the smaller the value of k, the higher the variance
and the lower the bias, and vice versa. The advantage of k-Nearest Neighbors is that it does not require training
or optimization. It works on the concept of locality and is used for nonlinear problems, to which it adapts well.
As KNN uses all data samples when predicting a new case, its time and space complexity are both comparatively
high. It is known as a lazy learning technique. [2]
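The procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this study: the Euclidean distance, the majority vote, the function name `knn_predict`, and the toy samples with their labels are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from the query to every stored training sample. No model is
    # built in advance: all the work happens at prediction time.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k smallest distances together with their target values.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two numeric features per sample, made-up crop labels.
train_X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_y = ["wheat", "wheat", "wheat", "ragi", "ragi", "ragi"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))
print(knn_predict(train_X, train_y, (4.9, 5.1), k=3))
```

Because every prediction scans the whole training set, the cost grows with the number of stored samples, which is the time and space overhead noted above.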
In the field of crop yield prediction, many researchers have introduced methods and advancements that could
improve crop yield. Besides the KNN algorithm, many other machine learning algorithms are used and
implemented.
Naïve Bayes Algorithm-
Naïve Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. They are a family
of algorithms sharing a common principle, i.e. every feature used for classification is assumed to be independent
of the others. A classifier, in a machine learning model, is used to differentiate objects based on certain
features. A Naïve Bayes classifier is a probability-based machine learning model used for classification.
Bayes Theorem:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Using Bayes' theorem, we can find the probability of A happening given that B has occurred. Here, B is the
evidence and A is the hypothesis. The assumption made here is that the predictors/features are independent, that
is, the presence of one particular feature does not affect the others. Hence it is called naive. [2]
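As a worked illustration of the theorem and of the independence assumption, the sketch below estimates the prior P(A) and the per-feature likelihoods P(B|A) by counting, multiplies them, and divides by the evidence P(B). The function names and the six toy records are hypothetical, and no smoothing is applied, so it is only a sketch of the idea rather than a production classifier.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts

def posterior(row, class_counts, feature_counts):
    """Return P(class | row) for every class using Bayes' theorem."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(A)
        for i, value in enumerate(row):
            # Likelihood P(B_i | A). Features are treated as independent,
            # so the per-feature likelihoods are simply multiplied.
            score *= feature_counts[label][i][value] / count
        scores[label] = score
    evidence = sum(scores.values())  # P(B)
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical records: (season, field size) with made-up crop labels.
rows = [("Kharif", "small"), ("Kharif", "large"), ("Rabi", "small"),
        ("Rabi", "large"), ("Rabi", "large"), ("Kharif", "large")]
labels = ["ragi", "ragi", "ragi", "wheat", "wheat", "wheat"]

class_counts, feature_counts = train_naive_bayes(rows, labels)
result = posterior(("Rabi", "large"), class_counts, feature_counts)
print(result)
```

For the query ("Rabi", "large") the unnormalised scores are 1/18 for ragi and 1/3 for wheat, so the posterior for wheat is 6/7.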
K-Star
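K* [7] was developed in 2009 by Husain Aljazzar as part of his PhD work and was originally implemented
as part of the DiPro toolset for the generation of counterexamples in probabilistic model checking. In 2017,
Sebastian Haufe implemented a Java workbench for the K* algorithm. It comprises a Java implementation of
K*, based on the one included in DiPro, that is independent of the DiPro environment. Note that this K* is a
heuristic search algorithm for the k shortest paths; the K Star classifier offered among WEKA's lazy learners is a
separate instance-based method (Cleary and Trigg, 1995) that classifies a sample from its most similar training
instances using an entropy-based distance.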
Random Forest
The Random Forest classifier is a set of decision trees, each built from a randomly selected subset of the
training set. The results of the individual decision trees are combined to produce the final result for the test
object [8]. It is also called an ensemble algorithm because it combines multiple models, of the same or different
kinds, to classify objects.
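The two ingredients named above, random subsets of the training set and the combination of the individual results, can be sketched as follows. To keep the example short, each "tree" here is only a one-level decision stump that splits a randomly chosen feature at its mean; this simplification, the function names, and the toy data are assumptions for illustration and do not reproduce a full random forest implementation.

```python
import random
from collections import Counter

def majority_vote(predictions):
    """Combine several class predictions into one final class."""
    return Counter(predictions).most_common(1)[0][0]

def bootstrap_sample(rows, labels, rng):
    """Draw a random subset (with replacement) of the training set."""
    indices = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in indices], [labels[i] for i in indices]

def train_stump(rows, labels, feature):
    """One-level 'tree': split one numeric feature at its mean value."""
    threshold = sum(r[feature] for r in rows) / len(rows)
    left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
    right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
    fallback = majority_vote(labels)
    left_label = majority_vote(left) if left else fallback
    right_label = majority_vote(right) if right else fallback
    return lambda row: left_label if row[feature] <= threshold else right_label

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest

def forest_predict(forest, row):
    """Final result: majority vote over the predictions of all trees."""
    return majority_vote([tree(row) for tree in forest])

# Hypothetical toy data: two numeric features per sample, made-up crop labels.
rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["wheat", "wheat", "wheat", "ragi", "ragi", "ragi"]

forest = train_forest(rows, labels)
print(forest_predict(forest, (1.0, 1.0)))
print(forest_predict(forest, (5.0, 5.0)))
```

An odd number of trees is used so that a two-class vote cannot end in a tie.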
In our research we have tried to find the best fit algorithm for Almora district that could help to better
predict crops in different seasons. There are still various factors, such as variation in climatic conditions (floods,
drought, etc.) that could affect farming, but here we have only tried to find a model that could be productive for this
region.
Proposed Methodology
In a country like India, where a major part of the population lives on agriculture, it is really important to have
techniques that could increase the yield and solve the crop selection problems of farmers.
Area selected for study-
For this research, we have selected Almora district of Uttarakhand as a sample study area. Uttarakhand has
thirteen districts in all, and each district has varied agricultural diversity. This agricultural
diversity contributes to fulfilling the demands of nearby states. Crops like rice (paddy), wheat, potatoes,
tomatoes, vegetables, corn, millets, sugarcane, etc. are grown in significant amounts in the selected area.
To apply the various algorithms we have used the WEKA tool, a software package that makes the
implementation of machine learning algorithms quite easy.
III. DATA SET
The dataset used in this research paper covers Almora district of Uttarakhand state. It is extracted
from a dataset available on data.gov.in. This dataset comprises features like Year, Area, Production,
Seasons, and Crop, with more than three hundred values across different years. The Almora region was chosen
because no research work had been done in the area, and since Uttarakhand produces a diversity
of crops and vegetables, it was necessary to explore this region and find the best fit model to increase the
production of crops.
IV. METHODOLOGY
To predict the crop for our dataset we have used a tool named WEKA. We compared various
machine learning algorithms with each other and identified the most suitable one for our dataset.
A brief description of the WEKA tool is given below-
The working-
1. The working of the module starts from preprocessing of the data. Figure 1 shows the result of
preprocessing on the available dataset.
Figure 1 Data Preprocessing
2. The next step is classification of the dataset. Here we select the “Choose” option and then select
“lazy”. Within “lazy”, the option named “IBk” is chosen, which is our KNN classifier.
Selecting “IBk”
3. After clicking on “START” we get the resultant accuracy matrix for the preprocessed and classified
dataset. The accuracy matrix is listed in Table 1.
Table 1 Accuracy Matrix
After checking the accuracy matrix we applied the KNN algorithm on the dataset for K=3. The accuracy of the
algorithm is checked using evaluation metrics such as mean absolute error, root mean squared error, relative
absolute error, and root relative squared error. The results of the selected error measures for KNN are shown
in Table 2.
Table 2 Evaluation metrics for KNN
Evaluation metric          Value predicted
Mean absolute error        0.0058
Root mean squared error    0.0307
Relative absolute error    9.9594
The results in Table 2 show the accuracy of KNN, in terms of its low error values, for K=3.
4. Along with the best accuracy and the least error, we have predicted the crop with respect to various
attributes in the dataset, as shown in Table 3.
Table 3 Crop prediction based on season
Season        Crop
Kharif        Ragi, Horse gram, Potato, Sesamum
Rabi          Wheat, Millets, Soyabean
Whole Year    Onion, Millets, Potato
V. RESULTS
The results of the work described above are now compared with algorithms like Random Forest, K Star, and
Naïve Bayes on the selected dataset with the help of the following evaluation metrics.
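Mean Absolute Error (MAE)- It is the average of the absolute differences between the prediction and the actual
observation, where all individual differences are equally weighted.
It is given by the following formula, where A_i is the actual value, Â_i is the predicted value, and n is the
number of observations-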
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| A_i - \hat{A}_i \right|$$
Root Mean Square Error (RMSE)- It measures the error between two sets of values by comparing each predicted
value with the known value. The smaller the value of RMSE, the smaller the error between the predicted and
known values.
It is given by the formula-
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( A_i - \hat{A}_i \right)^2}$$
Relative Absolute Error (RAE)- It is the absolute value of the difference between the approximate and exact
values divided by the exact value. Unlike the signed relative error, it takes the absolute value, so it is never negative.
It is given by the following formula-
$$\mathrm{RAE} = \left| \frac{x_A - x_E}{x_E} \right|$$
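The three metrics defined above translate directly into code. The sketch below follows the formulas as written in this section; the function names and the sample numbers are hypothetical and serve only to show the arithmetic.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def relative_absolute_error(approximate, exact):
    """Absolute difference between approximate and exact value, relative to the exact value."""
    return abs((approximate - exact) / exact)

# Made-up numbers for illustration only.
actual = [2.0, 4.0, 6.0]
predicted = [2.5, 3.5, 7.0]

print(mean_absolute_error(actual, predicted))      # (0.5 + 0.5 + 1.0) / 3
print(root_mean_squared_error(actual, predicted))  # sqrt((0.25 + 0.25 + 1.0) / 3)
print(relative_absolute_error(7.0, 6.0))           # |7 - 6| / 6
```

Because RMSE squares each difference before averaging, the single larger error of 1.0 weighs more heavily in it than in MAE.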
By applying all the evaluation metrics to some of the frequently used machine learning algorithms on our
dataset, we computed the results given in Table 4.
Table 4 Comparison of machine learning algorithms based on evaluation metrics
Algorithm name    Mean absolute error    Root mean squared error    Relative absolute error
Random Forest     0.0083                 0.0419                     14.2331
K Star            0.008                  0.0506                     13.816
KNN               0.0058                 0.0307                     9.9594
Naïve Bayes       0.0188                 0.1045                     32.223
According to the computed results, KNN gives more accurate results for our selected dataset, with the lowest
error on every metric, compared to algorithms like Random Forest, K Star, and Naïve Bayes.
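The comparison can be checked mechanically by taking the values reported in Table 4 and selecting, for each metric, the algorithm with the smallest error. The dictionary layout and the helper name `best_by_metric` are choices made for this sketch.

```python
# Error values as reported in Table 4.
results = {
    "Random Forest": {"MAE": 0.0083, "RMSE": 0.0419, "RAE": 14.2331},
    "K Star": {"MAE": 0.008, "RMSE": 0.0506, "RAE": 13.816},
    "KNN": {"MAE": 0.0058, "RMSE": 0.0307, "RAE": 9.9594},
    "Naïve Bayes": {"MAE": 0.0188, "RMSE": 0.1045, "RAE": 32.223},
}

def best_by_metric(results):
    """For each metric, return the algorithm with the lowest error."""
    metrics = next(iter(results.values())).keys()
    return {m: min(results, key=lambda name: results[name][m]) for m in metrics}

print(best_by_metric(results))
# {'MAE': 'KNN', 'RMSE': 'KNN', 'RAE': 'KNN'}
```

Note that the ranking of the remaining algorithms is not the same on every metric: K Star has a lower mean absolute error than Random Forest but a higher root mean squared error.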
VI. CONCLUSION
We can now conclude from the work discussed above that the KNN algorithm proved more accurate than
the rest of the mentioned algorithms. This result depends on the dataset: with a different dataset
the accuracy may change. This work holds for Almora district and may differ for a different dataset. With this
prediction, the farmers there can get an idea of which crops can be grown in which period. As
mentioned earlier, the production can still be low or high depending on the climatic conditions. As Uttarakhand is
one of the states that produces a variety of crops, vegetables, etc., it was really important to take this area as a
research area and find the best fit model in order to increase the productivity in the region.
REFERENCES
[1] R. Kumar, M. P. Singh, P. Kumar and J. P. Singh, "Crop Selection Method to maximize crop yield rate
using machine learning technique," 2015 International Conference on Smart Technologies and
Management for Computing, Communication, Controls, Energy and Materials (ICSTM), Chennai, 2015,
pp. 138-145, doi: 10.1109/ICSTM.2015.7225403.
[2] Analysing Soil Data using Data Mining Classification Techniques, Indian Journal of Science and
Technology, Vol 9(19), DOI: 10.17485/ijst/2016/v9i19/93873, May 2016.
[3] Experimental Analysis of Machine Learning Algorithms Based on Agricultural Dataset for Improving
Crop Yield Prediction, International Journal of Engineering and Advanced Technology (IJEAT) ISSN:
2249 – 8958, Volume-9 Issue-1, October 2019.
[4] Medar, Ramesh A., Vijay S. Rajpurohit, and Anand M. Ambekar. "Sugarcane Crop Yield Forecasting
Model Using Supervised Machine Learning." International Journal of Intelligent Systems and
Applications 11.8 (2019): 11.
[5] AlZu’bi, Shadi, et al. "An efficient employment of internet of multimedia things in smart and future
agriculture." Multimedia Tools and Applications 78.20 (2019): 29581-29605.
[6] Chlingaryan, Anna, Salah Sukkarieh, and Brett Whelan. "Machine learning approaches for crop yield
prediction and nitrogen status estimation in precision agriculture: A review." Computers and electronics
in agriculture 151 (2018): 61-69.
[7] Gandhi, Niketa, Leisa J. Armstrong, and Owaiz Petkar. "Predicting Rice crop yield using Bayesian
networks." 2016 International Conference on Advances in Computing, Communications and Informatics
(ICACCI). IEEE, 2016.
[8] Computers and Electronics in Agriculture, Australian Centre for Field Robotics, Dept. of Aerospace,
Mechanical & Mechatronic Engineering, The University of Sydney, NSW 2006, Australia Centre for
Carbon, Water and Food, School of Life and Environmental Sciences, The University of Sydney, NSW
2006, Australia.
[9] Aljazzar, Husain, and Stefan Leue. "K⁎: A heuristic search algorithm for finding the k shortest
paths." Artificial Intelligence 175.18 (2011): 2129-2154.
[10] Lata, Kusum, and Sajidullah Khan. "Seasonal Environmental Data-Sets Simulations for Optimizing the
Crop Yield." Available at SSRN 3648998 (2020).
[11] Zhang, Lingxiao, et al. "Simulation and prediction of soybean growth and development under field
conditions." Am-Euras J Agr Environ Sci 7.4 (2010): 374-385.