Groundnut Crop Yield Prediction Using Machine Learning Techniques

July 2018

Authors:

Vinita Shah

G H Patel College of Engineering and Technology (GCET)

Prachi Shah

Birla Vishvakarma Mahavidyalaya Engineering College

CSEIT1835254 | Received : 15 June 2018 | Accepted : 26 June 2018 | May-June-2018 [ (3)5 : 1093-1097 ]

International Journal of Scientific Research in Computer Science, Engineering and Information Technology

Vinita Shah*1, Prachi Shah2

*1 Assistant Professor, Information Technology, G H Patel College of Engineering and Technology, V V Nagar, Gujarat, India

2 Assistant Professor, Department of Information Technology, BVM Engineering College, V V Nagar, Gujarat, India

ABSTRACT

Yield prediction is an important agricultural problem: every farmer wants to know how much yield to expect. In the past, yield prediction relied on the farmer's experience with a particular field and crop. With historical data, crop yield can instead be predicted using machine learning techniques. Crop yield prediction is an important area of research that helps ensure food security around the world. We analyzed the results of multiple linear regression, regression tree, k-nearest neighbor (KNN) and artificial neural network models on groundnut data from the previous 8 years. Prediction was based on soil, environmental and abiotic attributes. The KNN algorithm gave better results than the other algorithms for groundnut crop yield prediction.

Keywords : Crop analysis; Yield prediction; K-means; K-NN; Multiple Linear regression

I. INTRODUCTION

Data mining is the process of extracting useful knowledge or information from large amounts of data. In the digital era, data mining is becoming an increasingly important tool for transforming data into information. The application of data mining techniques to agricultural data is commonly associated with precision agriculture. The main aim of this work is to improve and substantiate the validity of yield prediction, which is useful for farmers.

Agricultural crop production depends on various factors such as biology, climate, economy and geography, each of which affects agriculture differently. Researchers in previous years therefore used appropriate statistical methodologies. A large number of variables can affect agronomic traits such as yield.

Yield prediction is an important agricultural problem: every farmer wants to know how much yield to expect. In the past, yield prediction relied on the farmer's experience with a particular field and crop. Suppose that data are available for some period in the past, together with the corresponding recorded yields. In any data mining procedure, such historical data are collected and used as training data, from which a model learns how to predict future yields. Crop yield prediction is an important area of research that helps ensure food security around the world.

In our research, we considered the effects of environmental (weather) factors, abiotic factors (pH, soil salinity) and area of production on crop production in Gujarat. Treating these factors as datasets for various districts, we propose a new approach that divides the yearly dataset on a seasonal basis. We then applied clustering techniques and suitable classification techniques to obtain crop yield predictions.

Volume 3, Issue 5, May-June-2018 | http://ijsrcseit.com

Vinita Shah et al. Int J S Res CSE & IT. 2018 May-June; 3(5) : 1093-1097

The rest of the paper is organized as follows: Section II explains the motivation for the work. Section III describes the real dataset used and its preprocessing. Section IV presents the methodology. Section V analyses the results of the proposed approach. Section VI concludes the paper.

II. MOTIVATION

Gujarat is characterized by hot semi-arid conditions, and 49% of its total geographical area (9.6 million ha) is cultivated land. The irrigated area is only 32% of the total cultivated area; the rest faces severe water shortage. The groundwater is of poor quality. The alluvial soils are alkaline/saline and deficient in nitrogen, phosphorus and zinc. The cropping intensity is low at 118%. The total food grain production of the state is 5.26 million tons. The important crops are groundnut, cotton, pearl millet, maize, sorghum, castor, gram and mustard.

A farmer must have a good understanding of the soil type and the biotic factors governing the soil, and a thorough knowledge of traditional agricultural practices, to obtain maximum crop yield. Such practices may include harrowing and ploughing, and the use of inputs such as fertilizers, insecticides and herbicides [7].

Fig. 1 shows that although Gujarat ranks first in area and production, Tamil Nadu ranks first in the country in productivity. Groundnut production in Gujarat depends on the September rains: if they fail, both production and productivity fall. It is therefore important to find the relationship between climate variables, rainfall and yield data.

Fig. 1. State-wise area, production and productivity of groundnut [14]

III. PREPROCESSING

A. Dataset

In this research, two types of datasets are used: weather data and yield data. The weather dataset was collected from the Department of Meteorological Centre, Anand Agricultural University (AAU), Anand. The yield data were taken from four different yearbooks of the Directorate of Agriculture, Gujarat State, Gandhinagar, for four districts of Gujarat: Jamnagar, Junagadh, Rajkot and Amreli. Considerable pre-processing was required to handle missing values, noise and outliers.

We considered 20 different attributes for this research: rainfall, maximum and minimum temperature, vapour pressure, relative humidity, basic sunshine, evaporation pressure, soil temperature at different depths, wind speed and irrigated area for all districts, together with the cultivated area for crop yield, considered district by district.

After the necessary formatting and pre-processing of the datasets, the finalized version of our data covers a total of 4 districts for the period 2006 to 2013.

B. Input Variables and Measures

From the vast initial dataset, we selected a limited number of important input variables (20) that contribute most to agricultural production. All inputs were considered for the eight-year period from 2006 to 2013.

Several measures are used to evaluate the prediction models:

1. R Square [coefficient of determination] is simply the square of the sample correlation coefficient (i.e., r) between the outcomes and their predicted values. Defined this way, the coefficient of determination ranges from 0 to 1.

2. RMSE [Root Mean Square Error] is a frequently used measure of the difference between the values predicted by a model and the values actually observed from the environment being modeled.
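The two measures above can be illustrated with a minimal Python sketch. The function names `rmse` and `r_square` and the sample numbers are illustrative assumptions, not values or code from this study; R Square is computed exactly as described above, as the square of the sample correlation between observed and predicted values.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_square(observed, predicted):
    """Square of the sample correlation coefficient r between outcomes and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return r * r


# Illustrative numbers only (not taken from the paper's data).
observed = [1200.0, 1500.0, 900.0, 1800.0]
predicted = [1100.0, 1600.0, 1000.0, 1700.0]
print(rmse(observed, predicted))
print(r_square(observed, predicted))
```

A lower RMSE and an R Square closer to 1 both indicate predictions that track the observed yields more closely.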

IV. METHODOLOGY

A. Proposed Flow

Existing papers use a small number of climate attributes. We instead use a larger number of attributes, including soil attributes that were not considered in previous work. Also, instead of a yearly approach, we work on a seasonal basis to obtain more accurate results.

For the seasonal approach, we divided the total dataset into two seasons, (1) Kharif and (2) Summer, because groundnut production is higher in these two seasons. For this purpose, we created a GUI in Java that divides the monthly datasets into season-wise clusters and combines them with the associated yield dataset of that season for all 4 districts. For a selected season, it generates the dataset from the averages of the climatic attributes, fetches the corresponding yield from the yield dataset and generates a result for each year. In short, it combines the two datasets, divides them into the Kharif and Summer seasons and displays the output for each district in the GUI.
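The season-wise aggregation described above can be sketched as follows. This is a Python illustration rather than the Java GUI used in this work, and several details are assumptions: the month ranges in `SEASON_MONTHS`, the field names, the function names and the sample records are all hypothetical, since the text does not specify them.

```python
from collections import defaultdict

# Hypothetical month ranges: the paper does not state which months make up each season.
SEASON_MONTHS = {"kharif": range(6, 11), "summer": range(2, 6)}
KEY_FIELDS = ("district", "year", "month")


def season_of(month):
    """Return the season name for a month number, or None if outside both seasons."""
    for season, months in SEASON_MONTHS.items():
        if month in months:
            return season
    return None


def build_seasonal_dataset(monthly_weather, seasonal_yield):
    """Average monthly attributes per (district, year, season) and attach the yield."""
    groups = defaultdict(list)
    for row in monthly_weather:
        season = season_of(row["month"])
        if season is not None:
            groups[(row["district"], row["year"], season)].append(row)

    records = []
    for key in sorted(groups):
        if key not in seasonal_yield:
            continue  # no yield recorded for this district, year and season
        rows = groups[key]
        record = dict(zip(("district", "year", "season"), key))
        for name in rows[0]:
            if name not in KEY_FIELDS:
                record[name] = sum(r[name] for r in rows) / len(rows)
        record["yield"] = seasonal_yield[key]
        records.append(record)
    return records


# Made-up values for illustration only.
weather = [
    {"district": "Rajkot", "year": 2010, "month": 6, "rainfall": 100.0, "max_temp": 36.0},
    {"district": "Rajkot", "year": 2010, "month": 7, "rainfall": 200.0, "max_temp": 32.0},
    {"district": "Rajkot", "year": 2010, "month": 12, "rainfall": 0.0, "max_temp": 28.0},
]
yields = {("Rajkot", 2010, "kharif"): 1500.0}
print(build_seasonal_dataset(weather, yields))
```

Each output record holds one district, year and season with averaged climatic attributes and the matching yield, which is the shape of table the prediction models consume.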

B. Individual effect of Attributes on yield

To find the effect of each individual attribute on yield, a correlation analysis is performed on the data, generating the values of r and R square for each attribute. These values show which attributes affect yield more. Attributes with an R square value above 0.5 are more strongly related to yield and are important for plant growth, so we have considered them.
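A minimal sketch of this screening step is given below, assuming the data are held as a list of records with one numeric value per attribute. The names `pearson_r` and `screen_attributes`, the attribute names and the sample values are hypothetical; only the 0.5 threshold on R square comes from the text.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r (undefined if either input is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def screen_attributes(table, target="yield", threshold=0.5):
    """Return {attribute: (r, r_square)} for attributes whose R square exceeds threshold."""
    ys = [row[target] for row in table]
    selected = {}
    for name in table[0]:
        if name == target:
            continue
        r = pearson_r([row[name] for row in table], ys)
        if r * r > threshold:
            selected[name] = (r, r * r)
    return selected


# Made-up values for illustration only.
table = [
    {"rainfall": 1.0, "wind_speed": 2.0, "yield": 10.0},
    {"rainfall": 2.0, "wind_speed": 1.0, "yield": 20.0},
    {"rainfall": 3.0, "wind_speed": 2.0, "yield": 30.0},
    {"rainfall": 4.0, "wind_speed": 1.0, "yield": 40.0},
]
print(screen_attributes(table))
```

In this toy table only `rainfall` passes the threshold; `wind_speed` has an R square of about 0.2 and is dropped.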

C. Combined effect of Attributes on yield

We considered 4 different alternatives (attribute groups), for each of which the accuracy measures of the different models can be computed.

A. Environmental Attributes:

Rainfall, Maximum Temperature, Minimum Temperature, Basic Sunshine, Relative Humidity (2 times), Vapor Pressure (2 times), Evaporation Pressure

B. Soil Attributes:

Temperature at different depths of soil (6)

C. Abiotic Attributes:

Water Content (2), Density (2), Wind Speed

D. Area Central Attributes:

Total area from which the yield is produced and total production

The following classification/regression models were

used to obtain the crop yield prediction results:

a) Linear Regression: This is a statistical technique for determining the strength of the relationship between one dependent variable and a series of other changing variables, known as independent variables (regular attributes). If there are multiple independent variables, as in our research (rainfall, sunshine hours, humidity, pH, etc.), it is termed multiple linear regression. Linear regression models the relationship between a scalar variable and one or more explanatory variables by fitting a linear equation to the observed data [7].
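As an illustration of fitting a linear equation to observed data, the sketch below solves the least-squares normal equations in plain Python. It is not the tool used in this study; the function names and the toy data (generated from y = 2 + 3·x1 − x2) are assumptions made for the example.

```python
def fit_linear_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented system [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    # (raises ZeroDivisionError if the attributes are perfectly collinear).
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coeffs, x):
    """Evaluate the fitted linear equation for one example."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))


# Toy data generated from y = 2 + 3*x1 - 1*x2 (illustration only).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [3.0, 7.0, 7.0, 11.0, 12.0]
coeffs = fit_linear_regression(X, y)
print(coeffs)
print(predict(coeffs, [6.0, 2.0]))
```

With real attributes such as rainfall or humidity, each fitted coefficient describes the expected change in yield per unit change of that attribute, holding the others fixed.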

b) k-NN: The k-nearest neighbor algorithm compares a given test example with the training examples that are similar to it. Each example denotes a point in an n-dimensional space, so all of the training examples are stored in an n-dimensional pattern space. k is a positive integer, usually small. For our purpose, the basic k-NN algorithm was applied. It first finds the k examples in the training set that are closest to the unknown example, and then takes the most commonly occurring classification among those k examples [7].
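A minimal sketch of the neighbor search is shown below. The text describes the majority-vote (classification) form of k-NN; because yield is a continuous quantity, this sketch instead averages the targets of the k nearest examples, which is an assumption about how the method would be applied to yield and not a detail stated in the paper. The function name, the Euclidean distance choice and the sample points are also illustrative.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a numeric target as the mean target of the k nearest training examples."""
    # Euclidean distance from the query to every stored training example.
    distances = sorted(
        (math.dist(x, query), target) for x, target in zip(train_X, train_y)
    )
    nearest = [target for _, target in distances[:k]]
    return sum(nearest) / len(nearest)


# Made-up points for illustration only.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
train_y = [10.0, 20.0, 30.0, 100.0]
print(knn_predict(train_X, train_y, [0.1, 0.1], k=3))
```

Because the distance mixes all attributes, attributes measured on very different scales (for example rainfall in mm and temperature in °C) are usually normalized before the search.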

c) Neural Network: An artificial neural network (ANN) is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks, such as those in our brains. In most cases an ANN is an adaptive system that modifies its structure based on external or internal information that flows through the network during the learning phase. The basic neural network model consists of three layers: an input layer, a hidden layer and an output layer [7].

d) Regression Tree: Regression trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (dependent) variable based on the values of several input (independent) variables. In a regression tree the target variable is continuous, and the tree is used to predict its value.
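The core step of growing a regression tree, choosing a split, can be sketched for a single attribute as below. This is a depth-one illustration under stated assumptions: the splitting criterion (minimum summed squared error), the function name and the sample values are not taken from the paper, and a full tree would apply the same search recursively over all attributes.

```python
def best_split(xs, ys):
    """Find the threshold on one attribute that minimises the summed squared error.

    Returns (cost, threshold, left_mean, right_mean), or None if xs has no
    two distinct values to split between.
    """

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    candidates = sorted(set(xs))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2  # midpoint between adjacent distinct values
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    return best


# Made-up values for illustration only.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
print(best_split(xs, ys))
```

Each leaf predicts the mean target of the training examples that fall into it, which is how the tree produces a continuous value.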

V. RESULT ANALYSIS

In our research, we obtained prediction results for the yield of the groundnut crop for the selected districts in Gujarat. The predictions were obtained from the selected input attributes using appropriate classification and regression models.

The tool used provides predictive algorithms and includes powerful methods to explore, transform and "clean" data, partition data into "training" and "validation" sets, train one or more machine learning models, and evaluate and compare the performance of those models.
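The partitioning step can be illustrated with a short holdout split. This sketch does not reproduce the tool used in this work; the function name, the 25% validation fraction and the fixed seed are assumptions chosen for the example, and the years stand in for full data records.

```python
import random


def train_validation_split(records, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (training, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# The eight study years stand in for full records here (illustration only).
records = list(range(2006, 2014))
training, validation = train_validation_split(records)
print(len(training), len(validation))
```

Each model is then fitted on the training list only, and measures such as RMSE are reported separately for the training and validation lists.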

For prediction, four different algorithms are used:

• Multiple Linear Regression

• Regression Tree

• K-nearest Neighbor

• Artificial Neural Network

Fig. 2. Comparison of prediction algorithms

Figure 2 compares the RMSE values of the four methods for all four attribute alternatives.

Fig. 3. Comparative analysis of the four prediction models with the four approaches

VI. CONCLUSION AND FUTURE WORK

Weather data plays a major role in crop yield. For groundnut crop yield prediction, the K-nearest neighbor model gives better results in both training and validation than the other three techniques: the artificial neural network and the two regression techniques, multiple linear regression (MLR) and regression tree. In future, using a larger dataset covering more previous years may also affect the accuracy of the algorithms and could yield better accuracy, i.e. a lower RMSE value.

VII. REFERENCES

[1] Mucherino, Petraq Papajorgji, P. M. Pardalos, "A survey of data mining techniques applied to agriculture", Springer, 2009.

[2] S. S. Bhaskar, L. Arokiam, V. Arul Kumar, L. Jeyassimaan, "A Brief Survey of Data Mining Techniques to Agriculture Applications", Medwell Journals, 2010.

[3] Ramesh A. Medar, Vijay S. Rajpurohit, "A survey on Data Mining Techniques for Crop Yield Prediction", IJARCSMS, 2014.

[4] Hui Chen, Wei Wu, Hong-Bin Liu, "Assessing the relative importance of climate variables to rice yield variation using support vector machines", Springer, 2015.

[5] D. Ramesh, B. Vishnu Vardhan, "Analysis of Crop Yield Prediction Using Data Mining Techniques", IJRET, 2015.

[6] D. Ramesh, B. Vishnu Vardhan, "Crop Yield Prediction Using Weight Based Clustering Technique", IJCEA, 2015.

[7] A. T. M. Shakil Ahamed, Navid Tanzeem Mahmood, Nazmul Hossain, Mohammad Tanzir Kabir, Kallal Das, Faridur Rahman, Rashedur M. Rahman, "Applying Data Mining Techniques to Predict Annual Yield of Major Crops and Recommend Planting Different Crops in Different Districts in Bangladesh", IEEE, 2015.

[8] D. Ramesh, B. Vishnu Vardhan, O. Subhash Chander Goud, "Density Based Clustering Technique on Crop Yield Prediction", IJEEE, 2014.

[9] Mohammad Motiur Rahman, Naheena Haq, Rashedur M. Rahman, "Application of Data Mining Tools for Rice Yield Prediction on Clustered Regions of Bangladesh", IEEE, 2014.

[10] D. Ramesh, B. Vishnu Vardhan, "Data Mining Techniques and Applications to Agricultural Yield Data", IJARCCE, 2013.

[11] José R. Romero, Pablo F. Roncallo, Pavan C. Akkiraju, Ignacio Ponzoni, Viviana C. Echenique, Jessica A. Carballido, "Using classification algorithms for predicting durum wheat yield in the province of Buenos Aires", Elsevier, 2013.

[12] Yunous Vagh, Jitian Xiao, "A data mining perspective of dual effect of Rainfall and Temperature on Wheat Yield", ECU, 2012.

[13] F. P. Parekh, T. M. V. Suryanarayana, "Impact of Climatological Parameters on Yield of Wheat Using Neural Network Fitting", International Journal of Modern Engineering Research, 2012.

[14] Nong Ye, Data Mining: Theories, Algorithms, and Examples, CRC Press, 2013.

[15] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann Publishers.

[16] https://dag.gujarat.gov.in/images/directorofagriculture/pdf/Groundnut-Book.pdf
