
Basic Statistics for Agriculture Research

Authors:
  • NIFS - Sri Lanka

Abstract

This presentation highlights the importance of statistics in agricultural research. Statistics is a crucial tool that provides a scientific framework for designing experiments, analyzing data, and drawing sound conclusions from the results. In agricultural research, statistics is used for designing experiments, analyzing data, making decisions, improving yield, and managing risks. By identifying key factors that affect parameters such as yield, and by assessing the risks associated with agriculture, farmers and researchers can make data-driven decisions to improve their practices and protect their livelihoods. As the challenges facing agriculture become more complex, the need for statistical methods in agricultural research will only continue to grow.
What is statistics and why is it important?
Statistics is the study and manipulation of data, including ways to gather, review, analyse, present and draw conclusions from data
Statistics can help a researcher approach a study in a stepwise manner:
1. Testing of hypotheses
2. Establishing a sample size
3. Data interpretation through analysis
Content of the Presentation
What is statistics and why is it important?
Steps of research
1. Writing the hypotheses and planning the research design
2. Collect data from a population and correct sampling
3. Summarize and understand the data using descriptive statistics
4. Test hypotheses using inferential statistics
Choosing the right statistical test
Parametric tests
Non-parametric tests
5. Results presentation and interpretation
Step 1: Writing the hypotheses and planning the research design
Writing statistical hypotheses
The goal of research is often to investigate a relationship between
variables within a population. You start with a prediction, and use
statistical analysis to test that prediction
A statistical hypothesis is a formal way of writing a prediction about a
population. Every research prediction is rephrased into null (H0) and
alternative (HA) hypotheses that can be tested using sample data
Planning your research design
A research design is your overall strategy for data collection and analysis. It
determines the statistical tests you can use to test your hypothesis later on
1. Experimental design: assess a cause-and-effect relationship
2. Correlational design: explore relationships between variables
3. Descriptive design: study the characteristics of a population or phenomenon
Measuring variables
When planning a research design, you should operationalize your variables and decide exactly how you will measure them.
Variables
By role: independent (cause) or dependent (effect)
Categorical: nominal or ordinal
Numerical: interval or ratio; values may be discrete or continuous
Scale of Measurements
Property | Nominal | Ordinal | Interval | Ratio
Named | Yes | Yes | Yes | Yes
Order | No | Yes | Yes | Yes
Equal intervals between values | No | No | Yes | Yes
Absolute zero | No | No | No | Yes
Step 2: Collect data from a population and correct
sampling
A sample refers to a smaller, manageable version of a larger group. It is a
subset containing the characteristics of a larger population
Why Sample?
Necessity: Simply not possible to study the whole population
Practicality: Easier and more efficient to collect data from a sample
Cost-effectiveness: Lower participant, laboratory, equipment, and researcher costs
Manageability: Storing and running statistical analyses on smaller datasets is easier and more reliable
Sampling for statistical analysis
1. Probability sampling: every member of the population has a chance of
being selected for the study through random selection
2. Non-probability sampling: some members of the population are more
likely than others to be selected for the study because of criteria
such as convenience or voluntary self-selection
Figure: sampling methods shown alongside parametric and non-parametric tests
Sample Size (How small is too small, and how large is too large?)
A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary
As a rule of thumb, a minimum of 30 units per subgroup is necessary, but this depends on the field of study (e.g. clinical research)
The required sample size depends on:
1. Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
2. Statistical power: the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
3. Expected effect size: a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
4. Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
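The four inputs above can be combined into a rough sample-size estimate. The sketch below assumes a two-sided comparison of two group means and the normal approximation n = 2 × ((z for alpha + z for power) / d)² per group, where d is the standardized effect size (expected difference divided by the population standard deviation). The function name and the example values are illustrative and do not come from the presentation.

```python
import math
from statistics import NormalDist


def sample_size_per_group(alpha=0.05, power=0.80, effect_size=0.5):
    """Approximate number of units needed per group to compare two means.

    Normal approximation only; effect_size is the expected difference
    in means divided by the population standard deviation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()  # standard normal distribution
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = std_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical scenario: medium expected effect, alpha 5%, power 80%.
    print(sample_size_per_group())
    # A smaller expected effect needs many more units per group.
    print(sample_size_per_group(effect_size=0.2))
```

Because the formula divides by the squared effect size, halving the expected effect roughly quadruples the required sample. Exact t-based calculations give slightly larger numbers for small samples.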
Step 3: Summarize and understand the data using
descriptive statistics
There are various ways to inspect your data:
Organizing data from each variable in frequency distribution tables
Displaying data from a key variable in a bar chart to view the distribution of responses
Visualizing the relationship between two variables using a scatter plot
A normal distribution means that your data are symmetrically distributed around a centre where most
values lie, with the values tapering off at the tail ends
Measures of central tendency
Describe where most of the values in a data set lie
Mode: the most popular response or value in the data set
Median: the value in the exact middle of the data set when ordered from low to high
Mean: the sum of all values divided by the number of values
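As a small illustration of the three measures, the sketch below uses Python's standard `statistics` module. The plot yields are made-up numbers chosen only for the example.

```python
from statistics import mean, median, mode

# Hypothetical yields (t/ha) from seven plots; values are illustrative only.
yields = [4.2, 3.9, 4.2, 5.1, 4.8, 4.2, 3.7]

central = {
    "mode": mode(yields),      # most frequent value
    "median": median(yields),  # middle value after sorting
    "mean": mean(yields),      # sum of values / number of values
}

for name, value in central.items():
    print(f"{name}: {value:.2f}")
```

When the three values are close to each other, as here, the data are roughly symmetric; a mean far from the median usually signals skew or outliers.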
Measures of variability
Describe how spread out the values in a data set are
Range: the highest value minus the lowest value of the data set
Interquartile range: the range of the middle half of the data set
Standard deviation: the average distance between each value in your data set and the mean
Variance: the square of the standard deviation
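The same kind of sketch works for the measures of variability. It assumes the data are a sample, so `stdev` and `variance` use the n - 1 denominator, and it takes quartiles from `statistics.quantiles` with its default method; other software may use slightly different quartile conventions. The yields are again hypothetical.

```python
from statistics import quantiles, stdev, variance

# Hypothetical yields (t/ha) from seven plots; values are illustrative only.
yields = [4.2, 3.9, 4.2, 5.1, 4.8, 4.2, 3.7]

value_range = max(yields) - min(yields)  # highest minus lowest value
q1, _, q3 = quantiles(yields, n=4)       # quartile cut points
iqr = q3 - q1                            # spread of the middle half
sd = stdev(yields)                       # sample standard deviation
var = variance(yields)                   # sample variance (sd squared)

print(f"range: {value_range:.2f}")
print(f"interquartile range: {iqr:.2f}")
print(f"standard deviation: {sd:.3f}")
print(f"variance: {var:.3f}")
```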
Step 4: Test hypotheses using inferential statistics
A number that describes a sample is called a
statistic, while a number describing a population
is called a parameter
Using inferential statistics, you can make
conclusions about population parameters based
on sample statistics
Estimation: calculating population parameters based on sample statistics
Point estimate: a single value that represents your best estimate of the parameter
Interval estimate: a range of values expected to contain the parameter
Hypothesis testing: testing predictions about the population using samples
Test statistic: how much the data differ from H0
P-value: the likelihood of obtaining your results if H0 is true
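To make the difference between a point estimate and an interval estimate concrete, here is a minimal sketch that estimates a population mean from a sample. It assumes a normal approximation for the interval; for small samples a t-based interval would be somewhat wider. The function name and the data are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, lower, upper) for the population mean.

    Normal approximation: point estimate plus or minus z times the
    standard error of the mean.
    """
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    point = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return point, point - z * standard_error, point + z * standard_error


if __name__ == "__main__":
    # Hypothetical yields (t/ha); values are illustrative only.
    yields = [4.2, 3.9, 4.2, 5.1, 4.8, 4.2, 3.7]
    point, lower, upper = mean_confidence_interval(yields)
    print(f"point estimate: {point:.2f}")
    print(f"95% interval estimate: {lower:.2f} to {upper:.2f}")
```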
Choosing the right statistical test
Statistical tests are used in hypothesis testing
Statistical assumptions
1. Independence of observations (no autocorrelation): The observations/variables
you include in your test are not related
2. Homogeneity of variance: the variance within each group being compared is
similar among all groups. If one group has much more variation than others, it
will limit the test’s effectiveness
3. Normality of data: the data follows a normal distribution (bell curve). This
assumption applies only to quantitative data
If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test, which allows you to make comparisons without any assumptions about the data distribution
If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data. The Durbin-Watson test can be used to check for autocorrelation
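The Durbin-Watson statistic mentioned above is simple to compute from model residuals taken in observation order: it is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below computes only the statistic; the function name is illustrative and no critical values are included.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Squared differences between each residual and the previous one.
    squared_diffs = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    sum_squares = sum(e ** 2 for e in residuals)
    if sum_squares == 0:
        raise ValueError("residuals are all zero")
    return squared_diffs / sum_squares


if __name__ == "__main__":
    # Hypothetical residuals that alternate in sign (negative autocorrelation).
    print(durbin_watson([1, -1, 1, -1]))
    # Hypothetical residuals that never change (positive autocorrelation).
    print(durbin_watson([1, 1, 1, 1]))
```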
Choosing a parametric test - 01
Comparison tests: differences among group means
Test | Predictor type and number of predictors | Research example
Paired t-test | Categorical, 1 predictor | Effectiveness of a new fertilizer on crop yields
Independent t-test | Categorical, 1 predictor | Comparing the yield of two different varieties of wheat
ANOVA | Categorical, 1 or more predictors | A farmer wants to determine which fertilizer produces the highest yield for a certain crop
MANOVA | Categorical, 1 or more predictors | A researcher wants to evaluate the effects of different types of fertilizers on the yield of three different crops
T-tests are used when comparing the means of precisely two groups
ANOVA and MANOVA tests are used when comparing the means of more than two groups
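As an illustration of a comparison test, the sketch below computes the test statistic of an independent two-sample t-test using the pooled variance, which relies on the homogeneity-of-variance assumption. It returns only the t value and the degrees of freedom; converting them to a p-value needs the t distribution, which Python's standard library does not provide, so a statistics package would normally be used for that step. The function name and data are illustrative.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for an independent two-sample t-test.

    Uses the pooled variance, so it assumes both groups have similar
    variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    if pooled_var == 0:
        raise ValueError("no variation within groups")
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


if __name__ == "__main__":
    # Hypothetical yields (t/ha) of two wheat varieties.
    variety_a = [4.1, 4.4, 3.9, 4.6, 4.2]
    variety_b = [3.6, 3.9, 3.5, 4.0, 3.8]
    t, df = pooled_t_statistic(variety_a, variety_b)
    print(f"t = {t:.3f}, df = {df}")
```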
Choosing a parametric test - 02
Regression tests: assess cause-and-effect relationships. Used to estimate the effect of one or more continuous variables on another variable
Correlation tests: check whether variables are related, without hypothesizing a cause-and-effect relationship
Test | Predictor type and number of predictors | Outcome variable | Research example
Simple linear regression | Continuous, 1 predictor | Continuous, 1 outcome | Predict the yield of a crop based on a single factor such as the amount of fertilizer used
Multiple linear regression | Continuous, 2 or more predictors | Continuous, 1 outcome | Predict crop yield based on several independent variables, such as weather conditions, soil nutrients, and irrigation practices
Logistic regression | Continuous | Binary | -
Test | Variables | Research example
Pearson’s correlation | 2 continuous variables | How are latitude and temperature related?
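Simple linear regression and Pearson's correlation both come from the same sums of squares, which the sketch below computes directly. It is a least-squares sketch only, with no significance test or diagnostics; the function names and the fertilizer data are illustrative.

```python
from statistics import mean


def _sums_of_squares(x, y):
    """Return (sxx, syy, sxy) around the means of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def linear_fit(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    sxx, _, sxy = _sums_of_squares(x, y)
    if sxx == 0:
        raise ValueError("x has no variation")
    slope = sxy / sxx
    return slope, mean(y) - slope * mean(x)


def pearson_r(x, y):
    """Pearson's correlation coefficient between x and y."""
    sxx, syy, sxy = _sums_of_squares(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Hypothetical fertilizer rates (kg/ha) and yields (t/ha).
    fertilizer = [0, 50, 100, 150, 200]
    crop_yield = [2.1, 2.9, 3.8, 4.4, 5.2]
    slope, intercept = linear_fit(fertilizer, crop_yield)
    print(f"yield = {intercept:.2f} + {slope:.4f} * fertilizer")
    print(f"Pearson r = {pearson_r(fertilizer, crop_yield):.3f}")
```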
Choosing a nonparametric test
Non-parametric tests don’t make as many assumptions about the data
Test | Predictor variable | Outcome variable | Example
Spearman’s correlation | Quantitative | Quantitative | Correlation between the amount of fertilizer used and the yield of a particular crop
Chi-square test of independence | Categorical | Categorical | A farmer has three types of fertilizers that can be used on two different crops
Sign test | Categorical | Quantitative | A farmer is testing a new fertilizer treatment on two groups of tomato plants
Kruskal-Wallis H | Categorical, 3 or more groups | Quantitative | A farmer is interested in comparing the yields of three different varieties of wheat
ANOSIM | Categorical, 3 or more groups | Quantitative, 2 or more outcome variables | Researchers are interested in studying the impact of different fertilizers on the microbial community structure in soil
Wilcoxon rank-sum test | Categorical, 2 groups | Quantitative, groups come from different populations | Researchers are testing two different types of fertilizers to determine which one is more effective in increasing the yield
Wilcoxon signed-rank test | Categorical, 2 groups | Quantitative, groups come from the same population | A farmer wants to test whether a new fertilizer is improving crop yields compared to the previous fertilizer
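Many non-parametric tests work on ranks rather than raw values. As one example, Spearman's correlation is Pearson's correlation applied to the ranks of the data, with tied values sharing their average rank. The sketch below shows that idea with illustrative function names and made-up data; it does not compute a p-value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Hypothetical fertilizer rates (kg/ha) and yields (t/ha).
    fertilizer = [0, 50, 100, 150, 200]
    crop_yield = [2.1, 2.0, 3.8, 4.4, 4.3]
    print(f"Spearman rho = {spearman_rho(fertilizer, crop_yield):.3f}")
```

Because only the ordering matters, a relationship that is monotonic but not linear still gives a coefficient of 1 or -1.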
Step 5: Results presentation and interpretation
Statistical significance: a determination made by an analyst that the results in the data are not explainable by chance alone
In hypothesis testing, statistical significance is the main criterion for forming
conclusions
You compare your p-value to a set significance level (usually 0.05) to decide
whether your results are statistically significant or non-significant
Statistically significant results are considered unlikely to have arisen solely
due to chance. There is only a very low chance of such a result occurring if
the null hypothesis is true in the population
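The decision rule can be illustrated with a test whose p-value is easy to compute exactly. The sketch below uses a two-sided sign test on paired observations (for example, the same plots before and after a new fertilizer) and then compares the p-value with the significance level. Function names and data are illustrative, and the sign test is chosen only because it needs nothing beyond the binomial distribution.

```python
from math import comb


def sign_test_p_value(before, after):
    """Two-sided exact sign test for paired observations."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop ties
    n = len(diffs)
    if n == 0:
        raise ValueError("all pairs are tied")
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Under H0 each sign is equally likely: X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def is_significant(p_value, alpha=0.05):
    """Apply the decision rule: significant when p is below alpha."""
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical yields (t/ha) on ten plots; nine increase, one decreases.
    before = [3.1, 2.8, 3.5, 3.0, 2.9, 3.3, 3.2, 2.7, 3.4, 3.0]
    after = [3.4, 3.0, 3.6, 3.3, 3.1, 3.2, 3.5, 2.9, 3.7, 3.2]
    p = sign_test_p_value(before, after)
    print(f"p = {p:.4f}, significant at 0.05: {is_significant(p)}")
```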
How to select the right significance level?
A significance level, also known as alpha (α), is the probability of making a Type I error, which is the rejection of a true null hypothesis
The most common significance level used in scientific research is 0.05, which means accepting a 5% chance of making a Type I error when the null hypothesis is true
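What a 5% Type I error rate means can be checked by simulation: if the null hypothesis is true and the experiment is repeated many times, about 5% of the repeats should still come out "significant". The sketch below assumes normally distributed data with a known standard deviation of 1 and uses a two-sided z-test; the function name, sample size, and seed are arbitrary choices for the illustration.

```python
import math
import random
from statistics import NormalDist, mean


def type_i_error_rate(alpha=0.05, n=20, trials=2000, seed=1):
    """Share of simulated experiments that wrongly reject a true H0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: population mean 0, known sd 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) * math.sqrt(n)  # z statistic with sigma = 1
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(f"alpha = 0.05 -> rejection rate {type_i_error_rate(0.05):.3f}")
    print(f"alpha = 0.01 -> rejection rate {type_i_error_rate(0.01):.3f}")
```

The observed rates fluctuate around alpha from run to run; lowering alpha lowers the share of false positives, at the cost of making real effects harder to detect.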
The significance level you choose can depend on several factors, including:
The nature of the research question
The consequences of a Type I error
The sample size
General guidelines for selecting a significance level:
1. Consider the consequences of a Type I error: If a Type I error could have serious consequences, such as in medical research or safety-critical systems, you may want to choose a lower significance level, such as 0.001.
2. Consider the sample size: With larger sample sizes, even small differences between groups can be statistically significant. In this case, you may want to choose a lower significance level to reduce the chance of a false positive.
3. Consider the field and previous research: The significance level used in your field or previous research may influence your choice. For example, some fields may have a higher or lower tolerance for Type I errors.
4. Use common levels: If there are no specific reasons for choosing a different level, using commonly accepted levels such as 0.05, 0.01 or 0.001 is a good choice.
Results presentation
Analysis | Subgroup | Number of variables | Type
Comparison | Among items | Two per item | Variable width column chart
Comparison | Among items | One per item | Bar/column chart
Comparison | Over time | Many periods | Circular area/line chart
Comparison | Over time | Few periods | Column/line chart
Relationship | - | Two | Scatter chart
Relationship | - | Three | Bubble chart
Distribution | - | Single | Column/line histogram
Distribution | - | Two | Scatter chart
Distribution | - | Three | Three-dimensional area chart
Composition | Changing over time (temporal) | Only relative differences matter | Stacked 100% column chart
Composition | Changing over time (temporal) | Relative and absolute differences matter | Stacked column chart
Composition | Static | Simple share of total | Pie chart
Composition | Static | Accumulation | Waterfall chart
Composition | Static | Components of components | Stacked 100% column chart with subcomponents
Source: Junyong In and Sangseok Lee (2017)
Scribbr. (n.d.). The Beginner’s Guide to Statistical Analysis | 5 Steps & Examples. https://www.scribbr.com/category/statistics/
In, J., & Lee, S. (2017). Statistical data presentation. Korean Journal of Anesthesiology, 70(3), 267-276.