ABSTRACT
Statistical significance tests tend to yield small p-values (indicating significance) as the size of the data set increases. The effect size is independent of sample size and is a measure of practical significance. It can be understood as an effect large enough to be important in practice, and is described here for differences in means, for the relationship in two-way frequency tables, and for a multiple regression fit.
INTRODUCTION
An advantage of drawing a random sample is that it enables one to study the properties of a population within the time and money available. In such cases statistical significance tests (e.g. t-tests) are used to show that a result (e.g. the difference between two means) is significant. The p-value is the criterion for this: it gives the probability of obtaining the observed value, or a more extreme one, under the assumption that the null hypothesis (e.g. no difference between the means) is true. A small p-value (e.g. smaller than 0.05) is considered sufficient evidence that the result is statistically significant. Statistical significance does not, however, necessarily imply that the result is important in practice, as these tests tend to yield small p-values (indicating significance) as the size of the data set increases.
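This tendency can be illustrated with a minimal sketch (the means, standard deviations and sample sizes below are hypothetical, chosen only for illustration): for a fixed difference between two means, the p-value of a two-sample z-test shrinks as n grows, while the standardised effect size stays constant.

```python
import math

def two_sample_z(mean1, sd1, mean2, sd2, n):
    """Two-sample z statistic and two-sided p-value for equal group sizes n."""
    se = math.sqrt(sd1 ** 2 / n + sd2 ** 2 / n)   # standard error of the difference
    z = (mean1 - mean2) / se
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal p-value
    return z, p

# Fixed difference of 3 units (hypothetical IQ-style data: 110 +- 10 vs 107 +- 12)
for n in (100, 1000, 10000):
    z, p = two_sample_z(110, 10, 107, 12, n)
    d = (110 - 107) / 12          # effect size: difference over the larger std dev
    print(f"n={n:6d}  z={z:6.2f}  p={p:.2e}  d={d:.2f}")
```

The p-value falls by orders of magnitude as n increases, yet the effect size d = 0.25 is unchanged, which is the point the section develops.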
In many cases researchers are forced to regard their obtained results as a subpopulation of the target population because of the weak response to the planned random sample. In other cases data obtained from convenience sampling are erroneously analysed as if they were obtained by random sampling. Such data should be considered as small populations, for which statistical inference and p-values are not relevant: statistical inference draws conclusions about the population from which a random sample was drawn, using the descriptive measures that have been calculated. Instead of only reporting descriptive statistics in these cases, effect sizes can be determined. Practical significance can then be understood as a difference large enough to have an effect in practice.
Many different effect sizes exist (see Rosenthal, 1991 and Steyn, 1999), but here we discuss only those most frequently used, i.e. for the difference between means and for relationships in two-way frequency (contingency) tables and in multiple regression.
EFFECT SIZE FOR THE DIFFERENCE BETWEEN
MEANS
Consider the following example of testing the difference in IQs of two random samples of size 200 from different populations. With means and standard deviations of 110 ± 10 and 107 ± 12, a test statistic of

z = (110 − 107) / √(10²/200 + 12²/200) = 2.72

with p = 0.007 is obtained. It is apparent that the difference in mean IQs is statistically significant (p < 0.05), but is the difference between IQs of 110 and 107 important enough to be of practical significance? According to the IQ scale a difference of 3 units is not important. We are therefore interested in a measure of practical significance analogous to the test statistics z or t, which are used to decide whether a statistically significant difference between two means holds.

Management Dynamics Volume 12 No. 4, 2003

S.M. Ellis
H.S. Steyn
Potchefstroom University for CHE

Practical significance (effect sizes) versus or in combination with statistical significance (p-values)

RESEARCH NOTE
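The z statistic and p-value quoted in the example above can be checked numerically with a minimal sketch (only the standard library is used):

```python
import math

# Example from the text: two samples of size 200 with 110 +- 10 and 107 +- 12
n = 200
se = math.sqrt(10 ** 2 / n + 12 ** 2 / n)   # standard error of the difference
z = (110 - 107) / se                        # two-sample z statistic
p = math.erfc(z / math.sqrt(2))             # two-sided p-value under H0
print(round(z, 2), round(p, 3))             # -> 2.72 0.007
```

This reproduces the values reported in the example, confirming the statistical significance of the 3-unit difference.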
A natural way to comment on practical significance is to use the standardised difference between the means of two populations, i.e. the difference between the two means divided by an estimate of the standard deviation. We introduce a measure called the effect size, which not only makes the difference independent of units and sample size, but also relates it to the spread of the data; see Steyn (1999) and Steyn (2000). Table 1 gives the effect size in different situations.
Comments:
(a) |x̄₁ − x̄₂| is the difference between x̄₁ and x̄₂ without taking the sign into consideration. Here the direction of the difference is not important. If it is of importance, formulas (1) to (4) can be altered to d = (x̄_E − x̄_K)/s_K, d = (x̄₁ − x̄₂)/s_max, d = (x̄₁ − x̄₂)/s and d = (x̄_i − x̄_j)/√MSE.
(b) In formulas (3) and (4) the assumption is made of equal population standard deviations; therefore s, the pooled value, is used in the denominator.
(c) In formula (1) the difference in means relative to the control group's standard deviation is used, since in such cases the control group is the point of departure. When no control group exists, the division by s_max in formula (2) gives rise to a conservative effect size, in the sense that a practically significant result will not be concluded too easily.
Cohen (1988) gives the following guidelines for the interpretation of the effect size in the current case:
(a) small effect: d = 0.2, (b) medium effect: d = 0.5 and (c) large effect: d = 0.8. (5)
We consider data with d ≥ 0.8 as practically significant, since it is the result of a difference having a large effect. The effect size for the difference in IQs in the example is d = (110 − 107)/12 = 0.25, indicating that the effect is not practically significant.
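Formula (2) and guideline (5) can be sketched as follows (a minimal sketch; the function names are our own):

```python
def effect_size_d(mean1, sd1, mean2, sd2):
    """Effect size (2): absolute difference in means over the larger std dev."""
    return abs(mean1 - mean2) / max(sd1, sd2)

def interpret_d(d):
    """Cohen's (1988) guidelines (5): 0.2 small, 0.5 medium, 0.8 large."""
    if d >= 0.8:
        return "large (practically significant)"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

# The IQ example from the text: 110 +- 10 versus 107 +- 12
d = effect_size_d(110, 10, 107, 12)
print(d, interpret_d(d))   # -> 0.25 small
```

Dividing by the larger standard deviation (s_max = 12) makes the effect size conservative, as comment (c) above notes.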
EFFECT SIZE FOR THE RELATIONSHIP IN A CONTINGENCY TABLE
In many cases it is important to know whether a relationship between two variables is practically significant, e.g. between gender and preference for or against a new medical scheme for workers. For random samples, the statistical significance of such relationships is determined with Chi-square tests, but actually one wants to know whether the relationship is large enough to be important.
In this case the effect size is given by w = √(X²/n), where X² is the usual Chi-square statistic for the contingency table and n is the sample size; see Steyn (1999) and Steyn (2002). In the special case of a 2 x 2 table, the effect size w is given by the phi (φ) coefficient. Note that the effect size is again
TABLE 1
EFFECT SIZES FOR MEANS

Test     Conditions                                        Effect size
z or t   x̄_E: experimental mean; x̄_K: control mean;        d = |x̄_E − x̄_K| / s_K    (1)
         s_K: control group standard deviation
z or t   σ₁ and σ₂ not necessarily equal;                  d = |x̄₁ − x̄₂| / s_max    (2)
         take s_max = maximum of s₁ and s₂
t        σ₁ = σ₂; s the pooled standard deviation          d = |x̄₁ − x̄₂| / s        (3)
ANOVA    σ_i = σ_j for all i, j; MSE the mean square       d = |x̄_i − x̄_j| / √MSE   (4)
         error of the analysis of variance
independent of the sample size. Cohen (1988) gives the following guidelines for its interpretation in the current case:
(a) small effect: w = 0.1, (b) medium effect: w = 0.3, (c) large effect: w = 0.5. (6)
A relationship with w ≥ 0.5 is considered as practically significant.
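The effect size w can be sketched for a 2 x 2 table as follows (the counts below are hypothetical, chosen only for illustration; only the standard library is used):

```python
import math

def chi_square(table):
    """Pearson Chi-square statistic and total count for a two-way frequency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

# Hypothetical 2 x 2 table: gender (rows) by preference for/against (columns)
table = [[30, 20],
         [10, 40]]
chi2, n = chi_square(table)
w = math.sqrt(chi2 / n)          # effect size w = sqrt(X^2 / n)
print(round(w, 2))               # -> 0.41, a medium effect by guideline (6)
```

For this 2 x 2 table, w coincides with the phi coefficient mentioned above; being below 0.5, the relationship would not be called practically significant.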
EFFECT SIZE OF A MULTIPLE REGRESSION FIT
The coefficient of determination (R²) is a measure of the goodness-of-fit of the multiple regression, with 0 ≤ R² ≤ 1. It can be interpreted as the proportion of variation in the response variable explained by (or attributed to) the fitted model.
The question is how large R² should be to be significantly greater than zero or to be of practical importance. The usual F-test is used to decide whether R² is statistically significant. However, this does not necessarily imply a good fit for the multiple regression. The effect size gives such a measure. This effect size is calculated as the proportion of the variation accounted for by the regression line relative to the proportion not accounted for:

f² = R² / (1 − R²).

Cohen (1988) suggested the following guidelines for f². For the value f² = 0.02 a small effect is established, which means that R² is approximately also 0.02, i.e. only 2% of the criterion variance is explained. Further, f² = 0.15 is taken as a medium effect, because a value of 0.13 for R², explaining 13% of the criterion variance, gives an f²-value of 0.15. Finally, f² = 0.35 can be taken as a large effect, which means that R² is roughly 0.25, so that one quarter of the criterion variance is due to the regression. In the light of the above-mentioned reasoning, we can agree upon the outline in Table 2.
TABLE 2
CONCLUSIONS FROM EFFECT SIZES

Effect size (f²)      Effect    Values of R²         Conclusion on R²
Smaller than 0.15     Small     Smaller than 0.13    Non-significant
0.15 – 0.35           Medium    0.13 – 0.25          Significant
Larger than 0.35      Large     Larger than 0.25     Practically important
By "non-significant" is meant that R² for all practical purposes does not differ from zero. "Significant" means a deviation from zero, while "practically important" means that R² not only differs from zero, but is also large enough that a linear relation exists between x and y that is of practical importance.
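The mapping from R² to f² and to the conclusions of Table 2 can be sketched as follows (a minimal sketch; the function names and the illustrative R² values are our own):

```python
def f_squared(r2):
    """Effect size f^2 = R^2 / (1 - R^2) for a multiple regression fit."""
    return r2 / (1.0 - r2)

def interpret_f2(f2):
    """Table 2: < 0.15 non-significant, 0.15-0.35 significant, > 0.35 practically important."""
    if f2 > 0.35:
        return "practically important"
    if f2 >= 0.15:
        return "significant"
    return "non-significant"

# Hypothetical R^2 values chosen to land in each band of Table 2
for r2 in (0.05, 0.15, 0.30):
    f2 = f_squared(r2)
    print(f"R^2={r2:.2f}  f^2={f2:.2f}  -> {interpret_f2(f2)}")
```

Note that the boundary R² = 0.13 gives f² = 0.13/0.87 ≈ 0.15, matching Cohen's medium-effect value quoted above.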
CONCLUSION
The practical significance of results is important not only when results for population data are reported, but also to comment on the practical significance of a statistically significant result in the case of random samples from populations.
REFERENCES:
Cohen, J. 1988. Statistical power analysis for the behavioral sciences. Second edition. Hillsdale, NJ: Erlbaum.
Rosenthal, R. 1991. Meta-analytic procedures for social research. Newbury Park, CA: Sage Publications.
Steyn, H.S. (jr.). 1999. Praktiese beduidendheid: Die gebruik van effekgroottes [Practical significance: The use of effect sizes]. Wetenskaplike Bydraes, Reeks B: Natuurwetenskappe nr. 117. Publikasiebeheerkomitee, PU vir CHO, Potchefstroom.
Steyn, H.S. (jr.). 2000. Practical significance of the difference in means. Journal of Industrial Psychology, 26(3), 1-3.
Steyn, H.S. (jr.). 2002. Practically significant relationships between two variables. SA Journal of Industrial Psychology, 28(3), 10-15.