Practical significance (effect sizes) versus or in combination with statistical significance (p-values)

Authors: S.M. Ellis and H.S. Steyn
ABSTRACT
Statistical significance tests have a tendency to yield
small p-values (indicating significance) as the size of the
data set increases. The effect size is independent of
sample size and is a measure of practical significance.
Practical significance can be understood as an effect large
enough to be important in practice. Effect sizes are
described for differences in means, for the relationship in
two-way frequency tables and for a multiple regression fit.
INTRODUCTION
An advantage of drawing a random sample is that it enables one
to study the properties of a population with the time and money
available. In such cases statistical significance tests (e.g.
t-tests) are used to show that a result (e.g. the difference
between two means) is significant. The p-value is the criterion
for this: it gives the probability of obtaining the observed value
of the test statistic, or a more extreme one, under the assumption
that the null hypothesis (e.g. no difference between the means) is
true. A small p-value (e.g. smaller than 0.05) is considered
sufficient evidence that the result is statistically significant.
Statistical significance does not necessarily imply that the
result is important in practice, as these tests have a tendency to
yield small p-values (indicating significance) as the size of the
data set increases.
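The dependence of the p-value on sample size can be illustrated with a minimal sketch. The means (101 and 100), the common standard deviation (10) and the sample sizes are hypothetical values chosen only for illustration, and the function names are not from the original note. A two-sample z-test with a normal reference distribution is assumed.

```python
import math

def two_sample_z(mean1, mean2, sd1, sd2, n):
    """z statistic for two independent samples, each of size n."""
    return (mean1 - mean2) / math.sqrt(sd1**2 / n + sd2**2 / n)

def two_sided_p(z):
    """Two-sided p-value under a standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: the difference in means and the spread stay fixed,
# so the standardised difference is 0.1 for every sample size.
standardised_difference = abs(101 - 100) / 10

for n in (50, 500, 5000):
    z = two_sample_z(101, 100, 10, 10, n)
    print(n, round(z, 2), round(two_sided_p(z), 4), standardised_difference)
```

Only the sample size changes between the rows, yet the p-value moves from clearly non-significant to very small, while the standardised difference stays the same.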
In many cases researchers are forced to consider their
obtained results as a subpopulation of the target population
due to the weak response of the planned random sample.
In other cases data obtained from convenience sampling
are erroneously analysed as if they were obtained by random
sampling. These data should be considered as small
populations for which statistical inference and p-values are
not relevant. Statistical inference draws conclusions about
the population from which a random sample was drawn,
using the descriptive measures that have been calculated.
Instead of only reporting descriptive statistics in these
cases, effect sizes can be determined. Practical
significance can be understood as a large enough
difference to have an effect in practice.
Many different effect sizes exist (see Rosenthal, 1991 and
Steyn, 1999), but here we discuss only those most
frequently used, i.e. for the difference between means, for
relationships in two-way frequency (contingency) tables and
for multiple regression.
Management Dynamics Volume 12 No. 4, 2003
S.M. Ellis
H.S. Steyn
Potchefstroom University for CHE
RESEARCH NOTE
EFFECT SIZE FOR THE DIFFERENCE BETWEEN
MEANS
Consider the following example of testing the difference
in the IQs of two random samples of size 200 from different
populations. With mean ± standard deviation of
110 ± 10 and 107 ± 12, a test statistic of
$$z = \frac{110 - 107}{\sqrt{\frac{10^2}{200} + \frac{12^2}{200}}} = 2.72$$
with p = 0.007 is obtained. It is apparent that the difference
in mean IQ is statistically significant (p < 0.05), but is the
difference between IQs of 110 and 107 important enough
to be of practical significance? According to the IQ scale a
difference of 3 units is not important. We are interested in
finding a measure of practical significance analogous to
the test statistics z or t, which are used to decide
whether a statistically significant difference between two
means exists.
A natural way to comment on practical significance is to use
the standardised difference between the means of two
populations, i.e. the difference between the two means divided
by an estimate of the standard deviation. We introduce a measure
called the effect size, which not only makes the
difference independent of units and sample size, but also
relates it to the spread of the data; see Steyn (1999) and Steyn
(2000). Table 1 gives the effect size in different situations.
Comments:
(a) $\lvert \bar{x}_1 - \bar{x}_2 \rvert$ is the difference between $\bar{x}_1$ and $\bar{x}_2$ without
taking the sign into consideration. Here the direction of
the difference is not important. If it is of importance,
formulas (1) to (4) can be altered to
$d = (\bar{x}_E - \bar{x}_K)/s_K$,
$d = (\bar{x}_1 - \bar{x}_2)/s$ (with $s_{\max}$ or the pooled $s$ in the denominator)
and $d = (\bar{x}_i - \bar{x}_j)/\sqrt{MSE}$.
(b) In formulas (3) and (4) the assumption is made of
equal population standard deviations; therefore $s$, the
pooled value, is used in the denominator.
(c) In formula (1) the difference in means relative to the
control group's standard deviation is used, since in
such cases the control group is the point of departure.
When no control group exists, the division by $s_{\max}$ in
formula (2) gives rise to a conservative effect size in
the sense that a practically significant result will not
be concluded too easily.
Cohen (1988) gives the following guidelines for the
interpretation of the effect size in the current case:
(a) small effect: d = 0.2, (b) medium effect: d = 0.5 and (c)
large effect: d = 0.8. (5)
We consider data with d ≥ 0.8 as practically significant,
since it is the result of a difference having a large effect.
The effect size for the difference in IQs in the example is
$d = \lvert 110 - 107 \rvert / 12 = 0.25$, indicating that the effect is not
practically significant.
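The effect sizes for two means can be sketched in a few lines of code. The function names are hypothetical. The pooled standard deviation uses the usual degrees-of-freedom weighted formula, which the note does not spell out, so it is an assumption here. The ranges in `cohen_label` are also an assumption: Cohen's guidelines give reference points (0.2, 0.5, 0.8), not intervals.

```python
import math

def d_control(mean_exp, mean_ctrl, sd_ctrl):
    """Formula (1): difference relative to the control group's standard deviation."""
    return abs(mean_exp - mean_ctrl) / sd_ctrl

def d_max_sd(mean1, mean2, sd1, sd2):
    """Formula (2): divide by the larger of the two sample standard deviations."""
    return abs(mean1 - mean2) / max(sd1, sd2)

def d_pooled(mean1, mean2, sd1, sd2, n1, n2):
    """Formula (3): divide by the pooled standard deviation (assumed usual formula)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return abs(mean1 - mean2) / math.sqrt(pooled_var)

def cohen_label(d):
    """Map d onto Cohen's reference points; the cut-offs between them are an assumption."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below small"

# IQ example from the text: 110 ± 10 versus 107 ± 12
d_iq = d_max_sd(110, 107, 10, 12)
print(d_iq, cohen_label(d_iq))  # 0.25 small
```

Dividing by the larger standard deviation gives the smallest of the candidate effect sizes, which is why formula (2) is described as conservative.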
TABLE 1
EFFECT SIZES FOR MEANS
Test | Conditions | Effect size
z or t | $\bar{x}_E$: experimental; $\bar{x}_K$, $s_K$: control | $d = \lvert \bar{x}_E - \bar{x}_K \rvert / s_K$ (1)
z or t | $\sigma_1$ and $\sigma_2$ not necessarily equal. Take $s_{\max}$ = maximum of $s_1$ and $s_2$ | $d = \lvert \bar{x}_1 - \bar{x}_2 \rvert / s_{\max}$ (2)
t | $\sigma_1 = \sigma_2$: $s$ the pooled standard deviation | $d = \lvert \bar{x}_1 - \bar{x}_2 \rvert / s$ (3)
ANOVA | $\sigma_i = \sigma_j$ for all $i$, $j$: MSE the mean square error of analysis of variance | $d = \lvert \bar{x}_i - \bar{x}_j \rvert / \sqrt{MSE}$ (4)
EFFECT SIZE FOR THE RELATIONSHIP IN A
CONTINGENCY TABLE
In many cases it is important to know whether a
relationship between two variables is practically
significant, e.g. between gender and preference for or
against a new medical scheme for workers. For random
samples, the statistical significance of such relationships
is determined with Chi-square tests, but actually one
wants to know whether the relationship is large enough to
be important.
In this case the effect size is given by $w = \sqrt{X^2 / n}$, where $X^2$ is
the usual Chi-square statistic for the contingency table and
n is the sample size; see Steyn (1999) and Steyn (2002). In
the special case of a 2 x 2 table, the effect size (w) is given
by the phi ($\phi$) coefficient. Note that the effect size is again
independent of the sample size. Cohen (1988) gives the
following guidelines for its interpretation in the
current case:
(a) small effect: w = 0.1, (b) medium effect: w = 0.3,
(c) large effect: w = 0.5. (6)
A relationship with w ≥ 0.5 is considered as practically
significant.
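A minimal sketch of the effect size w for a two-way frequency table follows. The 2 x 2 table of gender by preference is hypothetical data, and the function names are not from the note. The Chi-square statistic is the ordinary Pearson statistic without continuity correction, and all row and column totals are assumed to be positive.

```python
import math

def chi_square(table):
    """Pearson Chi-square statistic and total count for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def effect_size_w(table):
    """Effect size w: square root of the Chi-square statistic divided by n."""
    stat, n = chi_square(table)
    return math.sqrt(stat / n)

# Hypothetical counts: rows are gender, columns are for/against the scheme
small_sample = [[60, 40], [40, 60]]
large_sample = [[600, 400], [400, 600]]  # same proportions, ten times the size

for table in (small_sample, large_sample):
    stat, n = chi_square(table)
    print(n, round(stat, 2), round(effect_size_w(table), 3))
```

The Chi-square statistic grows tenfold with the larger sample, while w is unchanged, which is the sense in which w is independent of the sample size.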
EFFECT SIZE OF A MULTIPLE REGRESSION FIT
The coefficient of determination ($R^2$) is a measure of the
goodness-of-fit of the multiple regression fit, with
0 ≤ $R^2$ ≤ 1. It can be interpreted as the proportion of
variation in the response variable explained by (or
attributed to) the fitted model.
The question is how large $R^2$ should be to be significantly
greater than zero or to be of practical importance. The usual
F-test is used to decide whether $R^2$ is statistically significant.
However, this does not necessarily imply a good fit for the
multiple regression. The effect size gives such a measure.
This effect size is calculated as the proportion of the
variation accounted for by the regression line
relative to the proportion not accounted for:
$$f^2 = \frac{R^2}{1 - R^2}.$$
Cohen (1988) suggested the following guidelines for $f^2$.
For the value $f^2 = 0.02$ a small effect is established, which
means that $R^2$ is approximately also 0.02. This means that
only 2% of the criterion variance is explained. Further,
$f^2 = 0.15$ is taken as a medium effect, because a value of
0.13 for $R^2$, explaining 13% of the criterion variance, gives
an $f^2$-value of 0.15. Finally, $f^2 = 0.35$ can be taken as a large
effect, which means that $R^2$ is roughly 0.25. Here one
quarter of the criterion variance is due to the
regression. In the light of the above-mentioned reasoning,
we can agree upon the outline in Table 2.
TABLE 2
CONCLUSIONS FROM EFFECT SIZES
Effect size ($f^2$) | Effect | Values of $R^2$ | Conclusions on $R^2$
Smaller than 0.15 | Small | Smaller than 0.13 | Non-significant
0.15 to 0.35 | Medium | 0.13 to 0.25 | Significant
Larger than 0.35 | Large | Larger than 0.25 | Practically important
"Non-significant" means that $R^2$ for all practical
purposes does not differ from zero. "Significant" means
deviation from zero, while "practically important" means
that $R^2$ not only differs from zero, but is also large enough
that a linear relation exists between x and y that is of
practical importance.
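The conversion from the coefficient of determination to the effect size, and the reading of Table 2, can be sketched as follows. The function names are hypothetical, and the handling of values exactly on a boundary is an assumption. Because the $R^2$ values in Table 2 are rounded, an $R^2$ of exactly 0.13 or 0.25 may fall just on the other side of the corresponding $f^2$ cut-off.

```python
def f_squared(r_squared):
    """Effect size for a regression fit: explained relative to unexplained variation."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must lie in [0, 1)")
    return r_squared / (1 - r_squared)

def table2_conclusion(f2):
    """Return (effect, conclusion) from the f-squared column of Table 2."""
    if f2 < 0.15:
        return "small", "non-significant"
    if f2 <= 0.35:
        return "medium", "significant"
    return "large", "practically important"

# Hypothetical R-squared values from three fitted models
for r2 in (0.02, 0.20, 0.50):
    f2 = f_squared(r2)
    print(r2, round(f2, 3), table2_conclusion(f2))
```

Classifying on $f^2$ instead of on the rounded $R^2$ column keeps the decision tied to Cohen's guidelines, which are stated for $f^2$.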
CONCLUSION
Practical significance matters not only when the results of
population data are reported, but also when commenting on a
statistically significant result obtained from random samples
from populations.
REFERENCES:
Cohen, Jacob 1988. Statistical power analysis for
behavioural sciences. Second edition. Hillsdale, NJ:
Erlbaum.
Rosenthal, R. 1991. Meta-analytic procedures for social
research. Newbury Park, Calif.: Sage Publications.
Steyn, H.S. (jr.). 1999. Praktiese Beduidendheid: Die
gebruik van Effekgroottes. Wetenskaplike Bydraes,
Reeks B: Natuurwetenskappe nr. 117.
Publikasiebeheerkomitee, PU vir CHO, Potchefstroom.
Steyn, H.S. (jr.). 2000. Practical significance of the
difference in means, Journal of Industrial Psychology,
26(3), 1-3.
Steyn, H.S. (jr.). 2002. Practically significant relationships
between two variables, SA Journal of Industrial
Psychology, 28(3), 10-15.