Department of Rehabilitation Sciences, School of Allied Health Sciences,
Texas Tech University Health Sciences Center, Lubbock, TX
Address all correspondence and requests for reprints to: Steven F. Sawyer, PT, PhD, steven.sawyer@ttuhsc.edu
Analysis of variance (ANOVA) is a statistical tool used to detect differences between experimental group means. ANOVA is warranted in experimental designs with one dependent variable that is a continuous parametric numerical outcome measure, and multiple experimental groups within one or more independent (categorical) variables. In ANOVA terminology, independent variables are called factors, and groups within each factor are referred to as levels. The array of terms that are part and parcel of ANOVA can be intimidating to the uninitiated, such as: partitioning of variance, main effects, interactions, factors, sum of squares, mean squares, F scores, familywise alpha, multiple comparison procedures (or post hoc tests), effect size, statistical power, etc. How do these terms pertain to p values and statistical significance? What precisely is meant by a "statistically significant ANOVA"? How does analyzing variance result in an inferential decision about differences in group means? Can ANOVA be performed on non-parametric data? What are the virtues and potential pitfalls of ANOVA? These are the issues to be addressed in this primer on the use and interpretation of ANOVA. The intent is to provide the clinician reader, whose misspent youth did not include an enthusiastic reading of statistics textbooks, an understanding of the fundamentals of this widely used form of inferential statistical analysis.
ANOVA General Linear Models
ANOVA is based mathematically on linear regression and general linear models that quantify the relationship between the dependent variable and the independent variable(s)1. There are three different general linear models for ANOVA: (i) Fixed effects model (Model 1) makes inferences that are specific and valid only to the populations and treatments of the study. For example, if three treatments involve three different doses of a drug, inferential conclusions can only be drawn for those specific drug doses. The levels within each factor are fixed as defined by the experimental design. (ii) Random effects model (Model 2) makes inferences about levels of the factor that are not used in the study, such as a continuum of drug doses when the study only used three doses. This model pertains to random effects within levels, and makes inferences about a population's random variation. (iii) Mixed effects model (Model 3) contains both Fixed and Random effects.

In most types of orthopedic rehabilitation clinical research, the Fixed effects model is relevant since the statistical inferences being sought are fixed to the levels of the experimental design. For this reason, the Fixed effects model will be the focus of this article. Computer statistics programs typically default to the Fixed effects model for ANOVA analysis, but higher end programs can perform ANOVA with all three models.
ABSTRACT: Analysis of variance (ANOVA) is a statistical test for detecting differences in group means when there is one parametric dependent variable and one or more independent variables. This article summarizes the fundamentals of ANOVA for an intended benefit of the clinician reader of scientific literature who does not possess expertise in statistics. The emphasis is on conceptually-based perspectives regarding the use and interpretation of ANOVA, with minimal coverage of the mathematical foundations. Computational examples are provided. Assumptions underlying ANOVA include parametric data measures, normally distributed data, similar group variances, and independence of subjects. However, normality and variance assumptions can often be violated with impunity if sample sizes are sufficiently large and there are equal numbers of subjects in each group. A statistically significant ANOVA is typically followed up with a multiple comparison procedure to identify which group means differ from each other. The article concludes with a discussion of effect size and the important distinction between statistical significance and clinical significance.

KEYWORDS: Analysis of Variance, Interaction, Main Effects, Multiple Comparison Procedures
Analysis of Variance: The Fundamental Concepts
Steven F. Sawyer, PT, PhD
Assumptions of ANOVA
Assumptions for ANOVA pertain to the underlying mathematics of general linear models. Specifically, a data set should meet the following criteria before being subjected to ANOVA:

Parametric data: A parametric ANOVA, the topic of this article, requires parametric data (ratio or interval measures). There are non-parametric, one-factor versions of ANOVA for non-parametric ordinal (ranked) data, specifically the Kruskal-Wallis test for independent groups and the Friedman test for repeated measures analysis.
Normally distributed data within each group: ANOVA can be thought of as a way to infer whether the normal distribution curves of different data sets are best thought of as being from the same population or different populations (Figure 1). It follows that a fundamental assumption of parametric ANOVA is that each group of data (each level) be normally distributed. The Shapiro-Wilk test2 is commonly used to test for normality for group sample sizes (N) less than 50; D'Agostino's modification3 is useful for larger samplings (N > 50).

A normal distribution curve can be described by whether it has symmetry about the mean and the appropriate width and height (peakedness). These attributes are defined statistically by "skewness" and "kurtosis", respectively. A normal distribution curve will have skewness = 0 and kurtosis = 3. (Note that an alternative definition of kurtosis subtracts 3 from the final value so that a normal distribution will have kurtosis = 0. This "minus 3" kurtosis value is sometimes referred to as "excess kurtosis" to distinguish it from the value obtained with the standard kurtosis function. The kurtosis value calculated by many statistical programs is the "minus 3" variant but is referred to, somewhat misleadingly, as "kurtosis.") Normality of a data set can be assessed with a z-test in reference to the standard error of skewness (estimated as √(6 / N)) and the standard error of kurtosis (estimated as √(24 / N))4.
FIGURE 1. Graphical representation of statistical Null and Alternative hypotheses for ANOVA in the case of one dependent variable (change in ankle ROM pre/post manual therapy treatment, in units of degrees), and one independent variable with three levels (three different types of manual therapy treatments). For this fictitious data, the group (sample) means are 13, 14 and 18 degrees of increased ankle ROM for treatment type groups 1, 2 and 3, respectively (raw data are presented in Figure 2). The Null hypothesis is represented in the left graph, in which the population means for all three groups are assumed to be identical to each other (in spite of differences in sample means calculated from the experimental data). Since in the Null hypothesis the subjects in the three groups are considered to compose a single population, by definition the population means of each group are equal to each other, and are equal to the Grand Mean (mean for all data scores in the three groups). The corresponding normal distribution curves are identical and precisely overlap along the X-axis. The Alternative hypothesis is shown in the right graph, in which differences in group sample means are inferred to represent true differences in group population means. These normal distribution curves do not overlap along the X-axis because each group of subjects is considered to be a distinct population with respect to ankle ROM, created from the original single population that experienced different efficacies of the three treatments. Graph is patterned after Wilkinson et al11.
[Figure 1 panels: left, "Null hypothesis: identical normal distribution curves"; right, "Alternative hypothesis: different normal distribution curves"; x-axis, increased ankle ROM (degrees); y-axis, probability density function.]
A conservative alpha of 0.01 (z ≥ 2.56) is appropriate for these z-tests, due to their overly sensitive nature, especially for large sample sizes (>100)4. As a computational example, for N = 20, the estimation of the standard error of skewness = √(6 / 20) = 0.55, and any skewness value greater than ±2.56 x 0.55 = ±1.41 would indicate non-normality. Perhaps the best "test" is what always should be done: examine a histogram of the distribution of the data. In practice, any distribution that resembles a bell-shaped curve will be "normal enough" to pass normality tests, especially if the sample size is adequate.
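For readers who want to try these checks on their own data, the following is a minimal Python sketch (not part of the original article) of the Shapiro-Wilk test and the skewness/kurtosis z-tests described above, using SciPy and the fictitious Type 1 treatment scores from Figure 2; the output formatting is illustrative only.

```python
# A minimal sketch (not from the article) of the normality checks described above.
import numpy as np
from scipy import stats

scores = np.array([14, 14, 11, 13])      # one group's ankle ROM changes (degrees), from Figure 2
n = len(scores)

# Shapiro-Wilk test (appropriate for group sample sizes below 50)
w_stat, p_value = stats.shapiro(scores)

# z-tests on skewness and kurtosis using the estimated standard errors from the text
skew = stats.skew(scores)                 # SciPy reports skewness relative to 0
excess_kurt = stats.kurtosis(scores)      # SciPy reports "excess" (minus 3) kurtosis
z_skew = skew / np.sqrt(6 / n)
z_kurt = excess_kurt / np.sqrt(24 / n)

print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {p_value:.3f}")
print(f"z(skewness) = {z_skew:.2f}, z(kurtosis) = {z_kurt:.2f}  (compare to +/- 2.56)")
```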
Homogeneity of variance within each group: Referring again to the notion that ANOVA compares normal distribution curves of data sets, these curves need to be similar to each other in shape and width for the comparison to be valid. In other words, the amount of data dispersion (variance) needs to be similar between groups. Two commonly invoked tests of homogeneity of variance are by Levene5 and Brown & Forsythe6.
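As a companion sketch (again, not from the article), SciPy's levene function can run both of the variance-homogeneity tests named above; the Brown-Forsythe variant is obtained by centering on the median.

```python
# A minimal sketch (not from the article) of the Levene and Brown-Forsythe tests,
# applied to the three fictitious treatment groups from Figure 2.
from scipy import stats

type1 = [14, 14, 11, 13]
type2 = [16, 14, 13, 13]
type3 = [20, 18, 17, 17]

levene_stat, levene_p = stats.levene(type1, type2, type3, center='mean')    # Levene
bf_stat, bf_p = stats.levene(type1, type2, type3, center='median')          # Brown-Forsythe

print(f"Levene:         W = {levene_stat:.3f}, p = {levene_p:.3f}")
print(f"Brown-Forsythe: W = {bf_stat:.3f}, p = {bf_p:.3f}")
```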
Independent observations: A general assumption of parametric analysis is that the value of each observation for each subject is independent of (i.e., not related to or influenced by) the value of any other observation. For independent groups designs, this issue is addressed with random sampling, random assignment to groups, and experimental control of extraneous variables. This assumption is an inherent concern for repeated measures designs, in which an assumption of sphericity comes into play. When subjects are exposed to all levels of an independent variable (e.g., all treatments), it is conceivable that the effects of a treatment can persist and affect the response to subsequent treatments. For example, if a treatment effect for one level has a long half-time (analogous to a drug effect) and there is inadequate "wash out" time between exposure to different levels (treatments), there will be a carryover effect. A well designed and executed cross-over experimental design can mitigate carryover effects. Mauchly's test of sphericity is commonly employed to test the assumption of independence in repeated measures designs. If the Mauchly test is statistically significant, corrections to the F score calculation are warranted. The two most commonly used correction methods are the Greenhouse-Geisser and Huynh-Feldt, which calculate a descriptive statistic called epsilon, a measure of the extent to which sphericity has been violated. The range of values for epsilon is 1 (no sphericity violation) to a lower boundary of 1 / (m - 1), where m = number of levels. For example, with three groups, the range would be 1 to 0.50. The closer epsilon is to the lower boundary, the greater the degree of violation. There are three options for adjusting the ANOVA to account for the sphericity violation, all of which involve modifying degrees of freedom: use the lower boundary epsilon, which is the most conservative approach (least powerful) and will generate the largest p value, or use either the Greenhouse-Geisser epsilon or the Huynh-Feldt epsilon (most powerful) [statistical power is the ability of an inferential test to detect a difference that actually exists, i.e., a true positive].
Most commercially available statistics programs perform normality, homogeneity of variance and sphericity tests. Determination of the parametric nature of the data and soundness of the experimental design is the responsibility of the investigator, reviewers and critical readers of the literature.
Robustness of ANOVA to Violations of Normality and Variance Assumptions

ANOVA tests can handle moderate violations of normality and equal variance if there is a large enough sample size and a balanced design7. As per the central limit theorem, the distribution of sample means approximates normality even with population distributions that are grossly skewed and non-normal, so long as the sample size of each group is large enough. There is no fixed definition of "large enough", but a rule of thumb is N ≥ 30 (reference 8). Thus, the mathematical validity of ANOVA is said to be "robust" in the face of violations of normality assumptions if there is an adequate sample size. ANOVA is more sensitive to violations of the homogeneity of variance assumption, but this is mitigated if sample sizes of factors and levels are equal or nearly so9,10. If normality and homogeneity of variance violations are problematic, there are three options: (i) Mathematically transform (log, arcsin, etc.) the data to best mitigate the violation, with the cost of cognitive fog in understanding the meaning of the ANOVA results (e.g., "A statistically significant main effect was obtained for the arcsin transformation of degrees of ankle range of motion"). (ii) Use one of the non-parametric ANOVAs mentioned above, but at the cost of reduced power and being limited to one-factor analysis. (iii) Identify outliers in the data set using formal statistical criteria (not discussed here). Use caution in deleting outliers from the data set; such decisions need to be justified and explained in research reports. Removal of outliers will reduce deviations from normality and homogeneity of variance.
If You Understand t-Tests, You Already Know A Lot About ANOVA

As a starting point, the reader should understand that the familiar t-test is an ANOVA in abbreviated form. A t-test is used to infer on statistical grounds whether there are differences between group means for an experimental design with (i) one parametric dependent variable and (ii) one independent variable with two levels, i.e., there is one outcome measure and two groups. In clinical research, levels often correspond to different treatment groups; the term "level" does not imply any ordering of the groups.

The Null statistical hypothesis for a t-test is H0: μ1 = μ2, that is, the population means of the two groups are the same. Note that we are dealing with population means, which are almost always unknown and unknowable in clinical research. If the Null hypothesis involved sample means, there would be nothing to infer, since descriptive analysis provides this information. However, with inferential analysis using t-tests and ANOVA, the aim is to infer, without access to "the truth", whether the group population means differ from each other.

The Alternative hypothesis, which comes into play if the Null hypothesis is rejected, asserts that the group population means differ.
The Null hypothesis is rejected when the p value yielded by the t-test is less than alpha. Alpha is the predetermined upper limit of risk for committing a Type 1 error, which is the statistical false positive of incorrectly rejecting the Null hypothesis and inferring that the group means differ when in fact the groups are from a single population. By convention, alpha is typically set to 0.05. The p value generated by the t-test statistic is based on numerical analysis of the experimental data, and represents the probability of committing a Type 1 error if the Null hypothesis is rejected. When p is less than alpha, there is a statistically significant result, i.e., the values in the two groups are inferred to differ from each other and to represent separate populations. The logic of statistical inference is analogous to a jury trial: at the outset of the trial (inferential analysis), the group data are presumed to be innocent of having different population means (Null hypothesis) unless the differences in group means in the sampled data are sufficiently compelling to meet the standard of "beyond a reasonable doubt" (p less than alpha), in which case a guilty verdict is rendered (reject Null hypothesis and accept Alternative hypothesis = statistical significance).
The test statistic for a t-test is the t score. In conceptual terms, the calculation of a t score for independent groups (i.e., not repeated measures) is as follows:

t = statistical signal / statistical noise
t = treatment effect / unexplained variance ("error variance")
t = difference between sample means of the two groups / within-group variance

The difference in group means represents the statistical signal since it is presumed to result from treatment effects of the different levels of the independent variable. The within-group variance is considered to be statistical noise and an "error" term because it is not explained by the influence of the independent variable on the dependent variable. The particulars of how the t score is calculated depend on the experimental design (independent groups vs repeated measures) and whether variance between groups is equivalent; the reader is referred to any number of statistics books for details about the formulae. The t score is converted into a p value based on the magnitude of the t score (larger t scores lead to smaller p values) and the sample size (which relates to degrees of freedom).
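The signal-to-noise arithmetic described above can be made concrete with a short Python sketch (not from the article) that computes an independent-groups t score by hand for the Type 1 vs. Type 3 treatment data listed later in Figure 2; the pooled-variance step assumes equal group sizes and equal variances.

```python
# A minimal sketch (not from the article) of the conceptual t-score calculation: signal / noise.
import numpy as np

group_a = np.array([14, 14, 11, 13])   # Type 1 treatment (Figure 2)
group_b = np.array([20, 18, 17, 17])   # Type 3 treatment (Figure 2)

signal = group_b.mean() - group_a.mean()                          # difference in sample means
pooled_var = (group_a.var(ddof=1) + group_b.var(ddof=1)) / 2      # equal-n, equal-variance pooling
noise = np.sqrt(pooled_var * (1 / len(group_a) + 1 / len(group_b)))  # standard error of the difference

t_score = signal / noise
print(f"t = {t_score:.2f}")   # 5.00, matching the worked example later in the article
```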
ANOVA Null Hypothesis and Alternative Hypothesis

ANOVA is applicable when the aim is to infer differences in group values when there is one dependent variable and more than two groups, such as one independent variable with three or more levels, or when there are two or more independent variables. Since an independent variable is called a "factor", ANOVAs are described in terms of the number of factors; if there are two independent variables, it is a two-factor ANOVA. In the simpler case of a one-factor ANOVA, the Null hypothesis asserts that the population means for each level (group) of the independent variable are equal. Let's use as an example a fictitious experiment with one dependent variable (pre/post changes in ankle range of motion in subjects who received one of three types of manual therapy treatment after surgical repair of a talus fracture). This constitutes a one-factor ANOVA with three levels (the three different types of treatment). The Null hypothesis is H0: μ1 = μ2 = μ3. The Alternative hypothesis is that at least two of the group means differ. Figure 1 provides a graphical presentation of these ANOVA statistical hypotheses: (i) the Null hypothesis (left graph) asserts that the normal distribution curves of data for the three groups are identical in shape and position and therefore precisely overlap, whereas (ii) the Alternative hypothesis (right graph) asserts that these normal distribution curves are best described by the distribution indicated by the sample means, which represent an experimentally-derived estimate of the population means11.
The Mechanics of Calculating a One-Factor ANOVA

ANOVA evaluates differences in group means in a round-about fashion, and involves the "partitioning of variance" from calculations of "Sum of Squares" and "Mean Squares." Three metrics are used in calculating the ANOVA test statistic, which is called the F score (named after R.A. Fisher, the developer of ANOVA): (i) Grand Mean, which is the mean of all scores in all groups; (ii) Sum of Squares, which are of two kinds, the sum of all squared differences between group means and the Grand Mean (between-groups Sum of Squares) and the sum of squared differences between individual data scores and their respective group mean (within-groups Sum of Squares); and (iii) Mean Squares, also of two kinds (between-groups Mean Squares, within-groups Mean Squares), which are the average deviations of individual scores from their respective mean, calculated by dividing Sum of Squares by their appropriate degrees of freedom.

A key point to appreciate about ANOVA is that the data set variance is partitioned into statistical signal and statistical noise components to generate the F score. The F score for independent groups is calculated as:

F = statistical signal / statistical noise
F = treatment effect / unexplained variance ("error variance")
F = Mean SquaresBetween Groups / Mean SquaresWithin Groups (Error)

Note that the statistical signal, the MSBetween Groups term, is an indirect measure of differences in group means. The MSWithin Groups (Error) term is considered to represent statistical noise/error since this variance is not explained by the effect of the independent variable on the dependent variable. Here is the gist of the issue: as group means increasingly diverge from each other, there is increasingly more variance for between-group scores in relation to the Grand Mean, quantified as Sum of SquaresBetween Groups, leading to a larger MSBetween Groups term and a larger F score. Conversely, as there is more variance within group scores, quantified as Sum of SquaresWithin Groups (Error), the MSWithin Groups (Error) term will increase, leading to a smaller F score. Thus, for independent groups, large F scores arise from large differences between group
means and/or small variances within groups. Larger F scores equate to lower p values, with the p value also influenced by the sample size and number of groups, each of which constitutes separate types of "degrees of freedom."

ANOVA calculations are now the domain of computer software, but there is illustrative and heuristic value in manually performing the arithmetic calculation of the F score to garner insight into how analysis of data set variance generates a statistical inference about differences in group means. A numerical example is provided in Figure 2, in which the data set graphed in Figure 1 is listed and subjected to ANOVA, yielding a calculated F score and corresponding p value.
Mathematical Equivalence of t-Tests and ANOVA: t-Tests are a Special Case of ANOVA

Let's briefly return to the notion that a t-test is a simplified version of ANOVA that is specific to the case of one independent variable with two groups. If we analyze the data in Figure 2 for the Type 1 treatment vs. Type 3 treatment group data (disregarding the Type 2 treatment group data to reduce the analysis to two groups), the t score for independent groups is 5.0 with a p value of 0.0025 (calculations not shown). For the same data assessed with ANOVA, the F score is 25.0 with a p value of 0.0025. The t-test and ANOVA generate identical p values. The mathematical relation between the two test statistics is: t² = F.
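A brief Python check (not part of the article) of this t² = F relationship, using the same Type 1 vs. Type 3 data from Figure 2 and SciPy's built-in tests:

```python
# A minimal sketch (not from the article) verifying t-squared = F for a two-group comparison.
from scipy import stats

type1 = [14, 14, 11, 13]
type3 = [20, 18, 17, 17]

t_stat, t_p = stats.ttest_ind(type3, type1, equal_var=True)   # independent-groups t-test
f_stat, f_p = stats.f_oneway(type1, type3)                    # one-way ANOVA on the same data

print(f"t = {t_stat:.2f}, t^2 = {t_stat**2:.2f}, F = {f_stat:.2f}")  # 5.00, 25.00, 25.00
print(f"p(t-test) = {t_p:.4f}, p(ANOVA) = {f_p:.4f}")                # identical p values
```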
Repeated Measures ANOVA: Different Error Term, Greater Statistical Power

The experimental designs emphasized thus far entail independent groups, in which each subject is "exposed" to only one level of an independent variable.
FIGURE 2. The mechanics of calculating an F score for a one-factor ANOVA with independent groups, by partitioning the data set variance as Sum of Squares and Mean Squares, are shown below. This fictitious data set lists increased ankle range of motion pre/post for three different types of manual therapy treatments. For the sake of clarity and ease of calculation, a data set with an inappropriately small sample size is used.
Subject Gender   Manual Therapy Treatment Type 1   Manual Therapy Treatment Type 2   Manual Therapy Treatment Type 3
Male                          14                                16                                20
Male                          14                                14                                18
Female                        11                                13                                17
Female                        13                                13                                17
Group Means                   13                                14                                18
Grand Mean = 15
In the following, SS = Sum of Squares; MS = Mean Squares; df = degrees of freedom.

SSTotal = SSBetween Groups + SSWithin Groups (Error), and is calculated by summing the squares of differences between each data value vs. the Grand Mean. For this data set with a Grand Mean of 15:

SSTotal = (14-15)² + (14-15)² + (11-15)² + (13-15)² + (16-15)² + (14-15)² + (13-15)² + (13-15)² + (20-15)² + (18-15)² + (17-15)² + (17-15)² = 74

SSWithin Groups (Error) = SSMT treatment Type 1 (Error) + SSMT treatment Type 2 (Error) + SSMT treatment Type 3 (Error), in which the sum of squares within each group is calculated in reference to the group's mean:

SSMT treatment Type 1 (Error) = (14-13)² + (14-13)² + (11-13)² + (13-13)² = 6
SSMT treatment Type 2 (Error) = (16-14)² + (14-14)² + (13-14)² + (13-14)² = 6
SSMT treatment Type 3 (Error) = (20-18)² + (18-18)² + (17-18)² + (17-18)² = 6

SSWithin Groups (Error) = 6 + 6 + 6 = 18. By subtraction, SSBetween Groups = 74 - 18 = 56

df refers to the number of independent measurements used in calculating a Sum of Squares.
dfBetween Groups = (# of groups - 1) = (3 - 1) = 2
dfWithin Groups (Error) = (N - # of groups) = (12 - 3) = 9

The ANOVA test statistic, the F score, is calculated from Mean Squares (SS/df):
F = Mean SquaresBetween Groups / Mean SquaresWithin Groups (Error)
Mean SquaresBetween Groups = SSBetween Groups / dfBetween Groups = 56 / 2 = 28
Mean SquaresWithin Groups (Error) = SSWithin Groups (Error) / dfWithin Groups (Error) = 18 / 9 = 2
So, F = 28 / 2 = 14

With dfBetween Groups = 2 and dfWithin Groups (Error) = 9, this F score translates into p = 0.0017, a statistically significant result for alpha = 0.05.
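The partitioning shown in Figure 2 can be reproduced in a few lines of Python; the sketch below (not from the article) recomputes the Sums of Squares, Mean Squares, F score, and p value, and cross-checks the result against SciPy's one-way ANOVA.

```python
# A minimal sketch (not from the article) reproducing the Figure 2 calculation.
import numpy as np
from scipy import stats

groups = [np.array([14, 14, 11, 13]),   # Type 1
          np.array([16, 14, 13, 13]),   # Type 2
          np.array([20, 18, 17, 17])]   # Type 3

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()                                   # 15

ss_total = ((all_scores - grand_mean) ** 2).sum()                # 74
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)     # 18
ss_between = ss_total - ss_within                                # 56

df_between = len(groups) - 1                                     # 2
df_within = len(all_scores) - len(groups)                        # 9

ms_between = ss_between / df_between                             # 28
ms_within = ss_within / df_within                                # 2
f_score = ms_between / ms_within                                 # 14

p_value = stats.f.sf(f_score, df_between, df_within)             # upper-tail F probability
print(f"F = {f_score:.1f}, p = {p_value:.4f}")

print(stats.f_oneway(*groups))                                   # cross-check with SciPy
```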
In the data set of Figure 2, this would involve each subject receiving only one of the three different treatments. If a subject is exposed to all levels of an independent variable, the mechanics of the ANOVA are altered to take into account that each subject serves as their own experimental control. Whereas the term for statistical signal, MSBetween Groups, is unchanged, there is a new statistical noise term called MSWithin Subjects (Error) that pertains to variance within each subject across all levels of the independent variable instead of between all subjects within one level. Since there is typically less variation within subjects than between subjects, the statistical error term is typically smaller in repeated measures designs. A smaller MSWithin Subjects (Error) value leads to a larger F value and a smaller p value. As a result, repeated measures ANOVAs typically have greater statistical power than independent groups ANOVAs.
Factorial ANOVA: Main Effects and Interactions

An advantage of ANOVA is its ability to analyze an experimental design with multiple independent variables. When an ANOVA has two or more independent variables it is referred to as a factorial ANOVA, in contrast to the one-factor ANOVAs discussed thus far. This is efficient experimentally, because the effects of multiple independent variables on a dependent variable are tested on one cohort of subjects. Furthermore, factorial ANOVA permits, and requires, an evaluation of whether there is an interplay between different levels of the independent variables, which is called an interaction.

Definitions of terminology that is unique to factorial ANOVA are warranted: (i) Main effect is the effect of an independent variable (a factor) on a dependent variable, determined separately from the effects of other independent variables. A main effect is a one-factor ANOVA that is performed on a factor while disregarding the effects of other factors. In a two-factor ANOVA, there are two main effects, one for each independent variable; a three-factor ANOVA has three main effects, and so on. (ii) Interaction describes an interplay between independent variables such that different levels of the independent variables have non-additive effects on the dependent variable. In formal terms, there is an interaction between two factors when the dependent variable responses at levels of one factor differ from those produced at levels of the other factor(s). Interactions can be easily identified in graphs of group means. For example, again referring to the data set from Figure 2, let us now consider the effect of subject gender as a second independent variable. This would be a two-factor ANOVA: one factor is the sex of subjects, called Gender, with two levels; the second factor is the type of manual therapy treatment, called Treatment, with three levels. A shorthand description of this design is a 2x3 ANOVA (two factors with two and three levels, respectively). For this two-factor ANOVA, there are three Null hypotheses: (i) Main Effect for the Gender factor: Are there differences in the response (ankle range of motion) for males vs. females to manual therapy treatment (combining data for the three levels of the Treatment factor with respect to the two Gender factor levels)? (ii) Main Effect for the Treatment factor: Are there differences in the response for subjects in the three levels of the Treatment factor (combining data for males and females in the Gender factor with respect to the three Treatment factor levels)? (iii) Interaction: Are there differences due to neither the Gender nor Treatment factors alone but to the combination of these factors? With respect to analysis of interactions, Figure 3 shows a table of group means for all levels of the two independent variables, based on data from Figure 2. Note that the two independent variables are graphed in relation to the dependent variable. The two lines in the left graph are parallel, indicating the absence of an interaction between the levels of the two factors. An interaction would exist if the graphs were not parallel, such as in the right graph, in which group means for males and females on the Type 2 treatment were switched for illustrative purposes. If the lines deviate from parallel to a sufficient degree, the interaction will be statistically significant. In this case with two factors, there is only one interaction to be evaluated. With three or more independent variables, there are multiple interactions that need to be considered. A statistically significant interaction complicates the interpretation of the Main Effects, since the factors are not independent of each other in their effects on the dependent variable. Interactions should be examined before Main Effects. If interactions are not statistically significant, then Main Effects can be easily evaluated as a series of one-factor ANOVAs.
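As an illustrative sketch (not part of the article), the 2x3 Treatment-by-Gender design described above can be fit in Python with the statsmodels formula interface; the column names and factor labels below are arbitrary choices, and the fictitious data are taken from Figure 2.

```python
# A minimal sketch (not from the article) of a two-factor (Treatment x Gender) ANOVA.
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "rom":       [14, 14, 11, 13, 16, 14, 13, 13, 20, 18, 17, 17],   # Figure 2 scores
    "treatment": ["T1"] * 4 + ["T2"] * 4 + ["T3"] * 4,
    "gender":    ["M", "M", "F", "F"] * 3,
})

# Full factorial model: two main effects plus their interaction
model = ols("rom ~ C(treatment) * C(gender)", data=data).fit()
print(anova_lm(model, typ=2))   # ANOVA table with SS, df, F, and p for each effect
```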
So There is a Statistically Significant ANOVA—Now What? Multiple Comparison Procedures

If an ANOVA does not yield statistical significance on any main effects or interactions, the Null hypothesis (hypotheses) is (are) accepted, meaning that the different levels of independent variables did not have any differential effects on the dependent variable. The inferential statistical work is done (but see next section), unless confounding covariates are suspected, possibly warranting analysis of covariance (ANCOVA), which is beyond the scope of this article.

When statistical significance is obtained in an ANOVA, additional statistical tests are necessary to determine which of the group means differ from each other. These follow-up tests are referred to as multiple comparison procedures (MCPs) or post hoc tests. MCPs involve multiple pair-wise comparisons (or contrasts) in a fashion designed to maintain alpha for the family of comparisons at a specified level, typically 0.05. This is referred to as the familywise alpha. There are two general options for MCP tests: either perform multiple t-tests that require "manual" adjustment of the alpha for each pairwise test to maintain a familywise alpha of 0.05, or use a test such as the Tukey HSD (see below) that has built-in protection from alpha inflation. Multiple t-tests have their place, especially when only a subset of all possible pairwise comparisons are to be performed, but the special purpose MCPs are preferable when all pairwise comparisons are assessed.
Using the simple case of a statistically significant one-factor ANOVA, t-tests can be used for post hoc evaluation with the aim of identifying which levels differ from each other. However, with multiple t-tests there is a need to adjust alpha for each t-test in such a way as to maintain the familywise alpha at 0.05. If all possible pairwise comparisons are performed, there will be a geometric increase in the number of t-tests as the number of levels increases, as defined by C = m(m - 1) / 2, where C = number of pairwise comparisons, and m = number of levels in a factor. For example, there are three pairwise comparisons for three levels; six comparisons for four levels; ten comparisons for five levels, and so forth. There is a need to maintain familywise alpha at 0.05 in these multiple comparisons to maintain the risk of Type 1 errors at no more than 5%. This is commonly accomplished with the Bonferroni (or Dunn) adjustment, in which alpha for each post hoc t-test is adjusted by dividing the familywise alpha (0.05) by the number of pairwise comparisons:

αMultiple t-tests = αFamilywise / C
FIGURE 3. Factorial ANOVA interactions, which are assessed with a table and a graph of group means. Group means are based on data presented in Figure 2, and represent a 3x2 two-factor (Treatment x Gender) ANOVA with independent groups. In reference to the j columns and k rows indicated in the table below, the Null hypothesis for this interaction is:

μj1,k1 - μj1,k2 = μj2,k1 - μj2,k2 = μj3,k1 - μj3,k2

The graph below left shows the group means of the two independent variables in relation to the dependent variable. The parallel lines indicate that males and females displayed similar changes in ankle ROM for the three types of treatment, so there was no interaction between the different levels of the independent variables. Consider the situation in which the group means for males and females on treatment Type 2 are reversed. These altered group means are shown in the graph below right. The graphed lines are not parallel, indicating the presence of an interaction. In other words, the relative efficacies of the three treatments are different for males and females; whether this meets the statistical level of an interaction is determined by ANOVA (p less than alpha).
FACTOR A: Treatment

                                 Treatment Type 1   Treatment Type 2   Treatment Type 3   Factor B Main Effect
FACTOR B: Gender                 (Level j = 1)      (Level j = 2)      (Level j = 3)      (row means)
Male (Level k = 1)                     14                 15                 19                16
Female (Level k = 2)                   12                 13                 17                14
Factor A Main Effect
(column means)                         13                 14                 18
[Figure 3 graphs: x-axis, Manual Therapy Treatment (Type 1, Type 2, Type 3); y-axis, Increase in ankle ROM; left graph, parallel lines (no interaction); right graph, non-parallel lines (interaction).]
If there are two pairwise comparisons, alpha for each t-test is set to 0.05/2 = 0.025; for three comparisons, alpha is 0.05/3 = 0.0167, and so on. Any pairwise t-test with a p value less than the adjusted alpha would be considered statistically significant. The trade-off for preventing familywise alpha inflation is that as the number of comparisons increases, it becomes incrementally more difficult to attain statistical significance due to the lower alpha. Furthermore, the inflation of familywise alpha with multiple t-tests is not additive. As a result, Bonferroni adjustments overcompensate the alpha adjustment, making this the most conservative (least powerful) of all MCPs. For example, running two t-tests, each with alpha set to 0.05, does not double familywise alpha to 0.10; it increases it to only 0.0975. The effect of multiple t-tests on familywise alpha and the Type 1 error rate is defined by the following formula:

αFamilywise = 1 - (1 - αMultiple t-tests)^C
The overcorrection by the Bonferroni technique becomes more prominent with many pairwise comparisons: executing 20 t-tests, each with an alpha of 0.05, does not yield a familywise alpha of 20 x 0.05 = 1.00 (i.e., a 100% chance of Type 1 error); the value is actually 0.64. There are modifications of the Bonferroni adjustment developed by Šidák12 to more accurately reflect the inflation of familywise alpha, which result in larger adjusted alpha levels and therefore increased statistical power, but the effects are slight and rarely convert a marginally non-significant pairwise comparison into statistical significance. For example, with three pairwise comparisons, the Bonferroni adjusted alpha of 0.0167 is increased by only 0.0003 to 0.0170 with the Šidák adjustment.
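The familywise-alpha arithmetic above is easy to verify numerically; the following Python sketch (not from the article) prints the Bonferroni and Šidák per-comparison alphas, and the familywise alpha that results if no adjustment is made, for 2, 3, and 20 comparisons.

```python
# A minimal sketch (not from the article) of the familywise-alpha arithmetic.
alpha_familywise = 0.05

for c in (2, 3, 20):
    bonferroni = alpha_familywise / c                 # per-comparison alpha, Bonferroni
    sidak = 1 - (1 - alpha_familywise) ** (1 / c)     # per-comparison alpha, Sidak
    inflated = 1 - (1 - alpha_familywise) ** c        # familywise alpha with no adjustment
    print(f"C = {c:2d}: Bonferroni = {bonferroni:.4f}, Sidak = {sidak:.4f}, "
          f"unadjusted familywise alpha = {inflated:.4f}")
```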
The sequential alpha adjustment methods for multiple post hoc t-tests by Holm13 and Hochberg14 provide increased power while still maintaining control of the familywise alpha. These techniques permit the assignment of statistical significance in certain situations for which p values are less than 0.05 but do not meet the Bonferroni criterion for significance. The sequential approaches by Holm13 and Hochberg14 are called step-down and step-up procedures, respectively. In Hochberg's step-up procedure with C pairwise comparisons, the t-test p values are evaluated sequentially in descending order, with p1 the lowest value and pC the highest. If pC is less than 0.05, all the p values are statistically significant. If pC is greater than 0.05, that evaluation is non-significant, and the next largest p value, pC-1, is evaluated with a Bonferroni adjusted alpha of 0.05/2 = 0.025. If pC-1 is significant, then all remaining p values are significant. Each sequential evaluation leads to an alpha adjustment based on the number of previous evaluations, not on the entire set of possible evaluations, thereby yielding increased statistical power compared to the Bonferroni method. For example, if three p values are 0.07, 0.02 and 0.015, Hochberg's method evaluates p3 = 0.07 vs alpha = 0.05/1 = 0.05 (non-significant); then p2 = 0.02 vs. alpha = 0.05/2 = 0.025 (significant); and then p1 = 0.015 vs alpha = 0.05/3 = 0.0167 (significant). Holm's method performs the inverse sequence and alpha adjustments, such that the lowest p value is evaluated first with a fully adjusted alpha. In this case: p1 = 0.015 vs alpha = 0.05/3 = 0.0167 (significant); then p2 = 0.020 vs. alpha = 0.05/2 = 0.025 (significant); and then p3 = 0.070 vs alpha = 0.05/1 = 0.05 (non-significant). Once Holm's method encounters non-significance, sequential evaluations end, whereas Hochberg's method continues testing. For these three p values, the Bonferroni adjustment would find p = 0.015 significant but p = 0.02 to be non-significant. As can be seen, the methods of Hochberg and Holm are less conservative and more powerful than Bonferroni's adjustment. Further, Hochberg's method is uniformly more powerful than Holm's method15. For example, if there are three pairwise comparisons with p = 0.045, 0.04 and 0.03, all would be significant with Hochberg's method but none would be with Holm's method (or Bonferroni's).
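The step-down and step-up logic can be expressed compactly in code. The Python sketch below (not part of the article) is one way to implement the Holm and Hochberg procedures as described above, and it reproduces the decisions for the example p values 0.07, 0.02, and 0.015; the function names and return format are illustrative only.

```python
# A minimal sketch (not from the article) of the Holm (step-down) and Hochberg (step-up) procedures.

def holm(p_values, alpha=0.05):
    """Step-down: test the smallest p first; stop at the first non-significant result."""
    ordered = sorted(p_values)
    decisions = {}
    for i, p in enumerate(ordered):
        threshold = alpha / (len(ordered) - i)
        if p < threshold:
            decisions[p] = True
        else:
            for q in ordered[i:]:          # this and all larger p values are non-significant
                decisions[q] = False
            break
    return decisions

def hochberg(p_values, alpha=0.05):
    """Step-up: test the largest p first; once one is significant, all smaller ones are too."""
    ordered = sorted(p_values, reverse=True)
    decisions = {}
    for i, p in enumerate(ordered):
        threshold = alpha / (i + 1)
        if p < threshold:
            for q in ordered[i:]:          # this and all remaining (smaller) p values are significant
                decisions[q] = True
            break
        decisions[p] = False
    return decisions

p_vals = [0.07, 0.02, 0.015]
print("Holm:    ", holm(p_vals))       # {0.015: True, 0.02: True, 0.07: False}
print("Hochberg:", hochberg(p_vals))   # {0.07: False, 0.02: True, 0.015: True}
```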
There are many types of MCPs distinct from the t-test approaches described above16. These tests have "built-in" familywise alpha protection that does not require "manual" adjustment of alpha. Most of these MCPs calculate a so-called q value for each comparison that takes into account group mean differences, group variances, and group sample sizes in a fashion similar but not identical to the calculation of t. This q value is compared to a critical value generated from a q distribution (a distribution of differences in sample means). Protection from familywise alpha inflation is in the form of a multiplier applied to the critical value. The multiplier increases as the number of comparisons increases, thereby requiring greater differences between group means to attain statistical significance as the number of comparisons is increased. Some MCPs are better than others at balancing statistical power and Type 1 errors. By general consensus amongst statisticians, the Fisher Least Significant Difference (LSD) test and Duncan's Multiple Range Test are considered to be overly powerful, with too high a likelihood of Type 1 errors (false positives). The Scheffé test is considered to be overly conservative, with too high a likelihood of Type 2 errors (false negatives), but is applicable when group sample sizes are markedly unequal17. The Tukey Honestly Significant Difference (HSD) test is favored by many statisticians for its balance of statistical power and protection from Type 1 errors. It is worth noting that the power advantage of the Tukey HSD test obtains only when all possible pairwise comparisons are performed. The Student-Newman-Keuls (SNK) test statistic is computed identically to the Tukey HSD; however, the critical value is determined differently using a step-wise approach, somewhat like the Holm method described above for t-tests. This makes the SNK test slightly more powerful than the Tukey HSD test. However, an advantage of the Tukey HSD test is that a variant called the Tukey-Kramer HSD test can be used with unbalanced sample size designs, unlike the SNK test. The Dunnett test is useful when planned pairwise tests are restricted to one group (e.g., a control group) being compared to all other groups (e.g., treatment groups). In summary, (i) the Tukey HSD and Student-Newman-Keuls tests are recommended when performing all pairwise tests; (ii) the Hochberg or Holm sequential alpha adjustments enhance the power of multiple post hoc t-tests while
maintaining control of familywise alpha; and (iii) the Dunnett test is preferred when comparing one group to all other groups.
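For readers working in Python, the statsmodels package provides a Tukey HSD routine; the sketch below (not from the article) applies it to the three fictitious treatment groups from Figure 2. The variable names are arbitrary, and other packages offer comparable MCPs.

```python
# A minimal sketch (not from the article) of the Tukey HSD procedure via statsmodels.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([14, 14, 11, 13,    # Type 1 treatment
                   16, 14, 13, 13,    # Type 2 treatment
                   20, 18, 17, 17])   # Type 3 treatment
labels = np.repeat(["Type 1", "Type 2", "Type 3"], 4)

result = pairwise_tukeyhsd(scores, labels, alpha=0.05)
print(result)   # table of pairwise mean differences, adjusted p values, and reject decisions
```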
Break from Tradition: Skip One-Factor ANOVA and Proceed Directly to a MCP

Thus far, the conventional approach to ANOVA and MCPs has been presented, namely, run an ANOVA and if it is not significant, proceed no further; if the ANOVA is significant, then run MCPs to determine which group means differ. However, it has long been held by some statisticians that in certain circumstances ANOVA can be skipped and that an appropriate MCP is the only necessary inferential test. To quote an influential paper by Wilkinson et al18, the ANOVA-followed-by-MCP approach "is usually wrong for several reasons. First, pairwise methods such as Tukey's honestly significant difference procedure were designed to control a familywise error rate based on the sample size and number of comparisons. Preceding them with an omnibus F test in a stagewise testing procedure defeats this design, making it unnecessarily conservative." Related to this perspective is the fact that inferential discrepancies are possible between ANOVA and MCPs, in which one is statistically significant and the other is not. This can occur when p values are near the boundary of alpha. Each MCP has slightly different criteria for statistical significance (based on either the t or q distribution), and all differ slightly from the criteria of F scores (based on the F distribution). An argument has also been put forth with respect to performing pre-planned MCPs without the need for a statistically significant ANOVA in clinical trials19. Nonetheless, the convention remains to perform ANOVA and then MCPs, but MCPs alone are a statistically valid option. ANOVA is especially warranted when there are multiple factors, due to the ability of ANOVA to detect interactions.

Wilkinson et al18 also remind researchers that it is rarely necessary to perform all pairwise comparisons. Selected pre-planned comparisons that are driven by the research hypothesis, and not a subcortical reflex to perform every conceivable pairwise comparison, will reduce the number of extraneous pairwise comparisons and false positives, and have the added benefit of increasing statistical power.
ANOVA Effect Size

Effect size is a unitless measure of the magnitude of treatment effects20. For ANOVA, there are two categories of effect size indices: (i) those based on proportions of sums of squares (η², partial η², ω²), and (ii) those based on a standardized difference between group means (such as Cohen's d)21,22. The latter type of effect size index is useful for power analysis, and will be discussed briefly in the next section. To an ever increasing degree, peer review journals are requiring the presentation of effect sizes with descriptive summaries of data.

There are three commonly used effect size indices that are based on proportions of the familiar sum of squares values that form the foundation of ANOVA computations. The three indices are called eta squared (η²), partial eta squared (partial η²), and omega squared (ω²). These indices range in value from 0 (no effect) to 1 (maximal effect) because they are proportions of variance. These indices typically yield different values for effect size.

Eta squared (η²) is calculated as:

η² = SSBetween Groups / SSTotal

The SSBetween Groups term pertains to the independent variable of interest, whereas SSTotal is based on the entire data set. Specifically, for a factorial ANOVA, SSTotal = [SSBetween Groups for all factors + SSError + all SSInteractions]. As such, the magnitude of η² for a given factor will be influenced by the number of other independent variables. For example, η² will tend to be larger in a one-factor design than in a two-factor design because in the latter the SSTotal term will be inflated to include sum of squares arising from the second factor.

Partial eta squared (partial η²) is calculated with respect to the sum of squares associated with the factor of interest, not the total sum of squares:

partial η² = SSBetween Groups / (SSBetween Groups + SSError)

As with the η² calculation, the SSBetween Groups numerator term for partial η² pertains to the independent variable of interest. However, the denominator differs from that of η². The denominator for partial η² is not based on the entire data set (SSTotal) but instead on only SSBetween Groups and SSError for the factor being evaluated. For a one-factor ANOVA, the sum of squares terms are identical for η² and partial η², so the values are identical; however, with factorial ANOVA the denominator for partial η² will always be smaller. For this reason, partial η² is always larger than η² with factorial ANOVA (unless a factor or interaction has absolutely no effect, as in the case of the interaction in Figure 4, for which both η² and partial η² equal 0).

Omega squared (ω²) is based on an estimation of the proportion of variance in the underlying population, in contrast to the η² and partial η² indices that are based on proportions of variance in the sample. For this reason, ω² will always be a smaller value than η² and partial η². Application of ω² is limited to between-subjects designs (i.e., not repeated measures) with equal sample sizes in all groups. Omega squared is calculated as follows:

ω² = [SSBetween Groups - (dfBetween Groups)(MSError)] / (SSTotal + MSError)

In contrast to η², which provides an upwardly biased estimate of effect size when the sample size is small, ω² calculates an unbiased estimate23.

The reader is cautioned that η² and partial η² are often misreported in the literature (e.g., η² incorrectly reported as partial η²)24,25. It is advisable to calculate these values by hand using the formulae shown above as a confirmation of the output of statistical software programs, to ensure accurate reporting. Refer to Figure 4 for sample calculations of these three effect size indices for a two-factor ANOVA.

The η² and partial η² indices have distinctly different attributes. Whether a given attribute is considered to be an advantage or disadvantage is a matter of perspective and context. Some authors24 argue the merits of eta squared, whereas others4 prefer partial eta squared.
Notable issues pertaining to these indices include:

(i) Proportion of variance: When there is a statistically significant main effect or interaction, both η² and partial η² (and ω²) can be interpreted in terms of the percentage of variance accounted for by the corresponding independent variable, even though they will often yield different values for factorial ANOVAs. So if η² = 0.20 and partial η² = 0.25 for a given factor, these two effect size indices indicate that the factor accounts for 20% vs. 25%, respectively, of the total variability in the dependent variable scores.

(ii) Relative values: Since η² is either equal to (one-factor ANOVA) or less than (factorial ANOVA) partial η², the η² index is the more conservative measure of effect size. This can be viewed as a positive or negative attribute.

(iii) Additivity: η² is additive, but partial η² is not. Since η² for each factor is calculated in terms of the total sum of squares, all the η² for an ANOVA are additive and sum to 1 (i.e., they sum to equal the amount of variance in the dependent variable that arises from the effects of all the independent variables). In contrast, a factor's partial η² is calculated in terms of that factor's sum of squares (not the total sum of squares), so on mathematical grounds the individual partial η² from an ANOVA are not additive and do not necessarily sum to 1.

(iv) Effects of multiple factors: As the number of factors increases, the proportion of variance accounted for by each factor will necessarily decrease. Accordingly, η² decreases in an associated way. In contrast, partial η² for each factor is calculated within the sum of squares variance metrics of that particular factor, and is not influenced by the number of other factors.
How Many Subjects?

The aim of any experimental design is to have adequate statistical power to detect differences between groups that truly exist. There is no simple answer to the question of how many subjects are needed for statistical validity using ANOVA. Typical standards are to design a study with an alpha of 0.05 and a statistical power of at least 0.80 (i.e., an 80% chance of detecting differences between group means that truly exist; alternatively, a 20% chance of committing a Type 2 error).
FIGURE 4. Calculations of three different measures of effect size for a two-factor (Treatment and Gender) ANOVA of the data set shown in Figure 2. The effect sizes shown are all based on proportions of sum of squares: eta squared (η²), partial η², and omega squared (ω²). Note the following: (i) The denominator sum of squares term will be larger for η² than for partial η² in a factorial ANOVA, so η² will be smaller than partial η². (ii) Omega squared (ω²) is a population estimate, whereas η² and partial η² are sample estimates, so ω² will be smaller than both η² and partial η². (iii) The sum of all η² equals 1, whereas the sum of all partial η² does not equal 1 (it can be less than or greater than 1). Refer to text for further explanation of these attributes.
Effect               Sum of Squares   Degrees of freedom   Mean Squares   η²     partial η²   ω²
Treatment                  56                 2                 28        0.76      0.90      0.72
Gender                     12                 1                 12        0.16      0.67      0.15
Treatment x Gender          0                 2                  0        0.00      0.00      0.00
Error                       6                 6                  1        0.08      ----      ----
Total                      74                11                           1.00      1.57
Sample calculations:

η² = SSBetween Groups / SSTotal
η² for Treatment = 56 / 74 = 0.76 = accounts for 76% of total variability in DV scores.
η² for Gender = 12 / 74 = 0.16 = accounts for 16% of total variability in DV scores.
η² for Treatment x Gender interaction = 0 / 74 = 0.00 = accounts for 0% of total variability in DV scores.
η² for Error = 6 / 74 = 0.08 = accounts for 8% of total variability in DV scores.
Sum of all η² = 100%

partial η² = SSBetween Groups / (SSBetween Groups + SSError)
partial η² for Treatment = 56 / (56 + 6) = 0.90 = accounts for 90% of total variability in DV scores.
partial η² for Gender = 12 / (12 + 6) = 0.67 = accounts for 67% of total variability in DV scores.
partial η² for Treatment x Gender interaction = 0 / (0 + 6) = 0.00 = accounts for 0% of total variability in DV scores.
Sum of all partial η² ≠ 100%

ω² = [SSBetween Groups - (dfBetween Groups)(MSError)] / (SSTotal + MSError)
ω² for Treatment = [56 - (2)(1)] / [74 + 1] = 54 / 75 = 0.72
ω² for Gender = [12 - (1)(1)] / [74 + 1] = 11 / 75 = 0.15
ω² for Treatment x Gender interaction = [0 - (2)(1)] / [74 + 1] = 0.00 (negative values are reported as 0)
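The three indices in Figure 4 can be recomputed from the sums of squares with a few lines of Python; the sketch below (not part of the article) is a direct translation of the formulae given in the text, with the assumption that a negative omega squared is reported as 0.

```python
# A minimal sketch (not from the article) of the eta-squared, partial eta-squared,
# and omega-squared calculations, using the sums of squares from Figure 4.

def effect_sizes(ss_effect, df_effect, ss_error, df_error, ss_total):
    ms_error = ss_error / df_error
    eta_sq = ss_effect / ss_total
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    omega_sq = (ss_effect - df_effect * ms_error) / (ss_total + ms_error)
    return eta_sq, partial_eta_sq, max(omega_sq, 0.0)   # negative omega-squared reported as 0

# Figure 4 values: SS_Error = 6 (df = 6), SS_Total = 74
for name, ss, df in [("Treatment", 56, 2), ("Gender", 12, 1), ("Interaction", 0, 2)]:
    eta, p_eta, omega = effect_sizes(ss, df, ss_error=6, df_error=6, ss_total=74)
    print(f"{name:12s} eta^2 = {eta:.2f}  partial eta^2 = {p_eta:.2f}  omega^2 = {omega:.2f}")
```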
Statistical power will be a function of effect size, sample size, and the number of independent variables and levels, among other things. Adequate sample size is a critical design consideration, and prospective (a priori) power analysis is performed to estimate the required sample size that will yield the desired level of power in the inferential analysis after data are collected. This entails a prediction of group mean differences and group standard deviations in the yet-to-be-collected data. Specifically, the effect size index used for prospective power analysis is based on a standardized measure such as Cohen's d, which is based on predicted differences in group means (statistical signal) divided by the standard deviation (statistical noise). Being based on differences instead of proportions, the d effect size index is scaled differently than the η², partial η² and ω² indices described above, and can exceed a value of 1.

The prediction of an experiment's effect size that is part of a prospective power analysis is nothing more than an estimate. This estimate can be based on pilot study data, previously published findings, intuition or best guesses. A guiding principle should be to select an effect size that is deemed to be clinically relevant.
The approach used in a prospective power analysis is outlined below for the simple case of a t-test with independent groups and equal variance, in which the effect size index is defined as:

d = difference in group means / standard deviation of both groups

The estimate of the appropriate number of subjects in each group for the specified alpha and power is given by the following equation26:

NEstimated = 2 x [(zα + zβ) / d]²

in which:
zα is the z value for the specified alpha. With an alpha = 0.05, zα = 1.96 (2-tail).
zβ is the z value for the specified beta (risk of Type 2 error). Power = 1 - β. For β = 0.20 (power = 0.80), zβ = 0.84 (1-tail).

As a computational example, if the effect size d is predicted to be 1.0 (which equates to a difference between group means of one standard deviation), then for alpha = 0.05 and power = 0.80 the appropriate sample size for both groups would be:

NEstimated = 2 x [(1.96 + 0.84) / 1]² = 2 x [2.80]² = 2 x 7.84 ≈ 16

For a smaller effect size, a larger sample size is needed, e.g., N = 63 for an effect size of 0.5. The reader is cautioned that these sample sizes are estimates based on guesses about the predicted effect size; they do not guarantee statistical significance.
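The sample-size arithmetic above is easily scripted; the following Python sketch (not from the article) evaluates the NEstimated equation for a two-tailed alpha of 0.05 and power of 0.80, reproducing the estimates of 16 and 63 subjects per group.

```python
# A minimal sketch (not from the article) of the prospective sample-size estimate
# for an independent-groups t-test.
import math
from scipy import stats

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    z_alpha = stats.norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05, two-tailed
    z_beta = stats.norm.ppf(power)            # 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)

print(n_per_group(1.0))   # ~16 subjects per group
print(n_per_group(0.5))   # ~63 subjects per group
```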
Prospective power analysis for ANOVA is more complex than outlined above for a simple t-test. ANOVAs can have numerous levels within a factor, multiple factors, and interactions, all of which need to be accounted for in a comprehensive power analysis. These complications raise the following cautionary note: ANOVA power analysis quickly devolves into a series of progressively wilder guesses (rather than "estimates") of effect sizes as the number of independent variables and possible interactions increases26. It is often advisable to focus a prospective power analysis for ANOVA on the one factor that is of primary interest, so as to simplify the power analysis and reduce the number of unjustifiable guesses. The reader is referred to statistical textbooks (such as references 22, 26, 27) for different approaches that can be used for prospective power analysis for ANOVA designs. As a general guideline, it is desirable for group sample sizes to be large enough to invoke the central limit theorem in the statistical analysis (roughly 30 or more per group) and for the design to be balanced (equal sample sizes in each group).
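To make the one-factor case concrete, the sketch below computes the power of a one-way fixed-effects ANOVA from the noncentral F distribution, a standard approach that is not spelled out in the article itself. It is only an added illustration under stated assumptions: the function name anova_power is a placeholder, SciPy's F distributions are assumed, and the effect size here is Cohen's f (roughly 0.10 small, 0.25 medium, 0.40 large), which is scaled differently than the d index used above.

```python
from scipy.stats import f as f_dist, ncf

def anova_power(f_effect, n_per_group, k_groups, alpha=0.05):
    """Power of a one-way fixed-effects ANOVA via the noncentral F distribution.
    f_effect is Cohen's f for the factor of interest."""
    n_total = n_per_group * k_groups
    df_between = k_groups - 1
    df_within = n_total - k_groups
    f_crit = f_dist.ppf(1 - alpha, df_between, df_within)  # critical F at the chosen alpha
    lam = (f_effect ** 2) * n_total                         # noncentrality parameter
    return ncf.sf(f_crit, df_between, df_within, lam)       # P(F > critical F under H1)

# Three groups of 30 subjects and a medium effect (f = 0.25):
print(round(anova_power(0.25, 30, 3), 2))  # roughly 0.56, i.e., underpowered relative to 0.80
```

Larger per-group samples or a larger assumed effect raise the computed power; the sample size needed to reach a target power can be found by iterating this function.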
Finally, a retrospective (post hoc) power analysis is warranted after data are collected. The aim is to determine the statistical power of the study, based on the effect size (not estimated, but calculated directly from the data) and the sample size. This is particularly relevant for statistically non-significant findings, since the non-significance may have been the result of inadequate statistical power. The textbooks cited above, as well as many others, also discuss the mechanics of how to perform retrospective power analyses.
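Under the same normal approximation used for the prospective estimate, a retrospective calculation simply runs the formula in reverse: given the observed effect size and the achieved per-group sample size, it returns the approximate power. The sketch below is again only an added illustration, not the article's procedure; retrospective_power is a placeholder name and SciPy is assumed.

```python
from math import sqrt
from scipy.stats import norm

def retrospective_power(d_observed, n_per_group, alpha=0.05):
    """Approximate achieved power of a two-group comparison, computed from
    the observed effect size with the normal approximation used above."""
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf(abs(d_observed) * sqrt(n_per_group / 2) - z_alpha)

print(round(retrospective_power(1.0, 16), 2))  # 0.81, just above 0.80 because N was rounded up to 16
```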
Conclusion: Statistical Significance Should Not Be Confused with Clinical Significance
ANOVA is a useful statistical tool for drawing inferential conclusions about how one or more independent variables influence a parametric dependent variable (outcome measure). It is imperative to keep in mind that statistical significance does not necessarily correspond to clinical significance. The much sought-after statistically significant ANOVA p value has only two purposes: to play a role in the inferential decision as to whether group means differ from each other (rejection of the Null hypothesis), and to assign a probability to the risk of committing a Type 1 error if the Null hypothesis is rejected. A statistically significant ANOVA and its MCPs say nothing about the magnitude of group mean differences, other than that a difference exists. A large sample size can produce statistical significance with small differences in group means; depending on the outcome measure, these small differences may have little clinical significance. Assigning clinical significance is a judgment call that needs to take into account the magnitude of the differences between groups, which is best assessed by examination of effect sizes. Statistical significance plays the role of a searchlight to detect group differences, whereas effect size is useful for judging the clinical significance of those differences.
REFERENCES
1. Wackerly DD, Mendenhall W III, Scheaffer RL. Mathematical Statistics with Applications. 6th ed. Pacific Grove, CA: Duxbury Press, 2002.
2. Shapiro SS, Wilk MB. An analysis of vari-
ance test for normality (complete samples).
Biometrika 1965;52:591–611.
3. D'Agostino RB. An omnibus test for normality of moderate and large size samples. Biometrika 1971;58:341–348.
4. Tabachnick BG, Fidell LS. Using Multivari-
ate Statistics. 5th ed. New York: Pearson
Education, 2007.
5. Levene H. Robust tests for equality of variances. In: Olkin I, ed. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Palo Alto, CA: Stanford University Press, 1960.
6. Brown MB, Forsythe AB. Robust tests for the
equality of variances. Journal of the Ameri-
can Statistical Association 1974;69:364–367.
7. Zar JH. Biostatistical Analysis. Upper Saddle
River, NJ: Prentice Hall, 1998.
8. Daniel WW. Biostatistics: A Foundation for
Analysis in the Health Sciences. 7th ed.
Hoboken, NJ: John Wiley & Sons, Inc., 1999.
9. Box GEP. Non-normality and tests on vari-
ances. Biometrika 1953;40:318–335.
10. Box GEP. Some theorems on quadratic forms applied in the study of analysis of variance problems: I. Effect of inequality of variance in the one way classification. Annals of Mathematical Statistics 1954;25:290–302.
11. Wilkinson L, Blank G, Gruber C. Desktop
Data Analysis with SYSTAT. Upper Saddle
River, New Jersey: Prentice Hall, 1996.
12. Šidák Z. Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association 1967;62:626–633.
13. Holm S. A simple sequentially rejective mul-
tiple test procedure. Scandinavian Journal of
Statistics 1979;6:65–70.
14. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 1988;75:800–802.
15. Huang Y. Hochberg's step-up method: Cutting corners off Holm's step-down method. Biometrika 2007;94:965–975.
16. Toothaker L. Multiple Comparisons for Re-
searchers. New York, NY: Sage Publications,
1991.
17. Cabral HJ. Multiple Comparisons Proce-
dures. Circulation 2008;117:698–701.
18. Wilkinson L and the Task Force on Statisti-
cal Inference. Statistical methods in psy-
chology journals. American Psychologist
1999;54:594–604.
19. D'Agostino RB, Massaro J, Kwan H, Cabral H. Strategies for dealing with multiple treatment comparisons in confirmatory clinical trials. Drug Information Journal 1993;27:625–641.
20. Cook C. Clinimetrics Corner: Use of effect sizes in describing data. J Man Manip Ther 2008;16:E54–E5.
21. Cohen J. Eta-squared and partial eta-squared in fixed factor ANOVA designs. Educational and Psychological Measurement 1973;33:107–112.
22. Cohen J. Statistical Power Analysis for the
Behavioral Sciences. 2nd ed. Hillsdale, NJ:
Lawrence Erlbaum, 1988.
23. Keppel G. Design and Analysis: A Researcher's Handbook. 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1982.
24. Levine TR, Hullett CR. Eta squared, partial
eta squared, and misreporting of eect size
in communication research. Human Com-
munication Research 2002;28:612–625.
25. Pierce CA, Block RA, Aguinis H. Cautionary note on reporting eta-squared values from multifactor ANOVA designs. Educational and Psychological Measurement 2004;64:916–924.
26. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. Hamilton, Ontario: B.C. Decker Inc., 1998.
27. Portney LG, Watkins MP. Foundations of
Clinical Research. Applications to Practice.
3rd ed. Upper Saddle River, NJ: Pearson
Education Inc., 2009.
Source: https://www.researchgate.net/publication/272311020_Analysis_of_Variance_The_Fundamental_Concepts