Analysis of variance (ANOVA) is a statistical test for detecting differences in group means when there is one parametric dependent variable and one or more independent variables. This article summarizes the fundamentals of ANOVA for clinicians who read the scientific literature but do not have expertise in statistics. The emphasis is on conceptually based perspectives regarding the use and interpretation of ANOVA, with minimal coverage of the mathematical foundations. Computational examples are provided. Assumptions underlying ANOVA include parametric data measures, normally distributed data, similar group variances, and independence of subjects. However, the normality and variance assumptions can often be violated with impunity if sample sizes are sufficiently large and there are equal numbers of subjects in each group. A statistically significant ANOVA is typically followed up with a multiple comparison procedure to identify which group means differ from each other. The article concludes with a discussion of effect size and the important distinction between statistical significance and clinical significance.

Calculations of three different measures of effect size for a two-factor (Treatment and Gender) ANOVA of the data set shown in Figure 2. The effect sizes shown are all based on proportions of sum of squares: eta squared (η²), partial η², and omega squared (ω²). Note the following: (i) In a factorial ANOVA, the denominator sum of squares term is larger for η² than for partial η², so η² will be smaller than partial η². (ii) Omega squared (ω²) is a population estimate, whereas η² and partial η² are sample estimates, so ω² will be smaller than both η² and partial η². (iii) The sum of all η² equals 1, whereas the sum of all partial η² does not equal 1 (it can be less than or greater than 1). Refer to text for further explanation of these attributes.
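The three proportions described in this caption can be sketched in Python. This is an illustrative sketch rather than the author's own computation: it assumes the balanced 2x3 Gender x Treatment layout of the Figure 2 data (two scores per cell) and the common between-subjects formulas η² = SS_effect / SS_Total, partial η² = SS_effect / (SS_effect + SS_Error), and ω² = (SS_effect − df_effect × MS_Error) / (SS_Total + MS_Error). The names `CELLS`, `main_effect_ss` and `effect_sizes` are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


# Figure 2 scores arranged by (Gender, Treatment) cell; two subjects per cell.
CELLS = {
    ("Male", "Type 1"): [14, 14], ("Male", "Type 2"): [16, 14], ("Male", "Type 3"): [20, 18],
    ("Female", "Type 1"): [11, 13], ("Female", "Type 2"): [13, 13], ("Female", "Type 3"): [17, 17],
}
SCORES = [x for cell in CELLS.values() for x in cell]
GRAND = mean(SCORES)


def main_effect_ss(position):
    """Between-levels SS for one factor (0 = Gender, 1 = Treatment), pooling over the other."""
    total = 0.0
    for level in {key[position] for key in CELLS}:
        pooled = [x for key, cell in CELLS.items() if key[position] == level for x in cell]
        total += len(pooled) * (mean(pooled) - GRAND) ** 2
    return total


def effect_sizes():
    """Return ({effect: {eta_sq, partial_eta_sq, omega_sq}}, error share of SS_Total)."""
    ss_total = sum((x - GRAND) ** 2 for x in SCORES)
    ss_error = sum((x - mean(cell)) ** 2 for cell in CELLS.values() for x in cell)
    ss_gender, ss_treatment = main_effect_ss(0), main_effect_ss(1)
    ss_interaction = ss_total - ss_gender - ss_treatment - ss_error
    df_gender = len({key[0] for key in CELLS}) - 1
    df_treatment = len({key[1] for key in CELLS}) - 1
    ms_error = ss_error / (len(SCORES) - len(CELLS))
    effects = {
        "Gender": (ss_gender, df_gender),
        "Treatment": (ss_treatment, df_treatment),
        "Gender x Treatment": (ss_interaction, df_gender * df_treatment),
    }
    table = {}
    for name, (ss, df) in effects.items():
        table[name] = {
            "eta_sq": ss / ss_total,                 # share of the TOTAL sum of squares
            "partial_eta_sq": ss / (ss + ss_error),  # effect vs. effect + error only
            # Population estimate; can come out slightly negative for a negligible effect.
            "omega_sq": (ss - df * ms_error) / (ss_total + ms_error),
        }
    return table, ss_error / ss_total


sizes, error_share = effect_sizes()
for name, values in sizes.items():
    print(name, {k: round(v, 3) for k, v in values.items()})
```

For the Treatment factor this gives η² ≈ 0.757, partial η² ≈ 0.903 and ω² = 0.72, in line with points (i) and (ii). The η² values plus the error share sum to exactly 1, whereas the partial η² values here sum to more than 1, illustrating point (iii).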


THE JOURNAL OF MANUAL & MANIPULATIVE THERAPY • VOLUME 17 • NUMBER 2 • E27

Department of Rehabilitation Sciences, School of Allied Health Sciences,

Texas Tech University Health Sciences Center, Lubbock, TX

Address all correspondence and requests for reprints to: Steven F. Sawyer, PT, PhD, steven.sawyer@ttuhsc.edu

Analysis of variance (ANOVA) is a statistical tool used to detect differences between experimental group means. ANOVA is warranted in experimental designs with one dependent variable that is a continuous parametric numerical outcome measure, and multiple experimental groups within one or more independent (categorical) variables. In ANOVA terminology, independent variables are called factors, and groups within each factor are referred to as levels. The array of terms that are part and parcel of ANOVA can be intimidating to the uninitiated, such as: partitioning of variance, main effects, interactions, factors, sum of squares, mean squares, F scores, familywise alpha, multiple comparison procedures (or post hoc tests), effect size, statistical power, etc. How do these terms pertain to p values and statistical significance? What precisely is meant by a "statistically significant ANOVA"? How does analyzing variance result in an inferential decision about differences in group means? Can ANOVA be performed on non-parametric data? What are the virtues and potential pitfalls of ANOVA? These are the issues to be addressed in this primer on the use and interpretation of ANOVA. The intent is to provide the clinician reader, whose misspent youth did not include an enthusiastic reading of statistics textbooks, an understanding of the fundamentals of this widely used form of inferential statistical analysis.

ANOVA General Linear Models

ANOVA is based mathematically on linear regression and general linear models that quantify the relationship between the dependent variable and the independent variable(s) [1]. There are three different general linear models for ANOVA: (i) The Fixed effects model (Model 1) makes inferences that are specific to, and valid only for, the populations and treatments of the study. For example, if three treatments involve three different doses of a drug, inferential conclusions can only be drawn for those specific drug doses. The levels within each factor are fixed as defined by the experimental design. (ii) The Random effects model (Model 2) makes inferences about levels of the factor that are not used in the study, such as a continuum of drug doses when the study only used three doses. This model pertains to random effects within levels, and makes inferences about a population's random variation. (iii) The Mixed effects model (Model 3) contains both Fixed and Random effects.

In most types of orthopedic rehabilitation clinical research, the Fixed effects model is relevant since the statistical inferences being sought are fixed to the levels of the experimental design. For this reason, the Fixed effects model will be the focus of this article. Computer statistics programs typically default to the Fixed effects model for ANOVA analysis, but higher-end programs can perform ANOVA with all three models.


KEYWORDS: Analysis of Variance, Interaction, Main Effects, Multiple Comparison Procedures

Analysis of Variance: The Fundamental Concepts

Steven F. Sawyer, PT, PhD


Assumptions of ANOVA

Assumptions for ANOVA pertain to the underlying mathematics of general linear models. Specifically, a data set should meet the following criteria before being subjected to ANOVA:

Parametric data: A parametric ANOVA, the topic of this article, requires parametric data (ratio or interval measures). There are non-parametric, one-factor versions of ANOVA for non-parametric ordinal (ranked) data, specifically the Kruskal-Wallis test for independent groups and the Friedman test for repeated measures analysis.
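To show what the non-parametric alternative actually computes, the sketch below implements the textbook Kruskal-Wallis H statistic: all scores are ranked together (tied scores share their average rank), and the groups' rank sums are compared. It assumes the standard tie-corrected formula and applies it to the three treatment groups of Figure 2 purely as an illustration; `kruskal_wallis_h` is a hypothetical helper name. H is conventionally referred to a chi-square distribution with (number of groups − 1) degrees of freedom, an approximation that is poor for groups as small as these.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for independent groups (lists of scores)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts = Counter(pooled)

    # Average ("mid") rank for each distinct value: ties share the mean of their ranks.
    midrank, next_rank = {}, 1
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = next_rank + (t - 1) / 2
        next_rank += t

    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with R_i the rank sum of group i
    rank_term = sum(sum(midrank[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Correction for ties: divide by 1 - sum(t^3 - t) / (N^3 - N)
    tie_sum = sum(t ** 3 - t for t in counts.values())
    return h / (1 - tie_sum / (n ** 3 - n))


type1, type2, type3 = [14, 14, 11, 13], [16, 14, 13, 13], [20, 18, 17, 17]
print(round(kruskal_wallis_h([type1, type2, type3]), 2))
```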

Normally distributed data within each group: ANOVA can be thought of as a way to infer whether the normal distribution curves of different data sets are best thought of as being from the same population or different populations (Figure 1). It follows that a fundamental assumption of parametric ANOVA is that each group of data (each level) be normally distributed. The Shapiro-Wilk test [2] is commonly used to test for normality for group sample sizes (N) less than 50; D'Agostino's modification [3] is useful for larger samples (N > 50).

FIGURE 1. Graphical representation of statistical Null and Alternative hypotheses for ANOVA in the case of one dependent variable (change in ankle ROM pre/post manual therapy treatment, in units of degrees), and one independent variable with three levels (three different types of manual therapy treatments). For this fictitious data, the group (sample) means are 13, 14 and 18 degrees of increased ankle ROM for treatment type groups 1, 2 and 3, respectively (raw data are presented in Figure 2). The Null hypothesis is represented in the left graph, in which the population means for all three groups are assumed to be identical to each other (in spite of differences in sample means calculated from the experimental data). Since in the Null hypothesis the subjects in the three groups are considered to compose a single population, by definition the population means of each group are equal to each other, and are equal to the Grand mean (mean for all data scores in the three groups). The corresponding normal distribution curves are identical and precisely overlap along the X-axis. The Alternative hypothesis is shown in the right graph, in which differences in group sample means are inferred to represent true differences in group population means. These normal distribution curves do not overlap along the X-axis because each group of subjects is considered to be a distinct population with respect to ankle ROM, created from the original single population that experienced different efficacies of the three treatments. Graph is patterned after Wilkinson et al [11].

[Figure 1 panels. Left: "Null hypothesis: Identical normal distribution curves". Right: "Alternative hypothesis: Different normal distribution curves". Y-axis: probability density function; X-axis: increased ankle ROM (degrees).]

A normal distribution curve can be described by whether it has symmetry about the mean and the appropriate width and height (peakedness). These attributes are defined statistically by "skewness" and "kurtosis", respectively. A normal distribution curve will have skewness = 0 and kurtosis = 3. (Note that an alternative definition of kurtosis subtracts 3 from the final value so that a normal distribution will have kurtosis = 0. This "minus 3" kurtosis value is sometimes referred to as "excess kurtosis" to distinguish it from the value obtained with the standard kurtosis function. The kurtosis value calculated by many statistical programs is the "minus 3" variant but is referred to, somewhat misleadingly, as "kurtosis.") Normality of a data set can be assessed with a z-test in reference to the standard error of skewness (estimated as $\sqrt{6/N}$) and the standard error of kurtosis (estimated as $\sqrt{24/N}$) [4]. A conservative alpha of 0.01 (z = 2.58) is appropriate, due to the overly sensitive nature of these tests, especially for large sample sizes (>100) [4]. As a computational example, for N = 20, the estimated standard error of skewness is $\sqrt{6/20} \approx 0.548$, and any skewness value beyond $\pm 2.58 \times 0.548 \approx \pm 1.41$ would indicate non-normality. Perhaps the best "test" is what always should be done: examine a histogram of the distribution of the data. In practice, any distribution that resembles a bell-shaped curve will be "normal enough" to pass normality tests, especially if the sample size is adequate.
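The skewness and kurtosis z-tests described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it uses the simple moment-based (divide-by-N) skewness and "minus 3" kurtosis together with the approximate standard errors √(6/N) and √(24/N) from the text; statistical programs often apply small-sample bias adjustments, so their values can differ slightly. The function names are hypothetical.

```python
import math

Z_CRIT_01 = 2.576  # two-tailed critical z for alpha = 0.01


def central_moments(data):
    """Return N and the 2nd, 3rd and 4th central moments (divide-by-N form)."""
    n = len(data)
    mean = sum(data) / n
    m2, m3, m4 = (sum((x - mean) ** p for x in data) / n for p in (2, 3, 4))
    return n, m2, m3, m4


def normality_z_scores(data):
    """Skewness, excess kurtosis, and their z scores using SE ~ sqrt(6/N), sqrt(24/N)."""
    n, m2, m3, m4 = central_moments(data)
    skew = m3 / m2 ** 1.5            # 0 for a normal distribution
    excess_kurt = m4 / m2 ** 2 - 3   # the "minus 3" variant: 0 for a normal distribution
    return skew, excess_kurt, skew / math.sqrt(6 / n), excess_kurt / math.sqrt(24 / n)


def flags_non_normal(data, z_crit=Z_CRIT_01):
    """True if either z score exceeds the (conservative) critical value."""
    _, _, z_skew, z_kurt = normality_z_scores(data)
    return abs(z_skew) > z_crit or abs(z_kurt) > z_crit


# Skewness cut-off for N = 20, as in the worked example (about +/-1.41)
print(round(Z_CRIT_01 * math.sqrt(6 / 20), 2))
print(flags_non_normal([14, 14, 11, 13]))  # Type 1 group from Figure 2
```

As the text cautions, a histogram remains the more informative check, particularly with groups as small as the Figure 2 example.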

Homogeneity of variance within each group: Referring again to the notion that ANOVA compares normal distribution curves of data sets, these curves need to be similar to each other in shape and width for the comparison to be valid. In other words, the amount of data dispersion (variance) needs to be similar between groups. Two commonly invoked tests of homogeneity of variance are by Levene [5] and Brown & Forsythe [6].
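Both homogeneity tests can be understood as a one-factor ANOVA run on each score's absolute distance from its group's center: the group mean in Levene's test, the group median in the Brown-Forsythe variant. The sketch below follows that description; `one_way_f` and `levene_w` are hypothetical names, and the p value (from an F distribution with k − 1 and N − k degrees of freedom) is left to statistical software. A large W suggests unequal variances.

```python
import statistics


def one_way_f(groups):
    """F = MS_between / MS_within for independent groups."""
    scores = [x for g in groups for x in g]
    grand = statistics.fmean(scores)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(scores) - len(groups)))


def levene_w(groups, center="mean"):
    """Levene's W (center='mean') or the Brown-Forsythe variant (center='median')."""
    locate = statistics.fmean if center == "mean" else statistics.median
    abs_deviations = []
    for g in groups:
        c = locate(g)  # group center
        abs_deviations.append([abs(x - c) for x in g])
    return one_way_f(abs_deviations)


figure2 = [[14, 14, 11, 13], [16, 14, 13, 13], [20, 18, 17, 17]]
print(levene_w(figure2), levene_w(figure2, center="median"))
print(levene_w([[1, 2, 3, 4], [10, 20, 30, 40]]))  # clearly unequal spreads
```

For the Figure 2 groups the mean absolute deviation is the same (1.0) in every group, so both versions return W = 0, consistent with the equal within-group sums of squares shown in that figure.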

Independent Observations: A general assumption of parametric analysis is that the value of each observation for each subject is independent of (i.e., not related to or influenced by) the value of any other observation. For independent groups designs, this issue is addressed with random sampling, random assignment to groups, and experimental control of extraneous variables. This assumption is an inherent concern for repeated measures designs, in which an assumption of sphericity comes into play (sphericity means that the variances of the differences between all pairs of levels are equal). When subjects are exposed to all levels of an independent variable (e.g., all treatments), it is conceivable that the effects of a treatment can persist and affect the response to subsequent treatments. For example, if a treatment effect for one level has a long half-life (analogous to a drug effect) and there is inadequate "wash out" time between exposure to different levels (treatments), there will be a carryover effect. A well-designed and well-executed cross-over experimental design can mitigate carryover effects. Mauchly's test of sphericity is commonly employed to test the sphericity assumption in repeated measures designs. If the Mauchly test is statistically significant, corrections to the F score calculation are warranted. The two most commonly used correction methods are the Greenhouse-Geisser and Huynh-Feldt, which calculate a descriptive statistic called epsilon, a measure of the extent to which sphericity has been violated. The range of values for epsilon is from 1 (no sphericity violation) down to a lower boundary of $1/(m-1)$, where m = number of levels. For example, with three levels, the range would be 1 to 0.50. The closer epsilon is to the lower boundary, the greater the degree of violation. There are three options for adjusting the ANOVA to account for the sphericity violation, all of which involve multiplying the degrees of freedom by epsilon: use the lower-boundary epsilon, which is the most conservative approach (least powerful) and will generate the largest p value, or use either the Greenhouse-Geisser epsilon or the Huynh-Feldt epsilon (most powerful) [statistical power is the ability of an inferential test to detect a difference that actually exists, i.e., a true positive].
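A rough sketch of how a Greenhouse-Geisser epsilon and the resulting degrees-of-freedom adjustment can be computed is shown below. It assumes the common formula ε = trace(D)² / ((k − 1) · trace(D²)), where D is the double-centred sample covariance matrix of the k levels; the Huynh-Feldt refinement is not implemented, and all function names are hypothetical. The example reads each row of the Figure 2 table as one subject measured under all three treatments, which is a hypothetical repeated measures reading of that data.

```python
def greenhouse_geisser_epsilon(scores):
    """Greenhouse-Geisser epsilon; scores[i][j] is subject i measured at level j."""
    n, k = len(scores), len(scores[0])
    level_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sample covariance matrix of the levels (n - 1 denominator)
    cov = [[sum((row[i] - level_means[i]) * (row[j] - level_means[j]) for row in scores) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Double-centre: subtract row and column means, add back the grand mean
    row_means = [sum(r) / k for r in cov]  # equal to column means, since cov is symmetric
    grand = sum(row_means) / k
    d = [[cov[i][j] - row_means[i] - row_means[j] + grand for j in range(k)] for i in range(k)]
    trace = sum(d[i][i] for i in range(k))
    trace_of_square = sum(d[i][j] * d[j][i] for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * trace_of_square)


def lower_bound_epsilon(k):
    """Most conservative epsilon for k levels."""
    return 1 / (k - 1)


def adjusted_df(epsilon, k, n):
    """Multiply both repeated measures ANOVA degrees of freedom by epsilon."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


rows = [[14, 16, 20], [14, 14, 18], [11, 13, 17], [13, 13, 17]]
eps = greenhouse_geisser_epsilon(rows)
print(eps, lower_bound_epsilon(3), adjusted_df(eps, k=3, n=4))
```

In this hypothetical reading, epsilon sits at the 0.5 lower boundary, because every Type 3 score is exactly its Type 2 score plus 4, so the variance of that difference is zero while the other pairwise differences vary: a maximal violation of sphericity.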

Most commercially available statistics programs perform normality, homogeneity of variance and sphericity tests. Determination of the parametric nature of the data and soundness of the experimental design is the responsibility of the investigator, reviewers and critical readers of the literature.

Robustness of ANOVA to Violations of Normality and Variance Assumptions

ANOVA tests can handle moderate violations of normality and equal variance if there is a large enough sample size and a balanced design [7]. As per the central limit theorem, the distribution of sample means approximates normality even with population distributions that are grossly skewed and non-normal, so long as the sample size of each group is large enough. There is no fixed definition of "large enough", but a rule of thumb is N ≥ 30 [8]. Thus, the mathematical validity of ANOVA is said to be "robust" in the face of violations of normality assumptions if there is an adequate sample size. ANOVA is more sensitive to violations of the homogeneity of variance assumption, but this is mitigated if sample sizes of factors and levels are equal or nearly so [9,10]. If normality and homogeneity of variance violations are problematic, there are three options: (i) Mathematically transform (log, arcsine, etc.) the data to best mitigate the violation, at the cost of cognitive fog in understanding the meaning of the ANOVA results (e.g., "A statistically significant main effect was obtained for the arcsine transformation of degrees of ankle range of motion"). (ii) Use one of the non-parametric ANOVAs mentioned above, but at the cost of reduced power and being limited to one-factor analysis. (iii) Identify outliers in the data set using formal statistical criteria (not discussed here). Use caution in deleting outliers from the data set; such decisions need to be justified and explained in research reports. Removal of outliers will reduce deviations from normality and homogeneity of variance.
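The central limit theorem argument can be illustrated with a small simulation. This sketch assumes a strongly right-skewed exponential population (skewness 2), draws many groups of a given size, and measures the skewness of the resulting group means; for this population the theoretical skewness of a mean of n values is 2/√n, so it shrinks as groups grow. The helper names, seed and number of simulated groups are arbitrary choices for illustration.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def skewness_of_sample_means(group_size, n_groups=5000, seed=1):
    """Draw many groups from an exponential population and return the
    skewness of the distribution of their group means."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.expovariate(1.0) for _ in range(group_size)])
             for _ in range(n_groups)]
    return skewness(means)


for size in (1, 5, 30):
    # Expected to be roughly 2 / sqrt(size): about 2, 0.9 and 0.37
    print(size, round(skewness_of_sample_means(size), 2))
```

The simulation only concerns the sampling distribution of the mean; it does not show that the raw scores become normal, which is exactly the distinction the robustness argument relies on.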

If You Understand t-Tests, You Already Know A Lot About ANOVA

As a starting point, the reader should understand that the familiar t-test is an ANOVA in abbreviated form. A t-test is used to infer on statistical grounds whether there are differences between group means for an experimental design with (i) one parametric dependent variable and (ii) one independent variable with two levels, i.e., there is one outcome measure and two groups. In clinical research, levels often correspond to different treatment groups; the term "level" does not imply any ordering of the groups.

The Null statistical hypothesis for a t-test is $H_0\colon \mu_1 = \mu_2$, that is, the population means of the two groups are the same. Note that we are dealing with population means, which are almost always unknown and unknowable in clinical research. If the Null hypothesis involved sample means, there would be nothing to infer, since descriptive analysis provides this information. However, with inferential analysis using t-tests and ANOVA, the aim is to infer, without access to "the truth", if the group population means differ from each other.

The Alternative hypothesis, which comes into play if the Null hypothesis is rejected, asserts that the group population means differ. The Null hypothesis is rejected when the p value yielded by the t-test is less than alpha. Alpha is the predetermined upper limit risk for committing a Type 1 error, which is the statistical false positive of incorrectly rejecting the Null hypothesis and inferring the group means differ when in fact the groups are from a single population. By convention, alpha is typically set to 0.05. The p value generated by the t-test statistic is based on numerical analysis of the experimental data, and represents the probability of obtaining a difference in group means at least as large as the one observed if the Null hypothesis were true. When p is less than alpha, there is a statistically significant result, i.e., the values in the two groups are inferred to differ from each other and to represent separate populations. The logic of statistical inference is analogous to a jury trial: at the outset of the trial (inferential analysis), the group data are presumed to be innocent of having different population means (Null hypothesis) unless the differences in group means in the sampled data are sufficiently compelling to meet the standard of "beyond a reasonable doubt" (p less than alpha), in which case a guilty verdict is rendered (reject Null hypothesis and accept Alternative hypothesis = statistical significance).

The test statistic for a t-test is the t score. In conceptual terms, the calculation of a t score for independent groups (i.e., not repeated measures) is as follows:

t = statistical signal / statistical noise
t = treatment effect / unexplained variance ("error variance")
t = difference between sample means of the two groups / within-group variability (expressed as the standard error of the difference)

The difference in group means represents the statistical signal since it is presumed to result from treatment effects of the different levels of the independent variable. The within-group variance is considered to be statistical noise and an "error" term because it is not explained by the influence of the independent variable on the dependent variable. The particulars of how the t score is calculated depend on the experimental design (independent groups vs repeated measures) and whether variance between groups is equivalent; the reader is referred to any number of statistics books for details about the formulae. The t score is converted into a p value based on the magnitude of the t score (larger t scores lead to smaller p values) and the sample size (which relates to degrees of freedom).
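The signal-over-noise idea for independent groups can be made concrete with a short sketch of two standard variants: Student's pooled-variance t (equal variances assumed) and Welch's t with Welch-Satterthwaite degrees of freedom (unequal variances allowed). Welch's test is named here as a swapped-in illustration of the "whether variance between groups is equivalent" point; function names are hypothetical, and converting t to a p value is left to software. The example compares the Type 1 and Type 3 groups of Figure 2.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Student's t for independent groups, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    signal = statistics.fmean(group_b) - statistics.fmean(group_a)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    noise = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the difference
    return signal / noise, n_a + n_b - 2


def welch_t(group_a, group_b):
    """Welch's t for unequal variances, with Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(group_a) / len(group_a)
    vb = statistics.variance(group_b) / len(group_b)
    t = (statistics.fmean(group_b) - statistics.fmean(group_a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


type1, type3 = [14, 14, 11, 13], [20, 18, 17, 17]
print(pooled_t(type1, type3))
print(welch_t(type1, type3))
```

With equal group sizes and equal sample variances, as here, both versions give t = 5.0 on 6 degrees of freedom; they diverge when group variances or sizes differ.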

ANOVA Null Hypothesis and Alternative Hypothesis

ANOVA is applicable when the aim is to infer differences in group values when there is one dependent variable and more than two groups, such as one independent variable with three or more levels, or when there are two or more independent variables. Since an independent variable is called a "factor", ANOVAs are described in terms of the number of factors; if there are two independent variables, it is a two-factor ANOVA. In the simpler case of a one-factor ANOVA, the Null hypothesis asserts that the population means for each level (group) of the independent variable are equal. Let's use as an example a fictitious experiment with one dependent variable (pre/post changes in ankle range of motion in subjects who received one of three types of manual therapy treatment after surgical repair of a talus fracture). This constitutes a one-factor ANOVA with three levels (the three different types of treatment). The Null hypothesis is $H_0\colon \mu_1 = \mu_2 = \mu_3$. The Alternative hypothesis is that at least two of the group means differ. Figure 1 provides a graphical presentation of these ANOVA statistical hypotheses: (i) the Null hypothesis (left graph) asserts that the normal distribution curves of data for the three groups are identical in shape and position and therefore precisely overlap, whereas (ii) the Alternative hypothesis (right graph) asserts that these normal distribution curves are best described by the distribution indicated by the sample means, which represent an experimentally derived estimate of the population means [11].

The Mechanics of Calculating a One-factor ANOVA

ANOVA evaluates dierences in group

means in a round-about fashion, and

involves the "partitioning of variance"

from calculations of "Sum of Squares"

and "Mean Squares." ree metrics are

used in calculating the ANOVA test sta-

tistic, which is called the F score (named

aer R.A. Fisher, the developer of

ANOVA): (i) Grand Mean, which is the

mean of all scores in all groups; (ii) Sum

of Squares, which are of two kinds, the

sum of all squared dierences between

group means and the Grand Mean (be-

tween-groups Sum of Squares) and the

sum of squared dierences between in-

dividual data scores and their respective

group mean (within-groups Sum of

Squares), and (iii) Mean Squares, also of

two kinds (between-groups Mean

Squares, within-groups Mean Squares),

which are the average deviations of indi-

vidual scores from their respective

mean, calculated by dividing Sum of

Squares by their appropriate degrees of

freedom.

A key point to appreciate about ANOVA is that the data set variance is partitioned into statistical signal and statistical noise components to generate the F score. The F score for independent groups is calculated as:

F = statistical signal / statistical noise
F = treatment effect / unexplained variance ("error variance")
$F = MS_{\text{Between Groups}} / MS_{\text{Within Groups (Error)}}$

Note that the statistical signal, the $MS_{\text{Between Groups}}$ term, is an indirect measure of differences in group means. The $MS_{\text{Within Groups (Error)}}$ term is considered to represent statistical noise/error since this variance is not explained by the effect of the independent variable on the dependent variable. Here is the gist of the issue: as group means increasingly diverge from each other, there is increasingly more variance for between-group scores in relation to the Grand Mean, quantified as $SS_{\text{Between Groups}}$, leading to a larger $MS_{\text{Between Groups}}$ term and a larger F score. Conversely, as there is more variance among within-group scores, quantified as $SS_{\text{Within Groups (Error)}}$, the $MS_{\text{Within Groups (Error)}}$ term will increase, leading to a smaller F score. Thus, for independent groups, large F scores arise from large differences between group means and/or small variances within groups. Larger F scores equate to lower p values, with the p value also influenced by the sample size and number of groups, each of which constitutes a separate type of "degrees of freedom."

ANOVA calculations are now the domain of computer software, but there is illustrative and heuristic value in manually performing the arithmetic calculation of the F score to garner insight into how analysis of data set variance generates a statistical inference about differences in group means. A numerical example is provided in Figure 2, in which the data set graphed in Figure 1 is listed and subjected to ANOVA, yielding a calculated F score and corresponding p value.
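The same arithmetic that Figure 2 walks through by hand can be sketched in Python. The partitioning follows the figure (total SS, within-groups SS, between-groups SS by subtraction, then Mean Squares and F). For the p value, the sketch uses the closed form that holds only when the between-groups degrees of freedom equal 2, p = (df_within / (df_within + 2F))^(df_within / 2); a general p value needs the full F distribution from a statistics library. Function names are hypothetical.

```python
def one_way_anova(groups):
    """Partition total variance into between-groups and within-groups parts."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand_mean = sum(scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_between = ss_total - ss_within  # by subtraction, as in Figure 2

    df_between, df_within = k - 1, n_total - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    return {
        "SS_total": ss_total, "SS_between": ss_between, "SS_within": ss_within,
        "df": (df_between, df_within),
        "MS_between": ms_between, "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


def f_p_value_two_numerator_df(f, df_within):
    """Upper-tail p for an F score with exactly 2 numerator df (closed form)."""
    return (df_within / (df_within + 2 * f)) ** (df_within / 2)


figure2 = [[14, 14, 11, 13], [16, 14, 13, 13], [20, 18, 17, 17]]
result = one_way_anova(figure2)
print(result["F"], result["df"])  # 14.0 (2, 9)
print(round(f_p_value_two_numerator_df(result["F"], result["df"][1]), 4))  # 0.0017
```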

Mathematical Equivalence of t-tests and ANOVA: t-tests are a Special Case of ANOVA

Let's briey return to the notion that a

t-test is a simplied version of ANOVA

that is specic to the case of one inde-

pendent variable with two groups. If we

analyze the data in Figure 2 for the Type

1 treatment vs. Type 3 treatment group

data (disregarding the Type 2 treatment

group data to reduce the analysis to two

groups), the t score for independent

groups is 5.0 with a p value of 0.0025

(calculations not shown). For the same

data assessed with ANOVA, the F score

is 25.0 with a p value of 0.0025. e t-test

and ANOVA generate identical p values.

e mathematical relation between the

two test statistics is: t 2 = F.
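The identity t² = F can be checked numerically with a small sketch: a pooled-variance t score and a one-factor ANOVA F score are computed independently, first for the Type 1 vs. Type 3 data of Figure 2 and then for a few randomly generated pairs of groups with unequal sizes. The function names and the random data are illustrative assumptions only.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def pooled_t(a, b):
    """Equal-variance (Student's) t score for two independent groups."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def one_way_f(groups):
    """F = MS_between / MS_within for independent groups."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(scores) - len(groups)))


type1, type3 = [14, 14, 11, 13], [20, 18, 17, 17]
t, f = pooled_t(type1, type3), one_way_f([type1, type3])
print(t, t ** 2, f)  # 5.0 25.0 25.0

# The identity does not depend on these particular numbers: try random groups.
rng = random.Random(0)
for _ in range(3):
    a = [rng.gauss(10, 2) for _ in range(8)]
    b = [rng.gauss(12, 3) for _ in range(11)]
    assert math.isclose(pooled_t(a, b) ** 2, one_way_f([a, b]))
```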

Repeated Measures ANOVA: Dierent

Error Term, Greater Statistical Power

e experimental designs emphasized

thus far entail independent groups, in

which each subject is "exposed" to only

one level of an independent variable. In

FIG URE 2. e mechanics of calculating a F score for a one-factor ANOVA with independent groups by partitioning the data

set variance as Sum of Squares and Mean Squares are shown below. is ctitious data set lists increased ankle range of motion

pre/post for three dierent types of manual therapy treatments. For the sake of clarity and ease of calculation, a data set with an

inappropriately small sample size is used.

Subject Manual Therapy Manual Therapy Manual Therapy

Gender treatment Type 1 treatment Type 2 treatment Type 3

Male 14 16 20

Male 14 14 18

Female 11 13 17

Female 13 13 17

Group Means 13 14 18

Grand Mean 15

In the following, SS = Sum of Squares; MS = Mean Squares; df = degrees of freedom.

SSTotal = SS Between Groups + SSWithin Groups (Error), and is calculated by summing the squares of dierences between each data value vs.

the Grand Mean. For this data set with a Grand Mean of 15:

SSTotal = (14-15 )2 + (14-15 )2 + (11-15 )2 + (13-15 )2 + (16-15 )2 + (14-15 )2 + (13-15 )2 + (13-15 )2 + (20-15 )2 + (18-15 )2 + (17-15 )2

+ (17-15 )2 = 74

SSWithin Groups (Error) = SSMT treatment Type 1 (Error) + SSMT treatment Type 2 (Error) + SSMT treatment Type 3 (Error), in which the sum of squares within each

group is calculated in reference to the group's mean:

SSMT treatment Type 1 (Error) = (14-13)2 + (14-13 )2 + (11-13 )2 + (13-13 )2 = 6

SSMT treatment Type 2 (Error) = (16-14)2 + (14-14 )2 + (13-14 )2 + (13-14 )2 = 6

SSMT treatment Type 3 (Error) = (20-18)2 + (18-18 )2 + (17-18 )2 + (17-18 )2 = 6

SSWithin Groups (Error) = 6 + 6 + 6 = 18. By subtraction, SSBetween Groups = 74 - 18 = 56

df refers to the number of independent measurements used in calculating a Sum of Squares.

dfBetween Groups = (# of groups—1) = (3—1) = 2

dfWithin Groups (Error) = (N —# of groups) = (12—1) = 9

ANOVA test statistic, the F score, is calculated from Mean Squares (SS/df ):

F = Mean SquaresBetween Groups / Mean SquaresWithin Groups (Error)

Mean SquaresBetween Groups = SSBetween Groups / df Between Groups = 56 / 2 = 28

Mean SquaresWithin Groups (Error) = SSWithin Groups (Error) / dfWithin Groups (Error) = 18 / 9 = 2

So, F = 28 / 2 = 14

With df Between Groups = 2 and df Within Groups (Error) = 9, this F score translates into p = 0.0017 , a statistically signicant result for

alpha = 0.05.

[E32] THE JOURNAL OF MANUAL & MANIPULATIVE THERAPY n VOLUME 17 n NUMBER 2

An Al ys is o F VAr iAn Ce : Th e Fu nd Amen TAl Co n Cep Ts

the data set of Figure 2, this would in-

volve each subject receiving only one of

the three dierent treatments. If a sub-

ject is exposed to all levels of an inde-

pendent variable, the mechanics of the

ANOVA are altered to take into account

that each subject serves as their own ex-

perimental control. Whereas the term

for statistical signal, MSBetween Groups, is un-

changed, there is a new statistical noise

term called MSWithin Subjects (Error) that per-

tains to variance within each subject

across all levels of the independent vari-

able instead of between all subjects

within one level. Since there is typically

less variation within subjects than be-

tween subjects, the statistical error term

is typically smaller in repeated measures

designs. A smaller MSWithin Subjects (Error)

value leads to a larger F value and a

smaller p value. As a result, repeated

measures ANOVA typically have greater

statistical power than independent

groups ANOVA.
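How the repeated measures error term shrinks can be sketched by partitioning the same numbers differently. The example is deliberately hypothetical: it reads each row of the Figure 2 table as one subject who received all three treatments, which is not the independent groups design that figure describes. The sketch removes the stable between-subject differences (SS_subjects) from the noise, leaving the within-subjects error; function and key names are illustrative.

```python
def repeated_measures_anova(scores):
    """One-factor repeated measures ANOVA; scores[i][j] = subject i at level j."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    level_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_between = n * sum((m - grand) ** 2 for m in level_means)     # signal: unchanged
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)  # stable subject differences
    ss_error = ss_total - ss_between - ss_subjects                  # within-subjects error

    df_between, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_between / df_between) / (ss_error / df_error)
    return {"SS_between": ss_between, "SS_subjects": ss_subjects,
            "SS_error": ss_error, "df": (df_between, df_error), "F": f}


# Hypothetical: each Figure 2 row read as ONE subject tested under all three treatments.
rows = [[14, 16, 20], [14, 14, 18], [11, 13, 17], [13, 13, 17]]
print(repeated_measures_anova(rows))
```

In this reading, the independent groups error SS of 18 splits into about 15.33 for stable subject differences and about 2.67 of within-subjects error, so the F score rises from 14 (df 2, 9) to 63 (df 2, 6) even though the signal term is unchanged.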

Factorial ANOVA: Main Eects and

Interactions

An advantage of ANOVA is its ability to analyze an experimental design with multiple independent variables. When an ANOVA has two or more independent variables it is referred to as a factorial ANOVA, in contrast to the one-factor ANOVAs discussed thus far. This is efficient experimentally, because the effects of multiple independent variables on a dependent variable are tested on one cohort of subjects. Furthermore, factorial ANOVA permits, and requires, an evaluation of whether there is an interplay between different levels of the independent variables, which is called an interaction.

Denitions of terminology that is

unique to factorial ANOVA are war-

ranted: (i) Main eect is the eect of an

independent variable (a factor) on a de-

pendent variable, determined separate

from of the eects of other independent

variables. A main eect is a one-factor

ANOVA that is performed on a factor

that disregards the eects of other fac-

tors. In a two factor ANOVA, there are

two main eects, one for each indepen-

dent variable; a three-factor ANOVA

has three main eects, and so on. (ii) In-

teraction describes an interplay between

independent variables such that dier-

ent levels of the independent variables

have non-additive eects on the depen-

dent variable. In formal terms, there is

an interaction between two factors

when the dependent variable response

at levels of one factor dier from those

produced at levels of the other factor(s).

Interactions can be easily identied in

graphs of group means. For example,

again referring to the data set from Fig-

ure 2, let us now consider the eect of

subject gender as a second independent

variable. is would be a two factor

ANOVA: one factor is the sex of sub-

jects, called G, with two levels;

the second factor is the type of manual

therapy treatment, called T,

with three levels. A shorthand descrip-

tion of this design is 2x3 ANOVA (two

factors with two and three levels, re-

spectively). For this two-factor ANOVA,

there are three Null hypotheses: (i)

Main Eect for the G factor: Are

there dierences in the response (ankle

range of motion) for males vs. females

to manual therapy treatment (combin-

ing data for the three levels of the

T factor with respect to the

two G factor levels)? (ii) Main Ef-

fect for the T factor: Are

there dierences in the response for

subjects in the three levels of the T-

 factor (combining data for males

and females in the G factor with

respect to the three T factor

levels)? (iii) Interaction: Are there dif-

ferences due to neither the G or

T factors alone but to the

combination of these factors? With re-

spect to analysis of interactions, Figure

3 shows a table of group means for all

levels the two independent variables,

based on data from Figure 2. Note that

the two independent variables are

graphed in relation to the dependent

variable. e two lines in the le graph

are parallel, indicating the absence of an

interaction between the levels of the two

factors. An interaction would exist if the

graphs were not parallel, such as in the

right graph in which group means for

males and females on the Type 2 treat-

ment were switched for illustrative pur-

poses. If the lines deviate from parallel

to a sucient degree, the interaction

will be statistically signicant. In this

case with two factors, there is only one

interaction to be evaluated. With three

or more independent variables, there

are multiple interactions that need to be

considered. A statistically signicant in-

teraction complicates the interpretation

of the Main Eects, since the factors are

not independent of each other in their

eects on the dependent variable. Inter-

actions should be examined before

Main Eects. If interactions are not sta-

tistically sig nicant, then Main Eects

can be easily evaluated as a series of

one-factor ANOVAs.
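The "parallel lines" check can be made concrete with the cell means tabulated in Figure 3. The Python sketch below is illustrative only: the function names are hypothetical, and `factorial_ss` assumes a balanced design with 2 subjects per cell, which is inferred (not stated) from the 11 total degrees of freedom in Figure 4 (12 subjects spread over 6 cells). It computes the marginal (main-effect) means, the male-minus-female difference at each treatment level, and the sums of squares for T, G and the T x G interaction.

```python
# Cell means from Figure 3 (increase in ankle ROM): keys are gender (factor G),
# list positions are treatment Type 1, 2, 3 (factor T).
cell_means = {"Male": [14, 15, 19], "Female": [12, 13, 17]}

# Right-hand graph of Figure 3: male and female means for Type 2 swapped.
swapped = {"Male": [14, 13, 19], "Female": [12, 15, 17]}


def marginal_means(means):
    """Row (gender) and column (treatment) means of a table of cell means."""
    rows = {g: sum(v) / len(v) for g, v in means.items()}
    cols = [sum(col) / len(col) for col in zip(*means.values())]
    return rows, cols


def interaction_contrasts(means):
    """Male-minus-female difference at each treatment level.

    Parallel lines in the interaction graph mean these differences are equal,
    which is what the interaction Null hypothesis states."""
    return [m - f for m, f in zip(means["Male"], means["Female"])]


def factorial_ss(means, n_per_cell):
    """SS for T, G and T x G in a balanced two-factor design, from cell means."""
    all_means = [m for v in means.values() for m in v]
    grand = sum(all_means) / len(all_means)
    rows, cols = marginal_means(means)
    ss_t = n_per_cell * len(rows) * sum((c - grand) ** 2 for c in cols)
    ss_g = n_per_cell * len(cols) * sum((r - grand) ** 2 for r in rows.values())
    ss_cells = n_per_cell * sum((m - grand) ** 2 for m in all_means)
    return ss_t, ss_g, ss_cells - ss_t - ss_g


print(marginal_means(cell_means))         # ({'Male': 16.0, 'Female': 14.0}, [13.0, 14.0, 18.0])
print(interaction_contrasts(cell_means))  # [2, 2, 2]  -> parallel lines, no interaction
print(interaction_contrasts(swapped))     # [2, -2, 2] -> lines not parallel
print(factorial_ss(cell_means, 2))        # (56.0, 12.0, 0.0)
```

With the assumed 2 subjects per cell, the sums of squares reproduce the T, G and T x G rows of Figure 4; swapping the Type 2 means leaves the T main effect unchanged but moves variability into the interaction term.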

So There is a Statistically Signicant

ANOVA—Now What? Multiple

Comparison Procedures

If an ANOVA does not yield statistical significance on any main effects or interactions, the Null hypothesis (hypotheses) is (are) retained, that is, not rejected, meaning that the data provide no evidence that the different levels of the independent variables had differential effects on the dependent variable. The inferential statistical work is then done (but see next section), unless confounding covariates are suspected, possibly warranting analysis of covariance (ANCOVA), which is beyond the scope of this article.

When statistical signicance is ob-

tained in an ANOVA, additional statisti-

cal tests are necessary to determine

which of the group means dier from

each other. ese follow-up tests are re-

ferred to as multiple comparison proce-

dures (MCPs) or post hoc tests. MCPs

involve multiple pair-wise comparisons

(or contrasts) in a fashion designed to

maintain alpha for the family of com-

parisons to a specied level, typically

0.05. is is referred to as the familywise

alpha. ere are two general options for

MCP tests: either perform multiple t-

tests that require "manual" adjustment

of the alpha for each pairwise test to

maintain a familywise alpha of 0.05, or

use a test such as the Tukey HSD (see

below) that has built-in protection from

alpha ination. Multiple t-tests have

their place, especially when only a sub-

set of all possible pairwise comparisons

are to be performed, but the special pur-

pose MCPs are preferable when all pair-

wise comparisons are assessed.

THE JOURNAL OF MANUAL & MANIPULATIVE THERAPY n VOLUME 17 n NUM BER 2 [E33]

ANALYSIS OF VARIANCE: THE FUNDAMENTAL CONCEPTS

Using the simple case of a statistically significant one-factor ANOVA, t-tests can be used for post hoc evaluation with the aim of identifying which levels differ from each other. However, with multiple t-tests there is a need to adjust alpha for each t-test in such a way as to maintain the familywise alpha at 0.05. If all possible pairwise comparisons are performed, the number of t-tests grows rapidly (quadratically) as the number of levels increases, as defined by C = m(m - 1)/2, where C = number of pairwise comparisons and m = number of levels in a factor. For example, there are three pairwise comparisons for three levels, six comparisons for four levels, ten comparisons for five levels, and so forth. Familywise alpha must be held at 0.05 across these multiple comparisons to keep the risk of Type 1 errors to no more than 5%. This is commonly accomplished with the Bonferroni (or Dunn) adjustment, in which alpha for each post hoc t-test is adjusted by dividing the familywise alpha (0.05) by the number of pairwise comparisons:

$$\alpha_{\text{multiple t-tests}} = \frac{\alpha_{\text{familywise}}}{C}$$
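A minimal Python sketch of the two quantities just defined: the number of pairwise comparisons C for a factor with m levels, and the Bonferroni per-test alpha. The function names are illustrative, not taken from any statistics package.

```python
def n_pairwise(m):
    """Number of pairwise comparisons among m levels: C = m(m - 1) / 2."""
    return m * (m - 1) // 2


def bonferroni_alpha(alpha_familywise, c):
    """Per-comparison alpha under the Bonferroni (Dunn) adjustment."""
    return alpha_familywise / c


for m in (3, 4, 5):
    c = n_pairwise(m)
    print(m, c, round(bonferroni_alpha(0.05, c), 4))
# 3 3 0.0167
# 4 6 0.0083
# 5 10 0.005
```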

FIGURE 3. Factorial ANOVA interactions, which are assessed with a table and a graph of group means. Group means are based on data presented in Figure 2 and represent a 3x2 two-factor (T x G) ANOVA with independent groups. In reference to the j columns and k rows indicated in the table below, the Null hypothesis for this interaction is:

$$\mu_{1,1} - \mu_{1,2} = \mu_{2,1} - \mu_{2,2} = \mu_{3,1} - \mu_{3,2}$$

where $\mu_{j,k}$ is the mean for treatment level j and gender level k. The graph below left shows the group means of the two independent variables in relation to the dependent variable. The parallel lines indicate that males and females displayed similar changes in ankle ROM for the three types of treatment, so there was no interaction between the different levels of the independent variables. Consider the situation in which the group means for males and females on treatment Type 2 are reversed. These altered group means are shown in the graph below right. The graphed lines are not parallel, indicating the presence of an interaction. In other words, the relative efficacies of the three treatments are different for males and females; whether this reaches statistical significance as an interaction is determined by ANOVA (p less than alpha).

Factor B: G            Factor A: T (manual therapy treatment)             Factor B main effect
                       Type 1 (j = 1)   Type 2 (j = 2)   Type 3 (j = 3)   (row means)
Male (k = 1)           14               15               19               16
Female (k = 2)         12               13               17               14
Factor A main effect
(column means)         13               14               18

[Graphs: increase in ankle ROM (y-axis) plotted against manual therapy treatment Type 1, Type 2 and Type 3 (x-axis), one line each for males and females. Left: the group means above. Right: the male and female means for Type 2 swapped.]


If there are two pairwise comparisons, alpha for each t-test is set to 0.05/2 = 0.025; for three comparisons, alpha is 0.05/3 = 0.0167, and so on. Any pairwise t-test with a p value less than the adjusted alpha would be considered statistically significant. The trade-off for preventing familywise alpha inflation is that as the number of comparisons increases, it becomes progressively more difficult to attain statistical significance due to the lower alpha. Furthermore, the inflation of familywise alpha with multiple t-tests is not additive. As a result, Bonferroni adjustments overcompensate, making this the most conservative (least powerful) of all MCPs. For example, running two t-tests, each with alpha set to 0.05, does not double familywise alpha to 0.10; it increases it to only 0.0975. The effect of multiple t-tests on familywise alpha and Type 1 error rate is defined (for independent tests) by the following formula:

$$\alpha_{\text{familywise}} = 1 - \left(1 - \alpha_{\text{multiple t-tests}}\right)^{C}$$

The overcorrection by the Bonferroni technique becomes more prominent with many pairwise comparisons: executing 20 t-tests, each with an alpha of 0.05, does not yield a familywise alpha of 20 x 0.05 = 1.00 (i.e., a 100% chance of Type 1 error); the value is actually 0.64. There are modifications of the Bonferroni adjustment developed by Šidák12 that more accurately reflect the inflation of familywise alpha and result in larger adjusted alpha levels and therefore increased statistical power, but the effects are slight and rarely convert a marginally non-significant pairwise comparison into a statistically significant one. For example, with three pairwise comparisons, the Bonferroni adjusted alpha of 0.0167 is increased by only 0.0003 to 0.0170 with the Šidák adjustment.
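The numbers quoted above can be checked with a short Python sketch. It assumes the C tests are independent, which is the condition under which the familywise formula is exact; the Šidák per-test alpha is obtained by solving that same formula for the per-test alpha. Function names are illustrative.

```python
def familywise_alpha(alpha_per_test, c):
    """Chance of at least one Type 1 error across c independent tests."""
    return 1 - (1 - alpha_per_test) ** c


def sidak_alpha(alpha_familywise, c):
    """Per-test alpha that holds the familywise alpha exactly for c independent tests."""
    return 1 - (1 - alpha_familywise) ** (1 / c)


print(round(familywise_alpha(0.05, 2), 4))                  # 0.0975
print(round(familywise_alpha(0.05, 20), 2))                 # 0.64
print(round(0.05 / 3, 4), round(sidak_alpha(0.05, 3), 4))   # 0.0167 0.017
```

The last line contrasts the Bonferroni and Šidák per-test alphas for three comparisons; the difference is the 0.0003 gain mentioned in the text.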

The sequential alpha adjustment methods for multiple post hoc t-tests by Holm13 and Hochberg14 provide increased power while still maintaining control of the familywise alpha. These techniques permit the assignment of statistical significance in certain situations for which p values are less than 0.05 but do not meet the Bonferroni criterion for significance. The sequential approaches by Holm13 and Hochberg14 are called step-down and step-up procedures, respectively. In Hochberg's step-up procedure with C pairwise comparisons, the t-test p values are evaluated sequentially in descending order, with p1 the lowest value and pC the highest. If pC is less than 0.05, all the p values are statistically significant. If pC is greater than 0.05, that evaluation is non-significant, and the next largest p value, pC-1, is evaluated with a Bonferroni adjusted alpha of 0.05/2 = 0.025. If pC-1 is significant, then all remaining p values are significant; if not, pC-2 is evaluated against 0.05/3, and so on. Each sequential evaluation leads to an alpha adjustment based on the number of evaluations made so far, not on the entire set of possible evaluations, thereby yielding increased statistical power compared to the Bonferroni method. For example, if three p values are 0.07, 0.02 and 0.015, Hochberg's method evaluates p3 = 0.07 vs. alpha = 0.05/1 = 0.05 (non-significant); then p2 = 0.02 vs. alpha = 0.05/2 = 0.025 (significant); and then p1 = 0.015 vs. alpha = 0.05/3 = 0.0167 (significant).

Holm's method performs the inverse sequence and alpha adjustments, such that the lowest p value is evaluated first with a fully adjusted alpha. In this case: p1 = 0.015 vs. alpha = 0.05/3 = 0.0167 (significant); then p2 = 0.020 vs. alpha = 0.05/2 = 0.025 (significant); and then p3 = 0.070 vs. alpha = 0.05/1 = 0.05 (non-significant). Once Holm's method encounters non-significance, sequential evaluations end, whereas Hochberg's method continues testing until it reaches a significant p value. For these three p values, the Bonferroni adjustment would find p = 0.015 significant but p = 0.02 non-significant. As can be seen, the methods of Hochberg and Holm are less conservative and more powerful than Bonferroni's adjustment. Further, Hochberg's method is uniformly more powerful than Holm's method15. For example, if there are three pairwise comparisons with p = 0.045, 0.04 and 0.03, all would be significant with Hochberg's method, but none would be with Holm's method (or Bonferroni's).
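The two sequential procedures can be sketched in a few lines of Python. This is a minimal illustration written for this discussion (the function names `holm`, `hochberg` and `bonferroni` are hypothetical, not a library API); it uses strict "less than" comparisons as in the text and reproduces the worked examples above. For real analyses, the `multipletests` function in `statsmodels.stats.multitest`, with its `'holm'` and `'simes-hochberg'` methods, could serve as a cross-check.

```python
def bonferroni(pvalues, alpha=0.05):
    """Every p value is compared with alpha / C."""
    return [p < alpha / len(pvalues) for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm step-down: smallest p vs alpha/C, next vs alpha/(C-1), ...;
    stop at the first non-significant result."""
    c = len(pvalues)
    order = sorted(range(c), key=lambda i: pvalues[i])  # ascending p
    reject = [False] * c
    for rank, i in enumerate(order):
        if pvalues[i] < alpha / (c - rank):
            reject[i] = True
        else:
            break  # all larger p values are retained
    return reject


def hochberg(pvalues, alpha=0.05):
    """Hochberg step-up: largest p vs alpha/1, next vs alpha/2, ...;
    the first significant p value and all smaller ones are rejected."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i], reverse=True)
    reject = [False] * len(pvalues)
    for rank, i in enumerate(order):
        if pvalues[i] < alpha / (rank + 1):
            for j in order[rank:]:
                reject[j] = True
            break
    return reject


p = [0.07, 0.02, 0.015]
print(bonferroni(p))  # [False, False, True]
print(holm(p))        # [False, True, True]
print(hochberg(p))    # [False, True, True]

q = [0.045, 0.04, 0.03]
print(holm(q))        # [False, False, False]
print(hochberg(q))    # [True, True, True]
```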

There are many types of MCPs distinct from the t-test approaches described above16. These tests have "built-in" familywise alpha protection that does not require "manual" adjustment of alpha. Most of these MCPs calculate a so-called q value for each comparison that takes into account group mean differences, group variances, and group sample sizes in a fashion similar but not identical to the calculation of t. This q value is compared to a critical value generated from the q (studentized range) distribution, a distribution of differences in sample means. Protection from familywise alpha inflation is in the form of a multiplier applied to the critical value. The multiplier increases as the number of comparisons increases, thereby requiring greater differences between group means to attain statistical significance as the number of comparisons is increased. Some MCPs are better than others at balancing statistical power and Type 1 errors. By general consensus amongst statisticians, the Fisher Least Significant Difference (LSD) test and Duncan's Multiple Range Test are considered to be overly powerful, with too high a likelihood of Type 1 errors (false positives). The Scheffé test is considered to be overly conservative, with too high a likelihood of Type 2 errors (false negatives), but is applicable when group sample sizes are markedly unequal17. The Tukey Honestly Significant Difference (HSD) test is favored by many statisticians for its balance of statistical power and protection from Type 1 errors. It is worth noting that the power advantage of the Tukey HSD test holds only when all possible pairwise comparisons are performed. The Student-Newman-Keuls (SNK) test statistic is computed identically to the Tukey HSD; however, the critical value is determined differently using a stepwise approach, somewhat like the Holm method described above for t-tests. This makes the SNK test slightly more powerful than the Tukey HSD test. However, an advantage of the Tukey HSD test is that a variant called the Tukey-Kramer HSD test can be used with unbalanced sample size designs, unlike the SNK test. The Dunnett test is useful when planned pairwise tests are restricted to one group (e.g., a control group) being compared to all other groups (e.g., treatment groups).
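As an illustration of how the Tukey HSD works, the Python sketch below applies it to the three treatment (T) main-effect means from Figure 3, using the error term from the Figure 4 ANOVA table (MS error = 1, df error = 6). Two inputs are assumptions: 4 observations per treatment mean (inferred from 12 subjects in 6 cells, 2 genders per treatment level), and the critical q of 4.34, which is the tabled studentized range value for 3 means, 6 error degrees of freedom and alpha = 0.05. In SciPy 1.7 or later, `scipy.stats.studentized_range.ppf(0.95, 3, 6)` should return approximately the same value.

```python
import math
from itertools import combinations

# T main-effect means (Figure 3 column means) and the Figure 4 error term.
t_means = {"Type 1": 13.0, "Type 2": 14.0, "Type 3": 18.0}
ms_error = 1.0
n_per_mean = 4  # assumption: 2 subjects per cell x 2 genders

# Tabled critical studentized range value: k = 3 means, df_error = 6, alpha = 0.05.
q_crit = 4.34

# Tukey HSD: the smallest difference between two means that is significant.
hsd = q_crit * math.sqrt(ms_error / n_per_mean)


def tukey_pairs(means, hsd):
    """Flag each pair of means whose absolute difference exceeds the HSD."""
    return {(a, b): abs(means[a] - means[b]) > hsd
            for a, b in combinations(means, 2)}


print(round(hsd, 2))  # 2.17
for (a, b), sig in tukey_pairs(t_means, hsd).items():
    print(a, "vs", b, "significant" if sig else "not significant")
```

Under these assumptions, Type 3 differs from both Type 1 and Type 2, while Types 1 and 2 (1 ROM unit apart) do not differ. Because a single HSD threshold is applied to every pair, no manual alpha adjustment is needed, which is the "built-in" protection described above.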

In summary, (i) the Tukey HSD and Student-Newman-Keuls tests are recommended when performing all pairwise tests; (ii) the Hochberg or Holm sequential alpha adjustments enhance the power of multiple post hoc t-tests while maintaining control of familywise alpha; and (iii) the Dunnett test is preferred when comparing one group to all other groups.

Break from Tradition: Skip One-Factor ANOVA and Proceed Directly to an MCP

Thus far, the conventional approach to ANOVA and MCPs has been presented, namely: run an ANOVA, and if it is not significant, proceed no further; if the ANOVA is significant, then run MCPs to determine which group means differ. However, it has long been held by some statisticians that in certain circumstances the ANOVA can be skipped and that an appropriate MCP is the only necessary inferential test. To quote an influential paper by Wilkinson et al18, the ANOVA-followed-by-MCP approach "is usually wrong for several reasons. First, pairwise methods such as Tukey's honestly significant difference procedure were designed to control a familywise error rate based on the sample size and number of comparisons. Preceding them with an omnibus F test in a stagewise testing procedure defeats this design, making it unnecessarily conservative." Related to this perspective is the fact that inferential discrepancies are possible between ANOVA and MCPs, in which one is statistically significant and the other is not. This can occur when p values are near the boundary of alpha. Each MCP has slightly different criteria for statistical significance (based on either the t or q distribution), and all differ slightly from the criteria of F scores (based on the F distribution). An argument has also been put forth with respect to performing pre-planned MCPs without the need for a statistically significant ANOVA in clinical trials19. Nonetheless, the convention remains to perform ANOVA and then MCPs, but MCPs alone are a statistically valid option. ANOVA is especially warranted when there are multiple factors, due to the ability of ANOVA to detect interactions.

Wilkinson et al.18 also remind researchers that it is rarely necessary to perform all pairwise comparisons. Selected pre-planned comparisons that are driven by the research hypothesis, rather than by a subcortical reflex to perform every conceivable pairwise comparison, will reduce the number of extraneous pairwise comparisons and false positives, and have the added benefit of increasing statistical power.

ANOVA Eect Size

Eect size is a unitless measure of the

magnitude of treatment eects20. For

ANOVA, there are two categories of ef-

fect size indices: (i) those based on pro-

portions of sum of squares (η2, partial η2,

ω2 ), and (ii) those based on a standard-

ized dierence between group means

(such as Cohen's d ) 21,22. e latter type of

eect size index is useful for power anal-

ysis, and will be discussed briey in the

next section. To an ever increasing de-

gree, peer review journals are requiring

the presentation of eect sizes with de-

scriptive summaries of data.

There are three commonly used effect size indices that are based on proportions of the familiar sum of squares values that form the foundation of ANOVA computations. The three indices are called eta squared (η²), partial eta squared (partial η²), and omega squared (ω²). These indices range in value from 0 (no effect) to 1 (maximal effect) because they are proportions of variance; a negative ω² estimate, which can arise when an effect is very small, is conventionally reported as 0. These indices typically yield different values for effect size.

Eta squared (η²) is calculated as:

$$\eta^2 = \frac{SS_{\text{Between Groups}}}{SS_{\text{Total}}}$$

The $SS_{\text{Between Groups}}$ term pertains to the independent variable of interest, whereas $SS_{\text{Total}}$ is based on the entire data set. Specifically, for a factorial ANOVA:

$$SS_{\text{Total}} = \sum SS_{\text{Between Groups (all factors)}} + \sum SS_{\text{Interactions}} + SS_{\text{Error}}$$

As such, the magnitude of η² for a given factor will be influenced by the number of other independent variables. For example, η² will tend to be larger in a one-factor design than in a two-factor design because in the latter the $SS_{\text{Total}}$ term will be inflated to include the sum of squares arising from the second factor.

Partial eta squared (partial η²) is calculated with respect to the sum of squares associated with the factor of interest, not the total sum of squares:

$$\text{partial } \eta^2 = \frac{SS_{\text{Between Groups}}}{SS_{\text{Between Groups}} + SS_{\text{Error}}}$$

As with the η² calculation, the $SS_{\text{Between Groups}}$ numerator term for partial η² pertains to the independent variable of interest. However, the denominator differs from that of η². The denominator for partial η² is not based on the entire data set ($SS_{\text{Total}}$) but instead only on $SS_{\text{Between Groups}}$ for the factor being evaluated plus $SS_{\text{Error}}$. For a one-factor ANOVA, the sum of squares terms are identical for η² and partial η², so the values are identical; however, with factorial ANOVA the denominator for partial η² will always be smaller. For this reason, partial η² is always larger than η² with factorial ANOVA (unless a factor or interaction has absolutely no effect, as in the case of the interaction in Figure 4, for which both η² and partial η² equal 0).

Omega squared (ω²) is based on an estimation of the proportion of variance in the underlying population, in contrast to the η² and partial η² indices, which are based on proportions of variance in the sample. For this reason, ω² will always be a smaller value than η² and partial η². Application of ω² is limited to between-subjects designs (i.e., not repeated measures) with equal sample sizes in all groups. Omega squared is calculated as follows:

$$\omega^2 = \frac{SS_{\text{Between Groups}} - (df_{\text{Between Groups}})(MS_{\text{Error}})}{SS_{\text{Total}} + MS_{\text{Error}}}$$

In contrast to η², which provides an upwardly biased estimate of effect size when the sample size is small, ω² provides an approximately unbiased estimate23.

The reader is cautioned that η² and partial η² are often misreported in the literature (e.g., partial η² mislabeled as η², or vice versa)24,25. It is advisable to calculate these values by hand using the formulae shown above as a confirmation of the output of statistical software programs, to ensure accurate reporting. Refer to Figure 4 for sample calculations of these three effect size indices for a two-factor ANOVA.
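In the spirit of checking software output by hand, the following Python sketch recomputes the three indices from the sums of squares and degrees of freedom in the Figure 4 ANOVA table. It is a minimal illustration, not a general-purpose routine, and it follows the convention of reporting a negative ω² as 0 (which is what happens for the T x G interaction, where the raw estimate is -2/75).

```python
# Sums of squares (SS) and degrees of freedom (df) from the Figure 4 table.
ss = {"T": 56.0, "G": 12.0, "T x G": 0.0}
df = {"T": 2, "G": 1, "T x G": 2}
ss_error, df_error = 6.0, 6
ss_total = sum(ss.values()) + ss_error  # 74.0
ms_error = ss_error / df_error          # 1.0


def eta_squared(ss_effect):
    return ss_effect / ss_total


def partial_eta_squared(ss_effect):
    return ss_effect / (ss_effect + ss_error)


def omega_squared(ss_effect, df_effect):
    # A negative estimate is conventionally reported as 0.
    return max(0.0, (ss_effect - df_effect * ms_error) / (ss_total + ms_error))


for name in ss:
    print(name,
          round(eta_squared(ss[name]), 2),
          round(partial_eta_squared(ss[name]), 2),
          round(omega_squared(ss[name], df[name]), 2))
# T 0.76 0.9 0.72
# G 0.16 0.67 0.15
# T x G 0.0 0.0 0.0

# Effect eta-squared values plus the error share sum to 1; partial values do not.
print(round(sum(map(eta_squared, ss.values())) + ss_error / ss_total, 6))  # 1.0
print(round(sum(map(partial_eta_squared, ss.values())), 2))                # 1.57
```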

The η² and partial η² indices have distinctly different attributes. Whether a given attribute is considered to be an advantage or disadvantage is a matter of perspective and context. Some authors24 argue the merits of eta squared, whereas others4 prefer partial eta squared. Notable issues pertaining to these indices include:

(i) Proportion of variance: When there is a statistically significant main effect or interaction, both η² and partial η² (and ω²) can be interpreted in terms of the percentage of variance accounted for by the corresponding independent variable, even though they will often yield different values for factorial ANOVAs. The reference quantity differs, however. If η² = 0.20 and partial η² = 0.25 for a given factor, η² indicates that the factor accounts for 20% of the total variability in the dependent variable scores, whereas partial η² indicates that it accounts for 25% of the variability made up of that factor plus error (i.e., of $SS_{\text{Between Groups}} + SS_{\text{Error}}$).

(ii) Relative values: Since η² is either equal to (one-factor ANOVA) or less than (factorial ANOVA) partial η², the η² index is the more conservative measure of effect size. This can be viewed as a positive or negative attribute.

(iii) Additivity: η² is additive, but partial η² is not. Since η² for each factor is calculated in terms of the total sum of squares, all the η² values for an ANOVA are additive: the η² values for the main effects and interactions sum to the proportion of variance in the dependent variable that arises from the effects of all the independent variables, and adding the share attributable to error brings the total to 1 (as in Figure 4). In contrast, a factor's partial η² is calculated in terms of that factor's sum of squares plus the error sum of squares (not the total sum of squares), so on mathematical grounds the individual partial η² values from an ANOVA are not additive and do not necessarily sum to 1.

(iv) Eects of multiple factors: As the

number of factors increases, the propor-

tion of variance accounted for by each

factor will necessarily decrease. Accord-

ingly, η2 decreases in an associated way.

In contrast, partial η2 for each factor is

calculated within the sum of squares

variance metrics of that particular fac-

tor, and is not inuenced by the number

of other factors.

How Many Subjects?

The aim of any experimental design is to have adequate statistical power to detect differences between groups that truly exist. There is no simple answer to the question of how many subjects are needed for statistical validity using ANOVA. Typical standards are to design a study with an alpha of 0.05 to have statistical power of at least 0.80 (i.e., an 80% chance of detecting differences between group means that truly exist; alternatively, a 20% chance of committing a Type 2 error). Statistical power will be a function of effect size, sample size, and the number of independent variables and levels, among other things. Adequate sample size is a critical design consideration, and prospective (a priori) power analysis is performed to estimate the required sample size that will yield the desired level of power in the inferential analysis after data are collected. This entails a prediction of group mean differences and group standard deviations in the yet-to-be-collected data. Specifically, the effect size index used for prospective power analysis is based on a standardized measure such as Cohen's d, which is based on predicted differences in group means (statistical signal) divided by standard deviation (statistical noise). Being based on differences instead of proportions, the d effect size index is scaled differently than the η², partial η² and ω² described above, and can exceed a value of 1.

FIGURE 4. Calculations of three different measures of effect size for a two-factor (T and G) ANOVA of the data set shown in Figure 2. The effect sizes shown are all based on proportions of sum of squares: eta squared (η²), partial η², and omega squared (ω²). Note the following: (i) The denominator sum of squares term will be larger for η² than for partial η² in a factorial ANOVA, so η² will be smaller than partial η². (ii) Omega squared (ω²) is a population estimate, whereas η² and partial η² are sample estimates, so ω² will be smaller than both η² and partial η². (iii) The sum of all η² equals 1, whereas the sum of all partial η² does not equal 1 (it can be less than or greater than 1). Refer to text for further explanation of these attributes.

Effect   Sum of Squares   Degrees of freedom   Mean Squares   η²     partial η²   ω²
T        56               2                    28             0.76   0.90         0.72
G        12               1                    12             0.16   0.67         0.15
T x G    0                2                    0              0.00   0.00         0.00
Error    6                6                    1              0.08   ----         ----
Total    74               11                                  1.00   1.57

Sample calculations:

η² = SS Between Groups / SS Total
η² for T = 56 / 74 = 0.76 (accounts for 76% of total variability in DV scores)
η² for G = 12 / 74 = 0.16 (accounts for 16% of total variability in DV scores)
η² for T x G interaction = 0 / 74 = 0.00 (accounts for 0% of total variability in DV scores)
η² for Error = 6 / 74 = 0.08 (accounts for 8% of total variability in DV scores)
Sum of all η² = 100%

partial η² = SS Between Groups / (SS Between Groups + SS Error)
partial η² for T = 56 / (56 + 6) = 0.90 (accounts for 90% of the variability due to T plus error)
partial η² for G = 12 / (12 + 6) = 0.67 (accounts for 67% of the variability due to G plus error)
partial η² for T x G interaction = 0 / (0 + 6) = 0.00 (accounts for 0% of the variability due to T x G plus error)
Sum of all partial η² ≠ 100%

ω² = [SS Between Groups - (df Between Groups)(MS Error)] / (SS Total + MS Error)
ω² for T = [56 - (2)(1)] / [74 + 1] = 54 / 75 = 0.72
ω² for G = [12 - (1)(1)] / [74 + 1] = 11 / 75 = 0.15
ω² for T x G interaction = [0 - (2)(1)] / [74 + 1] = -2 / 75, reported as 0.00

The prediction of an experiment's effect size that is part of a prospective power analysis is nothing more than an estimate. This estimate can be based on pilot study data, previously published findings, intuition or best guesses. A guiding principle should be to select an effect size that is deemed to be clinically relevant.

The approach used in a prospective power analysis is outlined below for the simple case of a t-test with independent groups and equal variance, in which the effect size index is defined as:

$$d = \frac{\text{difference in group means}}{\text{common (pooled) standard deviation of the two groups}}$$

The estimate of the appropriate number of subjects in each group for the specified alpha and power is given by the following equation26:

$$N_{\text{Estimated}} = 2 \left( \frac{z_\alpha + z_\beta}{d} \right)^2$$

in which:

zα is the z value for the specified alpha. With alpha = 0.05, zα = 1.96 (two-tailed).

zβ is the z value for the specified beta (risk of Type 2 error), where power = 1 - β. For β = 0.20 (power = 0.80), zβ = 0.84 (one-tailed).

As a computational example, if the effect size d is predicted to be 1.0 (which equates to a difference between group means of one standard deviation), then for alpha = 0.05 and power = 0.80 the appropriate sample size for each group would be:

$$N_{\text{Estimated}} = 2 \left( \frac{1.96 + 0.84}{1} \right)^2 = 2 \times 2.80^2 = 2 \times 7.84 = 15.68 \approx 16$$

For a smaller eect size, a larger sample

size is needed, e.g., N = 63 for an eect

size of 0.5. e reader is cautioned that

these sample sizes are estimates based

on guesses about the predicted eect

size; they do not guarantee statistical

signicance.

Prospective power analysis for ANOVA is more complex than outlined above for a simple t-test. ANOVAs can have numerous levels within a factor, multiple factors, and interactions, all of which need to be accounted for in a comprehensive power analysis. These complications raise the following cautionary note: ANOVA power analysis quickly devolves into a series of progressively wilder guesses (instead of "estimates") of effect sizes as the number of independent variables and possible interactions increases26. It is often advisable to focus a prospective power analysis for ANOVA on one factor that is of primary interest, so as to simplify the power analysis and reduce the number of unjustifiable guesses. The reader is referred to statistical textbooks (such as references 22, 26, 27) for different approaches that can be used for prospective power analysis for ANOVA designs. As a general guideline, it is desirable for group sample sizes to be large enough to invoke the central limit theorem in the statistical analysis (>30 or so) and for there to be a balanced design (equal sample sizes in each group).

Finally, a retrospective (post hoc) power analysis is warranted after data are collected. The aim is to determine the statistical power of the study, based on the effect size (not estimated, but calculated directly from the data) and sample size. This is particularly relevant for statistically non-significant findings, since the non-significance may have been the result of inadequate statistical power. The textbooks cited above, as well as many others, also discuss the mechanics of how to perform retrospective power analyses.

Conclusion: Statistical Signicance

Should not be Confused with

Clinical Signicance

ANOVA is a useful statistical tool for drawing inferential conclusions about how one or more independent variables influence a parametric dependent variable (outcome measure). It is imperative to keep in mind that statistical significance does not necessarily correspond to clinical significance. The much sought-after statistically significant ANOVA p value has only two purposes: to play a role in the inferential decision as to whether group means differ from each other (rejection of the Null hypothesis), and to assign a probability of the risk of committing a Type 1 error if the Null hypothesis is rejected. Statistically significant ANOVA and MCPs say nothing about the magnitude of group mean differences, other than that a difference exists. A large sample size can produce statistical significance with small differences in group means; depending on the outcome measure, these small differences may have little clinical significance. Assigning clinical significance is a judgment call that needs to take into account the magnitude of the differences between groups, which is best assessed by examination of effect sizes. Statistical significance plays the role of a searchlight to detect group differences, whereas effect size is useful for judging the clinical significance of these differences.

REFERENCES

1. Wackerly DD, Mendenhall W III, Scheaer

RL. Mathematical Statistics with Applica-

tions. 6th ed. Pacic Grove, CA: Druxbury

Press, 2002.

2. Shapiro SS, Wilk MB. An analysis of vari-

ance test for normality (complete samples).

Biometrika 1965;52:591–611.

3. D'Agnostino RB. An omnibus test for nor-

mality of moderate and large size samples.

Biometrika 1971;58:341–348.


4. Tabachnick BG, Fidell LS. Using Multivari-

ate Statistics. 5th ed. New York: Pearson

Education, 2007.

5. Levene H. Robust tests for the equality of

variance test for normality. In Olkin I, ed.

Contributions to Probability and Statistics:

Essays in Honor of Harold Hotelling. Palo

Alto: Stanford University Press, 1960.

6. Brown MB, Forsythe AB. Robust tests for the

equality of variances. Journal of the Ameri-

can Statistical Association 1974;69:364–367.

7. Zar JH. Biostatistical Analysis. Upper Saddle

River, NJ: Prentice Hall, 1998.

8. Daniel WW. Biostatistics: A Foundation for

Analysis in the Health Sciences. 7th ed.

Hoboken, NJ: John Wiley & Sons, Inc., 1999.

9. Box GEP. Non-normality and tests on vari-

ances. Biometrika 1953;40:318–335.

10. Box GEP. Some theorems on quadratic

forms applied in the study of analysis of vari-

ance problems: I. Eect of inequality of vari-

ance in the one way classication. Annals of

Mathematical Statistics 1954;25:290–302.

11. Wilkinson L, Blank G, Gruber C. Desktop

Data Analysis with SYSTAT. Upper Saddle

River, New Jersey: Prentice Hall, 1996.

12. Šidàk Z. Rectangular condence region for

the means of multivariate normal distribu-

tions. Journal of the American Statistical As-

sociation 1967;62:626–633.

13. Holm S. A simple sequentially rejective mul-

tiple test procedure. Scandinavian Journal of

Statistics 1979;6:65–70.

14. Hochberg Y. A sharper Bonferroni proce-

dure for multiple tests of signicance.

Biometrika 1988; 7:800–802.

15. Huang Y. Hochberg's step-up method: Cutting corners off Holm's step-down method. Biometrika 2007;94:965–975.

16. Toothaker L. Multiple Comparisons for Researchers. New York, NY: Sage Publications, 1991.

17. Cabral HJ. Multiple Comparisons Procedures. Circulation 2008;117:698–701.

18. Wilkinson L and the Task Force on Statistical Inference. Statistical methods in psychology journals. American Psychologist 1999;54:594–604.

19. D'Agostino RB, Massaro J, Kwan H, Cabral H. Strategies for dealing with multiple treatment comparisons in confirmatory clinical trials. Drug Information Journal 1993;27:625–641.

20. Cook C. Clinimetrics Corner: Use of effect sizes in describing data. J Man Manip Ther 2008;16:E54–E5.

21. Cohen J. Eta-squared and partial eta-squared in fixed factor ANOVA designs. Educational and Psychological Measurement 1973;33:107–112.

22. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum, 1988.

23. Keppel G. Design and Analysis: A Researcher's Handbook. 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1982.

24. Levine TR, Hullett CR. Eta squared, partial eta squared, and misreporting of effect size in communication research. Human Communication Research 2002;28:612–625.

25. Pierce CA, Block RA, Aguinis H. Cautionary note on reporting eta-squared values from multifactor ANOVA designs. Educational and Psychological Measurement 2004;64:916–924.

26. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. Hamilton, Ontario: B.C. Decker Inc., 1998.

27. Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. 3rd ed. Upper Saddle River, NJ: Pearson Education Inc., 2009.

... To further test whether there are significant differences in these sociodemographic attributes among the samples from the four cities, a one-way analysis of variance (ANOVA) is used. ANOVA is a statistical test for detecting differences in group means when there is one parametric dependent variable and one or more independent variables (Sawyer 2009); the one-way form involves a single independent variable. When multiple groups are compared, one-way ANOVA can determine whether there are significant differences among them (Kao and Green 2008; Verma 2013). ...
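A one-way ANOVA splits the total variability into a between-group part and a within-group part. It then compares the two through F = MS_between / MS_within. The sketch below is a minimal illustration, assuming NumPy and SciPy are available. It computes F from the sums of squares by hand so the partition is visible. The four "city" samples are invented for illustration and are not data from the cited study. The helper name `one_way_anova` is hypothetical.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA computed from sums of squares."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    k = len(arrays)                                   # number of groups
    n_total = sum(a.size for a in arrays)
    grand_mean = np.concatenate(arrays).mean()
    # Between-group SS: group size times squared distance of group mean from grand mean
    ss_between = sum(a.size * (a.mean() - grand_mean) ** 2 for a in arrays)
    # Within-group SS: squared deviations of observations from their own group mean
    ss_within = sum(((a - a.mean()) ** 2).sum() for a in arrays)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)  # upper-tail probability
    return f_stat, p_value

# Hypothetical scores for four groups (e.g. four cities); illustrative values only
city_a = [3.1, 3.4, 2.9, 3.8, 3.5]
city_b = [3.9, 4.1, 3.7, 4.4, 4.0]
city_c = [3.0, 3.2, 3.3, 2.8, 3.1]
city_d = [3.6, 3.5, 3.9, 3.4, 3.8]

f_stat, p_value = one_way_anova(city_a, city_b, city_c, city_d)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

The hand computation should agree with `scipy.stats.f_oneway` on the same samples. In practice the library call is the simpler choice.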

  • Leiming Li
  • Yu Zhang

Carsharing represents an alternative to private vehicles and is becoming internationally recognized as a method of sustainable transportation. Compared with the United States and countries in Europe, carsharing services in China started later and were initially underwhelming. With the revival and popularity of the sharing economy, carsharing has been thriving in China in recent years but remains at an initial stage. Understanding the determinants of people's intentions to carshare is critical for the promotion of carsharing services. The theoretical framework of this research is an expanded version of the theory of planned behavior that includes environmental concern. A questionnaire was created to empirically test the model, and a total of 1165 valid surveys were collected in four new first-tier cities in China. The intention to use carsharing was found to be directly affected by attitude, subjective norm, and perceived behavioral control rather than by environmental concern. However, people's environmental concern was verified as indirectly affecting their intentions to use carsharing through attitude, subjective norm, and perceived behavioral control. In addition, this study tested the moderating effects of car ownership, age, gender and income using a multi-group analysis. The results confirm the moderating effects of car ownership and gender on people's intention to use carsharing, revealing differences between people with and without private cars, and between men and women. The moderating effects of age and income on people's intention to use carsharing were found to be insignificant. These findings provide practical insights for carsharing organizations and transportation departments. The limitations of this study and suggestions for further research are also discussed.

... To compare the mean value of each cause factor across project types, a one-way between-group analysis of variance (ANOVA) was chosen. Analysis of variance (ANOVA) is a statistical test for detecting differences in group means when there is one parametric dependent variable and one or more independent variables [39]. Prior to conducting the ANOVA, the assumptions of normality and homogeneity of variance were tested. ...
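Two common pre-ANOVA checks are the Shapiro–Wilk test for normality within each group and a Levene-type test for homogeneity of variance. Both appear among the references above (Shapiro and Wilk; Levene; Brown and Forsythe). The sketch below assumes NumPy and SciPy. It shows that a Levene-type test is itself a one-way ANOVA run on the absolute deviations from each group's center. Centering on the mean gives Levene's original test, and centering on the median gives the Brown–Forsythe variant. The ratings and the helper name `levene_by_hand` are hypothetical.

```python
import numpy as np
from scipy import stats

def levene_by_hand(*groups, center="median"):
    """One-way ANOVA on absolute deviations from each group's center.
    center="mean" reproduces Levene's test; center="median" is the Brown-Forsythe variant."""
    center_fn = np.median if center == "median" else np.mean
    deviations = [np.abs(np.asarray(g, dtype=float) - center_fn(g)) for g in groups]
    return stats.f_oneway(*deviations)

# Hypothetical importance ratings for three project types (illustrative values only)
plant = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3, 4.2, 3.7]
renewable = [3.5, 4.4, 3.1, 4.8, 3.9, 4.6, 3.3, 4.0]
grid = [4.0, 4.1, 3.9, 4.2, 4.0, 3.8, 4.1, 4.0]

for name, g in [("plant", plant), ("renewable", renewable), ("grid", grid)]:
    # Shapiro-Wilk: a small p-value suggests departure from normality within the group
    print(f"Shapiro-Wilk {name}: p = {stats.shapiro(g).pvalue:.3f}")

bf = levene_by_hand(plant, renewable, grid, center="median")
print(f"Brown-Forsythe: F = {bf.statistic:.3f}, p = {bf.pvalue:.4f}")
```

In practice `scipy.stats.levene(..., center="median")` gives the same statistic directly. The hand version is only there to show what the test computes.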

  • Elikplim Afelete
  • Wooyong Jung

Design change is a common but significant problem in construction projects, resulting in delays, cost overruns, claims, and disputes. However, design change in power projects has received less attention. Accordingly, the primary objective of this study was to identify important cause factors of design changes according to different power-project types in Ghana. Following a thorough assessment of the literature, 36 potential causes were identified, which expert reviews narrowed down to 30. In this study, power projects were classified into three categories: power plant, renewable, and distribution and transmission. The results indicate owner-related financial problems as the most important cause of design change for all three project types. The second and third most significant causes in each category were, respectively: errors and omissions in design and problems or unforeseen site conditions in power plant projects; deficient quality and quantity of resources and inflation and changes in interest and exchange rates in renewable projects; and problems or unforeseen site conditions and changes of plans in distribution and transmission projects. Based on the findings, power-project stakeholders are able to comprehend the dynamics of design change and develop effective design management strategies to reduce its impact.

... However, in some situations, departure from one of these assumptions does not markedly affect conclusions based on the F-test. For example, insisting on exact normality is something of a red herring because of the central limit theorem (CLT). If the errors are not normal but are still independent and identically distributed, the distribution of the estimated coefficients approaches normality as the sample size increases ([2], [24], [27]). This is what makes statistics workable, because no real data set entered into the computer is perfectly normal. ...
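The robustness claim can be checked informally by simulation. The sketch below assumes NumPy and SciPy. It draws every group from the same strongly skewed exponential distribution, so the null hypothesis of equal means is true. It then counts how often a one-way ANOVA rejects at α = 0.05. This illustrates one scenario only (three groups, n = 30 each, exponential data) and is not a general proof. The function name `type_i_error_rate` is hypothetical.

```python
import numpy as np
from scipy import stats

def type_i_error_rate(n_per_group=30, n_groups=3, n_sims=2000, alpha=0.05, seed=0):
    """Monte Carlo estimate of the one-way ANOVA rejection rate when H0 is true
    but the data are skewed (identical exponential distribution in every group)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Same distribution for every group, so the population means are equal
        groups = [rng.exponential(scale=1.0, size=n_per_group) for _ in range(n_groups)]
        if stats.f_oneway(*groups).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

rate = type_i_error_rate()
print(f"Empirical Type I error rate: {rate:.3f} (nominal 0.05)")
```

With equal group sizes of this magnitude, the empirical rate should land close to the nominal 5% despite the non-normal data. Smaller or unequal groups are where departures tend to matter more.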

... ANOVA is a statistical tool used to detect differences between experimental group means (Sawyer, 2009). ...

The aim of this study is to test the adaptability of seven durum wheat (Triticum durum Desf.) genotypes grown under semi-arid conditions. The experiment was carried out at the Agricultural Experimental Station of Setif (Algeria) in a randomized block design with three replications. At maturity, the following parameters were measured: grain yield, economic yield, above-ground biomass, number of spikes per square meter, number of grains per spike, harvest index, thousand-kernel weight and plant height. The number of days to heading was counted from the sowing date. ANOVA showed that genotype and crop-season effects were highly significant for all traits studied. The MBB and Boussellem genotypes were the most suitable; they ranked well in both crop seasons for all traits studied. 2016/2017 was the best crop season, recording the best values for grain yield and most of the traits studied. The genotype × crop season interaction was highly significant for all traits studied. The correlations among traits in both crop seasons show that grain yield was significantly correlated with economic yield and number of grains per spike. A highly significant correlation was observed between economic yield and above-ground biomass in both crop seasons. The number of days to heading was significantly correlated with plant height in both crop seasons.

... Table 11 below shows the analysis of variance (ANOVA) table. The table is used to assess the statistical significance of the results and to detect differences between the means of two or more experimental groups. It is also used to decide whether to accept or reject the hypothesis presented in this study (Sawyer, 2009). This analysis tests whether the R² shown in Table 10 above is significantly greater than zero by checking whether the Sig. value in Table 11 above satisfies p ≤ 0.05. ...
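In a regression ANOVA table, the overall F-test of H0: R² = 0 can be written directly in terms of R². The formula is F = (R²/k) / ((1 − R²)/(n − k − 1)), with k and n − k − 1 degrees of freedom, where k is the number of predictors and n the number of observations. The sketch below assumes SciPy. The helper name `r_squared_f_test` and the input values are hypothetical and are not taken from Tables 10 or 11 of the cited study.

```python
from scipy import stats

def r_squared_f_test(r2, n, k):
    """Overall regression F-test of H0: R^2 = 0 for k predictors and n observations.
    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) on (k, n - k - 1) degrees of freedom."""
    df_model, df_resid = k, n - k - 1
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_resid)
    p_value = stats.f.sf(f_stat, df_model, df_resid)  # the "Sig." column in SPSS output
    return f_stat, p_value

# Hypothetical example: R^2 = 0.25 from a model with 3 predictors fitted to 120 cases
f_stat, p_value = r_squared_f_test(r2=0.25, n=120, k=3)
print(f"F(3, 116) = {f_stat:.2f}, p = {p_value:.2g}")
print("Reject H0: R^2 = 0" if p_value <= 0.05 else "Fail to reject H0: R^2 = 0")
```

This form is algebraically the same as the ANOVA-table ratio MS_regression / MS_residual, because R² = SS_regression / SS_total.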

  • Atheria Kurniawati
  • Michele Shivaanii
  • Jugindar Singh

The successful emergence of the sharing economy has received considerable attention from both organizations and customers. Many describe this recent business model as disruptive, especially within the tourism industry. Hotels, from small establishments to large and popular ones, have suffered losses since its appearance. Moreover, the sudden boom of on-demand ride-hailing services such as Grab has become a threat to many traditional taxi companies that had operated successfully for years before Grab's arrival. The key to the success of this business model is its use of technology to connect providers and customers seamlessly, especially as internet penetration and technology development advance at lightning speed. Many also claim that the sharing economy is more affordable, convenient and environmentally sustainable, a combination that lets it succeed in the market quickly. This study examines three prominent factors in the sharing economy, namely economic, social and environmental factors. It asks whether they influence customer satisfaction, or whether the growth in sharing-economy customers merely reflects curiosity about a brand innovation.

... Analysis of variance (ANOVA) helps to examine potential differences among group means (Sawyer, 2009). Accordingly, the current study also used a one-way analysis of variance with a single independent variable. ...

  • Salman Mohammad
  • Salman Mohmmed Abu Lehyeh
  • Alnawafleh

Knowledge management (KM) is necessary for achieving competitive position and advantage. People who learn fast and can use knowledge gain predominance and excellence. This predominance can be achieved through the development of both preexisting and acquired knowledge. With this in view, this research examines the potential relationship between knowledge management, its components, and organizational learning. The researcher employed an experimental approach and assessed the proposed relationships using structural equation modelling. Results revealed strong, significant relationships between knowledge management and e-Business (p = .022), knowledge adoption (p < .001), Technical Factors (p < .001), and Organizational Learning (p < .001). Path analysis also showed strong correlations among the proposed variables, with values ranging from .873 to .975. The study therefore concluded that knowledge management is a core component of achieving organizational excellence: the more knowledge employees have, the better they can cope with challenges. The researcher recommends further studies of the relationship between knowledge management and organizational wellbeing across all levels of the enterprise, particularly in the United Arab Emirates, to uncover more significant outcomes on knowledge management.

  • Sangeeta Jain
  • Rajesh Kumar Kumawat
  • Mratyunjay Rajkumar Gupta

Objectives: Homoeopathic dilutions are used to increase active principles in medicinal plants, detoxify plants, increase plant growth rate and fruit production, improve plant metabolism and control diseases. This controlled, prospective experimental study was conducted to evaluate the effect of the homoeopathic medicines Zincum metallicum 6CH and Z. metallicum 12CH on the growth of Abelmoschus esculentus L. in a natural environment. This study helps assess and establish the role of homoeopathy in promoting plant growth. Materials and Methods: A. esculentus seeds were cultivated in a designated area of the Homoeopathy University campus. Of these, 30 received Zincum 6CH (20 drops in 1 L of water), 30 received Zincum 12CH (20 drops in 1 L of water) and 30 received normal water. After 60 days, each whole plant was measured for height, pod length and productivity. Results: After 60 days, the number of fruits (plant productivity) in the groups receiving Zincum 6CH and 12CH was 335 and 267, respectively; in the group receiving normal water, it was 159. Plants receiving Zincum 6CH (M = 48.4 cm, SD = 2.65) and 12CH (M = 40.1 cm, SD = 2.39) were taller than plants receiving normal water (M = 31.6 cm, SD = 2.26). Pods of plants receiving Zincum 6CH (M = 13.3 cm, SD = 0.96) and 12CH (M = 10.3 cm, SD = 0.97) were longer than those of plants receiving normal water (M = 8.9 cm, SD = 0.62). Conclusion: The application of the potentised homoeopathic medicines Zincum 6CH and 12CH to A. esculentus demonstrated a beneficial result. This was observed as significant differences in plant productivity, mean plant height and mean pod length among the experimental and control groups. Zincum 6CH showed greater efficacy than 12CH in all aspects of growth.
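When only group means, standard deviations and sizes are reported, a one-way ANOVA can still be reconstructed from them. SS_between follows from the means, and SS_within = Σ(nᵢ − 1)·SDᵢ². The sketch below assumes SciPy and applies this to the reported plant heights. It assumes n = 30 per group, as stated, and that the reported SDs are sample SDs. The abstract does not say which test the authors ran, so this is an illustration of the arithmetic, not a reproduction of their analysis. The helper name `anova_from_summary` is hypothetical.

```python
from scipy import stats

def anova_from_summary(means, sds, ns):
    """One-way ANOVA F-test from group means, sample SDs and group sizes."""
    n_total = sum(ns)
    k = len(means)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # (n - 1) * SD^2 recovers each group's within-group sum of squares
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, stats.f.sf(f_stat, df_between, df_within)

# Plant heights (cm) as reported in the abstract: Zincum 6CH, Zincum 12CH, normal water
f_stat, df1, df2, p = anova_from_summary(
    means=[48.4, 40.1, 31.6], sds=[2.65, 2.39, 2.26], ns=[30, 30, 30])
print(f"F({df1}, {df2}) = {f_stat:.1f}, p = {p:.3g}")
```

A very large F here reflects group means that are far apart relative to small within-group SDs. It says nothing about the design or mechanism behind the differences.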

Analysis of variance is a procedure that examines the effect of one (or more) independent variables on one (or more) dependent variables. The independent variables, also called factors or treatments, need only be nominally scaled, whereas the dependent variable (also called the target variable) is metrically scaled. Analysis of variance is the most important multivariate method for detecting mean differences across more than two groups and is therefore particularly useful for evaluating experiments. The chapter covers both one-factorial analysis of variance (one dependent and one independent variable) and two-factorial analysis of variance (one dependent and two independent variables). Its case study extends the analysis to two nominally scaled independent factors and two metrically scaled covariates. Contrast analysis and post-hoc testing are also covered.
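For a two-factorial design with equal numbers of observations per cell, the total sum of squares splits exactly into factor A, factor B, the A × B interaction and error. The sketch below assumes NumPy and a balanced layout stored as an array of shape (levels of A, levels of B, replicates). It also computes eta squared (SS_effect / SS_total) and partial eta squared (SS_effect / (SS_effect + SS_error)); omega squared is omitted. The treatment × gender data and the helper names `two_way_anova_ss` and `effect_sizes` are hypothetical. Unbalanced designs need a different (e.g. regression-based) decomposition.

```python
import numpy as np

def two_way_anova_ss(y):
    """Sums of squares and degrees of freedom for a balanced two-factor design.
    y has shape (a, b, n): levels of factor A x levels of factor B x replicates per cell."""
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))   # marginal means of factor A
    mean_b = y.mean(axis=(0, 2))   # marginal means of factor B
    cell = y.mean(axis=2)          # cell means, shape (a, b)
    ss = {
        "A": b * n * ((mean_a - grand) ** 2).sum(),
        "B": a * n * ((mean_b - grand) ** 2).sum(),
        "AxB": n * ((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum(),
        "Error": ((y - cell[:, :, None]) ** 2).sum(),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    return ss, df

def effect_sizes(ss):
    """Eta squared and partial eta squared for each effect."""
    ss_total = sum(ss.values())
    return {e: (ss[e] / ss_total, ss[e] / (ss[e] + ss["Error"]))
            for e in ("A", "B", "AxB")}

# Hypothetical balanced data: A = treatment (2 levels), B = gender (2 levels), 3 per cell
y = [[[12, 14, 13], [15, 17, 16]],
     [[18, 20, 19], [19, 22, 21]]]
ss, df = two_way_anova_ss(y)
ms_error = ss["Error"] / df["Error"]
es = effect_sizes(ss)
for effect in ("A", "B", "AxB"):
    f_stat = (ss[effect] / df[effect]) / ms_error
    eta2, partial_eta2 = es[effect]
    print(f"{effect}: F({df[effect]}, {df['Error']}) = {f_stat:.2f}, "
          f"eta^2 = {eta2:.3f}, partial eta^2 = {partial_eta2:.3f}")
```

The output illustrates why partial η² is never smaller than η² in a factorial design. Its denominator leaves out the sums of squares of the other effects.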

  • Rustam Musta
  • Laily Nurliana
  • Herianto Harbi
  • Siti Nurjana

Clove oil microencapsulated with maltodextrin as an antifungal agent against Candida albicans produced clear zones of different sizes. A clove oil to maltodextrin ratio of 1:4 yielded clear zones in the medium category. The other ratios (1:6, 1:8, 1:10, and 1:12), as well as clove oil alone, yielded clear zones in the strong category. The one-way ANOVA gave F = 22.56 > F-table (α = 0.01) = 5.06, indicating that at least one mean clear-zone diameter differed significantly from those of the other treatment groups. The HSD test showed that for ratios of 1:4 and 1:6, |Yi − Yj| > HSD, meaning that the coating had a negative effect. For ratios of 1:8, 1:10, and 1:12, the HSD test showed that |Yi − Yj| < HSD, meaning that the presence of maltodextrin had no significant effect. The increase in coating material was suspected to result in a more extensive distribution of clove oil in the coating material, thus increasing the interaction with Candida albicans.
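The comparison |Yi − Yj| > HSD is Tukey's honestly significant difference criterion. For equal group sizes, HSD = q(1 − α; k, df_within) · √(MS_within / n), where q is a quantile of the studentized range distribution. The sketch below assumes SciPy 1.7 or later, which provides `scipy.stats.studentized_range`. The means, MS_within and group sizes are hypothetical and are not the clove-oil data. The helper name `tukey_hsd_pairs` is also hypothetical. With unequal group sizes, the Tukey–Kramer form would be needed instead.

```python
import itertools
import math
from scipy import stats

def tukey_hsd_pairs(means, ms_within, n_per_group, df_within, alpha=0.05):
    """Flag pairs of group means whose absolute difference exceeds Tukey's HSD.
    HSD = q(1 - alpha; k, df_within) * sqrt(MS_within / n), equal group sizes assumed."""
    k = len(means)
    q_crit = stats.studentized_range.ppf(1 - alpha, k, df_within)
    hsd = q_crit * math.sqrt(ms_within / n_per_group)
    results = []
    for (i, mi), (j, mj) in itertools.combinations(enumerate(means), 2):
        diff = abs(mi - mj)
        results.append((i, j, diff, diff > hsd))  # True means the pair differs
    return hsd, results

# Hypothetical inputs: 4 groups with 3 replicates each, so df_within = 12 - 4 = 8
means = [10.0, 14.5, 15.0, 20.0]
hsd, pairs = tukey_hsd_pairs(means, ms_within=2.0, n_per_group=3, df_within=8)
print(f"HSD = {hsd:.2f}")
for i, j, diff, significant in pairs:
    print(f"group {i} vs {j}: |diff| = {diff:.2f} -> {'differs' if significant else 'n.s.'}")
```

A single HSD threshold serves all pairs because the groups are the same size. That is what makes the criterion quick to apply by hand from an ANOVA table.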

Purpose: Increased construction risk, arising from project complexity and numerous construction performance challenges, calls for greater competence among construction project managers. However, contextual interrogation of these competencies is limited. This paper reports on a study that sought to determine the expected competencies of construction project managers. It also assessed whether statistically significant differences exist due to the gender, designation and educational level of respondents. Design/methodology/approach: A quantitative research design was adopted through a questionnaire survey administered to project managers, architects, engineers and quantity surveyors working for consultant and contractor organizations. Descriptive and inferential statistics were used to analyse differences due to demographic variables. Factor analysis was also used to reveal interrelated sets of competencies expected of construction project managers. Findings: Factor analysis identified 11 significant components, with the highest-ranked components comprising organizational savvy and experience in managing project constraints. The univariate analysis identified effective communication, leadership and good team-building skills as the three most critical expected competencies. Significant differences due to educational level were established, with shortcomings among respondents holding Diplomas. Research limitations/implications: Higher education institutions need to establish curriculum designs that align with these competency expectations. Mentorship programmes within construction organizations can also help bridge the existing competence gap. However, due to the exploratory nature of the study, the views of clients were not considered. Originality/value: The study determined competencies for construction project managers and proposed demographic-specific interventions.