## Transcribed Text

ANOVA: One Way Analysis of Variance
5.1. Introduction
Analysis of variance (ANOVA) is a parametric technique used to determine if three or more samples
are drawn from a common population. Note that it is a test of differences of means, despite the name. As a test
of differences, it is by default a two-tailed test.
Comparing differences between samples requires two things: an estimate of the difference of means, and
an estimate of inherent variability of the population under investigation. In the t test, the difference of means
and the standard error were used, respectively. It is difficult to define a simple difference with multiple samples,
so in ANOVA, the variance of sample means is used as an indicator of difference. Between-sample variance
is obtained by defining the variance of the means of all the samples. The inherent variability of the population
is derived from the pooled standard deviations of each sample set, and is called the within-sample variance.
The separation of sources of variance can also be viewed as partitioning (dividing up) the corrected sum of
squares. Variance is the standard deviation squared. So, knowing the definition of standard deviation
(see Exercise 2), variance can be defined as the corrected sum of squares (SS) divided by (n-1): $s^2 = SS/(n-1)$.
The corrected or total sum of squares of all the data in the samples (SST) is the sum of two component
parts: the within-sample sum of squares (SSw) and the between-sample sum of squares (SSb). The former
is used to characterize the inherent variability of the population. The latter characterizes the difference
between samples. The mechanism for comparison of "difference" is to divide the sums of squares by their
respective degrees of freedom to obtain variances, and to assess the ratio of two variances using the F test.
Operationally, ANOVA is therefore a matter of determining the between and within sample corrected
sum of squares and their respective degrees of freedom, computing respective variance and comparing the
latter using an F test.
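As a compact summary of the partition described above, both the sums of squares and their degrees of freedom add up. The symbols $\nu_T$, $\nu_1$ and $\nu_2$ are the degrees-of-freedom labels this handout uses in the later sections, and the subscripts T, w and b stand for total, within-sample and between-sample:

$$
\begin{aligned}
SS_T &= SS_w + SS_b \\
\nu_T &= \nu_2 + \nu_1, \qquad N-1 = (N-k) + (k-1) \\
s_w^2 &= \frac{SS_w}{N-k}, \qquad s_b^2 = \frac{SS_b}{k-1}, \qquad F = \frac{s_b^2}{s_w^2}
\end{aligned}
$$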
5.2. Notation
In the layout that follows, the k separate samples have distinct columns indexed by j, which runs from 1 to
k. Within each sample, the individual values are indexed by the counter i, which varies from 1 to $n_j$, where $n_j$ is
the number of individuals in the jth sample group. A single data value is indicated by $x_{ij}$. Intermediate and
descriptive statistics are also identified with respective samples using the index j (e.g. $n_j$, $SS_j$, $\bar{x}_j$). In addition to
sample statistics, there are grand statistics, such as the total sample size (N), the grand mean ($\bar{x}_G$), and the total
sums ($\sum\sum x$ and $\sum\sum x^2$).
Table 5.1. Notation used in ANOVA calculations

| Character | Meaning | Range |
|---|---|---|
| $x_{ij}$ | Individual (ith) value within sample j | 1 to $n_j$ |
| j | Individual sample set within collection | 1 to k |
| k | Number of samples | |
| $n_j$ | Number of values within jth sample | |
| N = $\sum n_j$ | Total number of values | |
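
To make the notation concrete, here is a minimal Python sketch that maps each symbol onto a simple data layout. It uses the school IQ scores from the worked example in Section 5.7; the variable names (`samples`, `grand_sum`, and so on) are illustrative choices, not part of the handout.

```python
# School IQ scores from the worked example (Section 5.7), one list per sample j.
samples = {
    "School A": [105, 102, 113, 96, 86, 97, 101],
    "School B": [87, 93, 102, 98, 95],
    "School C": [118, 95, 88, 101, 94, 92],
    "School D": [84, 85, 87, 96, 88, 94],
}

k = len(samples)                                       # number of samples
n = {name: len(xs) for name, xs in samples.items()}    # n_j for each sample
N = sum(n.values())                                    # total number of values
grand_sum = sum(sum(xs) for xs in samples.values())    # double sum of x
grand_sum_sq = sum(x * x for xs in samples.values() for x in xs)  # double sum of x^2
grand_mean = grand_sum / N                             # grand mean

print(k, N, grand_sum, grand_sum_sq, round(grand_mean, 1))
# prints: 4 24 2297 221511 95.7
```

A dictionary of lists is used because the samples have unequal sizes, which a rectangular array would not represent without padding.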
5.3. Total Sum of Squares, SST
The total corrected sum of squares is defined as:
$$SS_T = \sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}^2 - \frac{\left(\sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}\right)^2}{N}$$
and characterizes the entire dataset. The overall variance is the total corrected sum of squares divided by the
overall degrees of freedom ($\nu_T = N-1$):
$$s^2 = \frac{SS_T}{N-1}$$
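
The "corrected" sum of squares is the sum of squared deviations from the mean, and the formula above is its computational shortcut. The sketch below checks that the two forms agree on the 24 school IQ scores from Section 5.7. The function names are hypothetical and chosen only for this illustration.

```python
def ss_definitional(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def ss_computational(values):
    """Shortcut form: sum of x^2 minus (sum of x)^2 / n."""
    total = sum(values)
    return sum(x * x for x in values) - total * total / len(values)


# All 24 IQ scores from the worked example, pooled into one list.
all_scores = [
    105, 102, 113, 96, 86, 97, 101,   # School A
    87, 93, 102, 98, 95,              # School B
    118, 95, 88, 101, 94, 92,         # School C
    84, 85, 87, 96, 88, 94,           # School D
]

ss_t = ss_computational(all_scores)
overall_variance = ss_t / (len(all_scores) - 1)   # SST / (N - 1)

print(round(ss_definitional(all_scores), 2), round(ss_t, 2), round(overall_variance, 2))
```

The shortcut form needs only the running totals of x and x², which is why the intermediate statistics tables in this exercise carry those two sums.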
5.4. Within-sample variance, $s_w^2$
Within-sample variance is a measure of the variability of the population, based on the sample data. If $s_w^2$ is
large, the differences between individual samples will not be as significant in indicating differences. $s_w^2$ is a
pooled variance for all the sample sets, weighted for sample size, and is formally defined as:
$$s_w^2 = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2}{N-k}$$
where $n_j$ is the size of the jth sample, N is the total sample size, k is the number of samples, the right-hand $\sum$
indicates summation over all individuals in a sample, and the left-hand $\sum$ indicates summation of those sums over all
samples. This formula is tedious to apply, so a faster method is used.
$s_w^2$ can be considered as the within-sample corrected sum of squares divided by the within-sample degrees
of freedom ($\nu_2 = N-k$). So, for ease of computation, calculate $s_w^2$ as the sum of the corrected sums of squares for
each sample ($\sum SS_j$) divided by N-k:
$$s_w^2 = \frac{\sum_{j=1}^{k}\left(\sum_{i=1}^{n_j} x_{ij}^2 - \left(\sum_{i=1}^{n_j} x_{ij}\right)^2 / n_j\right)}{N-k} = \frac{\sum_{j=1}^{k} SS_j}{N-k}$$
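
The faster method can be sketched in a few lines of Python: compute the corrected sum of squares for each sample separately, add them, and divide by N − k. The data are the school IQ scores from Section 5.7, and `corrected_ss` is a hypothetical helper name.

```python
def corrected_ss(values):
    """Corrected sum of squares for one sample: sum x^2 - (sum x)^2 / n_j."""
    total = sum(values)
    return sum(x * x for x in values) - total * total / len(values)


# School IQ scores from the worked example, one list per sample.
samples = [
    [105, 102, 113, 96, 86, 97, 101],   # School A
    [87, 93, 102, 98, 95],              # School B
    [118, 95, 88, 101, 94, 92],         # School C
    [84, 85, 87, 96, 88, 94],           # School D
]

k = len(samples)
N = sum(len(xs) for xs in samples)

ss_j = [corrected_ss(xs) for xs in samples]   # one SS_j per sample
ss_w = sum(ss_j)                              # within-sample sum of squares
s_w2 = ss_w / (N - k)                         # within-sample variance

print(ss_j, ss_w, s_w2)
# prints: [420.0, 126.0, 570.0, 120.0] 1236.0 61.8
```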
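5.5. Between-sample variance, $s_b^2$
The between-sample variance ($s_b^2$) compares the sample means ($\bar{x}_j$) with the grand mean ($\bar{x}_G$):
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{ij}}{n_j}, \qquad \bar{x}_G = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}}{N}$$
The between-sample variance $s_b^2$ is then defined as:
$$s_b^2 = \frac{\sum_{j=1}^{k} n_j\left(\bar{x}_j-\bar{x}_G\right)^2}{k-1}$$
where the numerator is the between-sample sum of squares, SSb.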
Compared to SSb, SST and SSw are easy to calculate from intermediate statistics, especially if those
statistics are provided. This gives an easier route to SSb: because SST = SSw + SSb, it follows that
SSb = SST − SSw. This shortcut can be used either to check your calculation of SSb or, under time
pressure, as a quick way of obtaining SSb.
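
As a rough check that the direct definition and the shortcut agree, the sketch below computes SSb both ways for the school IQ scores from Section 5.7. Variable names are illustrative only.

```python
# School IQ scores from the worked example, one list per sample.
samples = [
    [105, 102, 113, 96, 86, 97, 101],   # School A
    [87, 93, 102, 98, 95],              # School B
    [118, 95, 88, 101, 94, 92],         # School C
    [84, 85, 87, 96, 88, 94],           # School D
]

k = len(samples)
all_values = [x for xs in samples for x in xs]
N = len(all_values)
grand_mean = sum(all_values) / N

# Direct route: n_j * (sample mean - grand mean)^2, summed over samples.
ss_b_direct = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2 for xs in samples)

# Shortcut route: SSb = SST - SSw, using the computational form for each term.
ss_t = sum(x * x for x in all_values) - sum(all_values) ** 2 / N
ss_w = sum(sum(x * x for x in xs) - sum(xs) ** 2 / len(xs) for xs in samples)
ss_b_shortcut = ss_t - ss_w

s_b2 = ss_b_direct / (k - 1)   # between-sample variance

print(round(ss_b_direct, 2), round(ss_b_shortcut, 2), round(s_b2, 2))
```

Working by hand with a grand mean rounded to one decimal place gives slightly different intermediate terms, so small discrepancies in the last digit are expected when comparing against a hand calculation.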
5.6. The F test
The ratio of the two variances is used to define an observed F statistic:
$$F_{obs} = \frac{s_b^2}{s_w^2}$$
Fobs can be compared to a critical value Fcrit, drawn from a table (Appendix 1). The table is selected based on
$\alpha$, and is read using $\nu_1 = k-1$ (relating to $s_b^2$, the numerator (top) value in defining F) across the top of the table, and
$\nu_2 = N-k$ (relating to $s_w^2$, the denominator (bottom) value in defining F) down the side of the table. These
definitions are shown on your table to make sure you get the degrees of freedom the right way round.
H0 is rejected if Fobs > Fcrit. Rejection implies that there is a significant difference between the means of the
different samples. More detailed testing (e.g., a two sample t test) might be required to determine if a particular
sample is significantly different from another. A simpler method is to compare the means ± their respective
standard deviations.
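
The decision rule can be sketched as a small function. The critical value is passed in as an argument because it is read from the table in Appendix 1, not computed here. The function names are hypothetical, and the numbers in the usage lines are the ones from the worked example in Section 5.7.

```python
def degrees_of_freedom(k, N):
    """Return (nu1, nu2): between-sample and within-sample degrees of freedom."""
    return k - 1, N - k


def f_test(s_b2, s_w2, f_crit):
    """Return the observed F ratio and whether H0 is rejected (Fobs > Fcrit)."""
    f_obs = s_b2 / s_w2
    return f_obs, f_obs > f_crit


# Values from the school IQ example: k = 4 samples, N = 24 scores,
# and Fcrit = 3.86 as read from the handout's table.
nu1, nu2 = degrees_of_freedom(k=4, N=24)
f_obs, reject_h0 = f_test(s_b2=144.32, s_w2=61.8, f_crit=3.86)

print(f"nu1={nu1}, nu2={nu2}, Fobs={f_obs:.2f}, reject H0: {reject_h0}")
# prints: nu1=3, nu2=20, Fobs=2.34, reject H0: False
```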
5.7. Example and Calculation
The data sets are IQ scores for children drawn from four different junior schools. Are there significant
differences in IQ with school? The START format is adopted for demonstration, but a more succinct summary
method is provided at the end.
a) Research Question: Does child IQ vary among schools? Assume that IQ is normally distributed.
Test: To determine if mean IQ varies among schools, rather than at any particular school, the ANOVA test of
difference of means will be used. There are k = 4 samples, with n1 = 7, n2 = 5, n3 = 6 and n4 = 6.
HA: $\bar{x}_1 \neq \bar{x}_2 \neq \bar{x}_3 \neq \bar{x}_4$ (at least one school mean differs from the others)
H0: $\bar{x}_1 = \bar{x}_2 = \bar{x}_3 = \bar{x}_4$
b) Significance Level: $\alpha$ = 0.05
c) Tailedness: Two-tailed. ANOVA is a test of difference, so it is always two-tailed. Note, however, that the F
test used to run ANOVA is one-tailed, as it determines if $s_b^2 > s_w^2$.
d) Test Graphic and Results:
e) Critical Statistic:
$\alpha$ = 0.05, two-tailed
$\nu_1$ (between) = k-1 = 4-1 = 3
$\nu_2$ (within) = N-k = 24-4 = 20
Fcrit(0.05, 3, 20) = 3.86
f) Observed Statistic:
First, the data table is processed to provide intermediate and descriptive statistics:
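Table 5.2. Raw data and intermediate statistics for school IQ question

| i | School A (j = 1) | School B (j = 2) | School C (j = 3) | School D (j = 4) | Grand Statistics (k = 4) |
|---|---|---|---|---|---|
| 1 | 105 | 87 | 118 | 84 | |
| 2 | 102 | 93 | 95 | 85 | |
| 3 | 113 | 102 | 88 | 87 | |
| 4 | 96 | 98 | 101 | 96 | |
| 5 | 86 | 95 | 94 | 88 | |
| 6 | 97 | - | 92 | 94 | |
| 7 | 101 | - | - | - | |
| $n_j$ | 7 | 5 | 6 | 6 | N = $\sum n_j$ = 24 |
| $\sum x$ | 700 | 475 | 588 | 534 | $\sum\sum x$ = 2297 |
| $\sum x^2$ | 70,420 | 45,251 | 58,194 | 47,646 | $\sum\sum x^2$ = 221,511 |
| $SS_j$ | 420 | 126 | 570 | 120 | SSw = $\sum SS_j$ = 1236 |
| $\bar{x}_j$ | 100 | 95 | 98 | 89 | $\bar{x}_G$ = 95.7 |
| $n_j(\bar{x}_j-\bar{x}_G)^2$ | 129.43 | 2.45 | 31.74 | 269.34 | SSb = 432.96 |
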
Second, the intermediate statistics are processed to provide the within sample and between sample
variance estimates:
1. Calculation of within-sample variance, $s_w^2$:
$$s_w^2 = \frac{SS_w}{N-k} = \frac{\sum SS_j}{N-k}$$
= (SS1 + SS2 + SS3 + SS4) / (N-k)
= (420 + 126 + 570 + 120) / (24 − 4)
= 1236/20
= 61.8
2. Calculation of between-sample variance, $s_b^2$:
$$s_b^2 = \frac{\sum_{j=1}^{k} n_j\left(\bar{x}_j-\bar{x}_G\right)^2}{k-1} = \frac{SS_b}{k-1}$$
= (7 × (100 − 95.7)² + 5 × (95 − 95.7)² + 6 × (98 − 95.7)² + 6 × (89 − 95.7)²) / (4 − 1)
= (129.43 + 2.45 + 31.74 + 269.34) / 3
= 432.96 / 3
= 144.3
3. The Total Sum of Squares, SST:
The total sum of squares can be used as a check on arithmetic, or in a hurry as a rapid method of obtaining
SSb:
$SS_T = \sum\sum x^2 - (\sum\sum x)^2/N$ = 221511 − 2297²/24 = 1669
SSb = SST − SSw = 1669 − 1236 = 433
This result agrees with the SSb calculated above.
4. The observed F statistic is calculated:
$$F_{obs} = \frac{s_b^2}{s_w^2}$$
= 144.32 / 61.8
= 2.34
5. The ANOVA Table:
Table 5.3. The ANOVA table for IQ of children at different schools

| Source of Variation | Sum of Squares | Degrees of Freedom, $\nu$ | Mean Square = SS/$\nu$ = $s^2$ | Fobs |
|---|---|---|---|---|
| Total | SST = 1669 | N-1 = 23 | | |
| Within-sample | SSw = 1236 | N-k = 20 | 61.8 | |
| Between-sample | SSb = 432.96 | k-1 = 3 | 144.3 | 2.34 |

g) Test Result:
Fobs (2.34) < Fcrit (3.86) therefore fail to reject the null hypothesis
h) Check:
i) Conclusions:
There does not appear to be a significant difference in mean IQ among the schools. School D provides the only
possible outlier in terms of the mean, and it might merit a two sample t test against the pooled data from the
other schools. The sample size is probably too small for firm conclusions.
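
The simpler screening method mentioned in Section 5.6, comparing each mean ± its standard deviation, can be sketched as follows for the school data. This is an informal look at which ranges overlap, not a significance test, and the `spread` dictionary is a hypothetical name used only here.

```python
from statistics import mean, stdev

# School IQ scores from Table 5.2.
samples = {
    "School A": [105, 102, 113, 96, 86, 97, 101],
    "School B": [87, 93, 102, 98, 95],
    "School C": [118, 95, 88, 101, 94, 92],
    "School D": [84, 85, 87, 96, 88, 94],
}

spread = {}
for name, scores in samples.items():
    m = mean(scores)
    s = stdev(scores)                 # sample standard deviation (n - 1 divisor)
    spread[name] = (m - s, m, m + s)  # lower bound, mean, upper bound
    print(f"{name}: {m:.1f} ± {s:.1f}  ({m - s:.1f} to {m + s:.1f})")
```

If the range for School D overlaps the ranges for the other schools, that is consistent with the failure to reject H0, although only a follow-up test could settle whether School D differs.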
5.8. Shorter Reporting Methods
In general, it is not necessary to report all the details of a test; it is simply assumed that you are
knowledgeable enough to have undertaken them wisely. ANOVA reports are often much shorter than the START
format. In general, it is necessary to build an intermediate and descriptive table, normally in a spreadsheet. The
shorter report then would consist simply of a statement of the research question, the ANOVA table with the
critical value of F incorporated (Table 5.3.b) and a conclusion.
Table 5.3.b. The ANOVA table for IQ of children at different schools

| Source of Variation | Sum of Squares | Degrees of Freedom, $\nu$ | Mean Square = SS/$\nu$ = $s^2$ | Fobs |
|---|---|---|---|---|
| Total | SST = 1669 | N-1 = 23 | | |
| Within-sample | SSw = 1236 | N-k = 20 | 61.8 | |
| Between-sample | SSb = 432.96 | k-1 = 3 | 144.3 | 2.34 |
| Fcrit ($\alpha$ = 0.05) = 3.86 | | | | |

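
The intermediate table that would normally be built in a spreadsheet can also be sketched in code. The function below is a hypothetical helper that builds the ANOVA table from the three intermediate statistics per sample ($n_j$, $\sum x$, $\sum x^2$), using the school values from Table 5.2. It is one possible arrangement, not a prescribed method.

```python
def anova_from_summary(n, sum_x, sum_x2):
    """Build ANOVA rows from per-sample n_j, sum of x, and sum of x^2."""
    k = len(n)
    N = sum(n)
    ss_t = sum(sum_x2) - sum(sum_x) ** 2 / N                    # total SS
    ss_w = sum(sx2 - sx ** 2 / nj for nj, sx, sx2 in zip(n, sum_x, sum_x2))
    ss_b = ss_t - ss_w                                          # shortcut for SSb
    ms_w = ss_w / (N - k)                                       # within-sample variance
    ms_b = ss_b / (k - 1)                                       # between-sample variance
    # Each row: (sum of squares, degrees of freedom, mean square, Fobs)
    return {
        "Total": (ss_t, N - 1, None, None),
        "Within-sample": (ss_w, N - k, ms_w, None),
        "Between-sample": (ss_b, k - 1, ms_b, ms_b / ms_w),
    }


# Intermediate statistics for Schools A to D, taken from Table 5.2.
table = anova_from_summary(
    n=[7, 5, 6, 6],
    sum_x=[700, 475, 588, 534],
    sum_x2=[70420, 45251, 58194, 47646],
)

for source, (ss, df, ms, f_obs) in table.items():
    ms_text = "" if ms is None else f"{ms:.1f}"
    f_text = "" if f_obs is None else f"{f_obs:.2f}"
    print(f"{source:<15}{ss:>9.1f}{df:>5}{ms_text:>8}{f_text:>7}")
```

Working from the summary statistics instead of the raw scores matches how the exercise questions supply their data.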
Often statistical analyses are provided in the body of text. For example, "an analysis of variance suggests
that there is little difference in IQ between schools". Statements such as this are unacceptably superficial, and
would be much stronger if some supporting information was incorporated. For example, "Analysis of variance
showed no significant difference of IQ between four schools at the 0.05 level. However, the sample size in each
school was very small (5 to 7) and only four schools were sampled."
For the purposes of this course (GEOG 2210), a reasonably full accounting (i.e. START) of the method is
required, as it is important to demonstrate your competence with the technique. A full report also provides the
opportunity to garner marks even when the final answer may be incorrect.
1. Are waiters paid a comparable hourly wage ($) in different towns? Use the following dataset to tackle this
question. Data collected by random sampling from three different towns are summarized by intermediate and
descriptive statistics. Identify the sample(s) most likely responsible for any significant difference. (Note: some
additional descriptive statistics are provided to aid interpretation.)
Dataset for Question 1 (k = 3)

| i | Town A (j = 1) | Town B (j = 2) | Town C (j = 3) |
|---|---|---|---|
| 1 | 5.42 | 9.84 | 4.73 |
| 2 | 11.76 | 6.77 | 6.33 |
| … | … | … | … |

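Table 5.4. Summary statistics, hourly wages of waiters

| Statistics (units) | Town A | Town B | Town C | Grand Statistics |
|---|---|---|---|---|
| $n_j$ | 15 | 23 | 19 | N = $\sum n_j$ = |
| $\sum x_j$ (\$) | 123 | 229 | 96 | $\sum\sum x$ = |
| $\sum x_j^2$ (\$²) | 1,237 | 2,979 | 742 | $\sum\sum x^2$ = |
| $SS_j$ (\$²) | 228.4 | 698.95 | | SSw = |
| $\bar{x}_j$ (\$) | 8.20 | 9.956 | | $\bar{x}_G$ = \$ |
| $s_{x_j}$ (\$) | 4.04 | 5.636 | | SST = 1,436.88 |
| $\bar{x}_j + s_x$ | 12.24 | 15.59 | | |
| $\bar{x}_j - s_x$ | 4.16 | 4.32 | | |
| $n_j(\bar{x}_j-\bar{x}_G)^2$ | 1.737 | | | SSb = |
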
Table 5.5. ANOVA table for waiter income in three towns

| Source of Variation | Sum of Squares | Degrees of Freedom, $\nu$ | Mean Square = SS/$\nu$ = $s^2$ | Fobs |
|---|---|---|---|---|
| Total | SST = | N-1 = | | |
| Within-sample | SSw = | N-k = | | |
| Between-sample | SSb = | k-1 = | | |
| Fcrit ($\alpha$ = ) = | | | | |

2. The chemistry of water flowing from a series of five springs has been analyzed over a full year. The Magnesium ion
concentrations in mg/l have been compiled and are summarized in the data table below (Table 5.6). Determine if
there are systematic differences in the magnesium ion concentration among the springs. Examine the statistics
sheet to aid in interpreting your results (See hints below). Complete your answers in Tables 5.6 and 5.7.
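Table 5.6. Summary statistics, magnesium concentrations in springs (all data are in mg/l)

| Statistics | Spring 1 | Spring 2 | Spring 3 | Spring 4 | Spring 5 | Grand Statistics |
|---|---|---|---|---|---|---|
| $n_j$ | 15 | 10 | 12 | 6 | 23 | N = $\sum n_j$ = 66 |
| $\sum x_j$ | 34.13 | 38.87 | 222.0 | 358.3 | 99.86 | $\sum\sum x$ = 753.1627 |
| $\sum x_j^2$ | 106.28 | 169.41 | 9181.2 | 22048.43 | 477.65 | $\sum\sum x^2$ = 31982.99 |
| $SS_j$ | 28.63 | 18.34 | 5074.1 | 651.9483 | 44.07 | SSw = |
| $\bar{x}_j$ | 2.28 | 3.89 | 18.50 | 59.71667 | 4.34 | $\bar{x}_G$ = |
| $s_{x_j}$ | 1.43 | 1.43 | 21.48 | 11.41883 | 1.42 | SST = 23388.23 |
| $\bar{x}_j + s_x$ | 3.71 | 5.32 | 40.0 | 71.1 | 5.76 | |
| $\bar{x}_j - s_x$ | 0.85 | 2.46 | -2.98 | 48.3 | 2.92 | |
| CV (%) | 62.7 | 36.8 | 116.1 | 19.1 | 32.7 | |
| $n_j(\bar{x}_j-\bar{x}_G)^2$ | 1252.07 | | | | | SSb = |
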
Table 5.7. ANOVA table for magnesium concentrations in five springs

| Source of Variation | Sum of Squares | Degrees of Freedom, $\nu$ | Mean Square = SS/$\nu$ = $s^2$ | Fobs |
|---|---|---|---|---|
| Total | SST = 23388.23 | N-1 = | | |
| Within-sample | SSw = | N-k = | | |
| Between-sample | SSb = | k-1 = | | |
| Fcrit ($\alpha$ = ) = | | | | |

To help interpret the patterns in the springs, classify the data in a table using the mean, median and standard deviation.
Which springs are similar and which are different? Are the "different" ones similar in any way? The exercise does not
require specialised knowledge about springs. The approach might be equally applicable to a study on incomes or farm
production.


1. The ANOVA shows that the average hourly wage paid to waiters is not the same in the three towns. The observed F statistic of 5.7779 is greater than the critical value at the 5% significance level, so the test rejects the null hypothesis that the average hourly wages for waiters in the three towns are the same. Analysis of variance itself does not tell us which town(s) deviate from the others, but casual inspection shows a very low average wage for Town C, while the average wages in Town A and Town B may be statistically indistinguishable....