Transcribed Text

ANOVA: One Way Analysis of Variance

5.1. Introduction

Analysis of variance (ANOVA) is a parametric technique used to determine if three or more samples are drawn from a common population. Note that, despite the name, it is a test of differences of means. As a test of differences, it is by default a two-tailed test.

Comparing differences between samples requires two things: an estimate of the difference of means, and an estimate of the inherent variability of the population under investigation. In the t test, the difference of means and the standard error were used, respectively. It is difficult to define a simple difference with multiple samples, so in ANOVA the variance of the sample means is used as an indicator of difference. Between-sample variance is obtained from the variance of the means of all the samples. The inherent variability of the population is derived by pooling the variability within each sample set, and is called the within-sample variance.

The separation of sources of variance can also be viewed as partitioning (dividing up) the corrected sum of squares. Variance is the standard deviation squared. So, knowing the definition of standard deviation (see Exercise 2), variance can be defined as the corrected sum of squares (SS) divided by (n-1): $s^2 = SS/(n-1)$. The corrected or total sum of squares of all the data in the samples ($SS_T$) is the sum of two component parts: the within-sample sum of squares ($SS_w$) and the between-sample sum of squares ($SS_b$). The former is used to characterize the inherent variability of the population. The latter characterizes the difference between samples. The mechanism for comparison of “difference” is to divide the sums of squares by their respective degrees of freedom to obtain variances, and to assess the ratio of the two variances using the F test.
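The partition described above can be summarized compactly. The block below is only a restatement in the symbols used in Sections 5.2 to 5.6 (k samples, N values in total); it adds nothing beyond those definitions.

```latex
\begin{align*}
SS_T    &= SS_w + SS_b \\
s_w^2   &= \frac{SS_w}{N-k}, \qquad s_b^2 = \frac{SS_b}{k-1} \\
F_{obs} &= \frac{s_b^2}{s_w^2}
\end{align*}
```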
Operationally, ANOVA is therefore a matter of determining the between-sample and within-sample corrected sums of squares and their respective degrees of freedom, computing the respective variances, and comparing the variances using an F test.

5.2. Notation

In the layout that follows, the k separate samples have distinct columns indexed by j, which runs from 1 to k. Within each sample, the individual values are indexed by the counter i, which varies from 1 to $n_j$, where $n_j$ is the number of individuals in the jth sample group. A single data value is indicated by $x_{ij}$. Intermediate and descriptive statistics are also identified with their respective samples using the index j (e.g. $n_j$, $SS_j$, $\bar{x}_j$). In addition to sample statistics, there are grand statistics, such as the total sample size ($N$), the grand mean ($\bar{x}_G$), and the total sums ($\sum\sum x$ and $\sum\sum x^2$).

Table 5.1. Notation used in ANOVA calculations

| Character | Meaning | Range |
|---|---|---|
| $x_{ij}$ | Individual (ith) value within sample j | i = 1 to $n_j$ |
| j | Individual sample set within collection | j = 1 to k |
| k | Number of samples | |
| $n_j$ | Number of values within jth sample | |
| $N = \sum n_j$ | Total number of values | |

5.3. Total Sum of Squares, $SS_T$

The total corrected sum of squares is defined as:

$$SS_T = \sum\sum x_{ij}^2 - \frac{\left(\sum\sum x_{ij}\right)^2}{N}$$

and characterizes the entire dataset. The overall variance is the total corrected sum of squares divided by the overall degrees of freedom ($\nu_T = N-1$):

$$s^2 = \frac{SS_T}{N-1}$$

5.4. Within-sample variance, $s_w^2$

Within-sample variance is a measure of the variability of the population, based on the sample data. If $s_w^2$ is large, a given difference between sample means is less likely to indicate a significant difference.
$s_w^2$ is an aggregate variance for all the sample sets, weighted for sample size, and is formally defined as:

$$s_w^2 = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{N-k}$$

where $n_j$ is the size of the jth sample, N is the total sample size, k is the number of samples, the right-hand $\sum$ indicates summation over all individuals in a sample, and the left-hand $\sum$ indicates summation of the resulting sums over all samples. This formula is tedious to apply, so a faster method is used. $s_w^2$ can be considered as the within-sample corrected sum of squares divided by the within-sample degrees of freedom ($\nu_2 = N-k$). So, for ease of computation, calculate $s_w^2$ as the sum of the corrected sums of squares for each sample ($\sum SS_j$) divided by N-k:

$$s_w^2 = \frac{\sum_{j=1}^{k} \left( \sum_{i=1}^{n_j} x_{ij}^2 - \left( \sum_{i=1}^{n_j} x_{ij} \right)^2 / n_j \right)}{N-k} = \frac{\sum_{j=1}^{k} SS_j}{N-k}$$

5.5. Between-sample variance, $s_b^2$

The between-sample variance ($s_b^2$) compares the sample means ($\bar{x}_j$) with the grand mean ($\bar{x}_G$):

$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{ij}}{n_j}, \qquad \bar{x}_G = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{N}$$

The between-sample variance is then defined as:

$$s_b^2 = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x}_G)^2}{k-1} = \frac{SS_b}{k-1}$$

Compared to $SS_b$, $SS_T$ and $SS_w$ are easy to calculate from intermediate statistics, especially if the intermediate statistics are provided. This gives an easier alternative: $SS_b = SS_T - SS_w$. So $SS_b$ can be quickly calculated from $SS_T$ and $SS_w$. This short-cut can be used either to check your calculation of $SS_b$ or, under time pressure, as a quick way of obtaining $SS_b$.

5.6. The F test

The ratio of the two variances is used to define an observed F statistic:

$$F_{obs} = \frac{s_b^2}{s_w^2}$$

$F_{obs}$ can be compared to a critical value $F_{crit}$, drawn from a table (Appendix 1).
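The definitions in Sections 5.3 to 5.6 can be checked numerically. The following is a minimal Python sketch, assuming three small made-up samples (the numbers and function names are illustrative and do not come from the course material). It compares the definitional and short-cut forms of the corrected sum of squares and confirms that the directly computed $SS_b$ equals $SS_T - SS_w$.

```python
# Hypothetical data for illustration only (not taken from the text).
samples = [
    [4.0, 6.0, 8.0],
    [5.0, 7.0, 9.0, 11.0],
    [10.0, 12.0, 14.0],
]


def corrected_ss_definition(xs):
    """Corrected sum of squares from the definition: sum of (x - mean)^2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def corrected_ss_shortcut(xs):
    """Corrected sum of squares from the short-cut: sum(x^2) - (sum x)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


k = len(samples)                              # number of samples
N = sum(len(s) for s in samples)              # total number of values
all_values = [x for s in samples for x in s]
grand_mean = sum(all_values) / N

# Within-sample and total sums of squares via the short-cut formula.
ss_w = sum(corrected_ss_shortcut(s) for s in samples)
ss_t = corrected_ss_shortcut(all_values)

# Between-sample sum of squares, directly and via SSb = SST - SSw.
ss_b_direct = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
ss_b_shortcut = ss_t - ss_w

s_w2 = ss_w / (N - k)         # within-sample variance
s_b2 = ss_b_direct / (k - 1)  # between-sample variance
f_obs = s_b2 / s_w2

print(f"SSw = {ss_w:.2f}, SST = {ss_t:.2f}")
print(f"SSb direct = {ss_b_direct:.2f}, SSb short-cut = {ss_b_shortcut:.2f}")
print(f"Fobs = {f_obs:.3f}")
```

The two routes to $SS_b$ agree to within floating-point rounding, which is the arithmetic check the short-cut is meant to provide.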
The table is selected based on $\alpha$, and is read using $\nu_1 = k-1$ (relating to $s_b^2$, the numerator (top) value in defining F) across the top of the table, and $\nu_2 = N-k$ (relating to $s_w^2$, the denominator (bottom) value in defining F) down the side of the table. These definitions are shown on your table to make sure you get the degrees of freedom the right way round. H0 is rejected if $F_{obs} > F_{crit}$. Rejection implies that there is a significant difference between the means of the different samples. More detailed testing (e.g., a two sample t test) might be required to determine if a particular sample is significantly different from another. A simpler method is to compare the means ± their respective standard deviations.

5.7. Example and Calculation

The data sets are IQ scores for children drawn from four different junior schools. Are there significant differences in IQ with school? The START format is adopted for demonstration, but a more succinct summary method is provided at the end.

a) Research Question: Does child IQ vary among schools? Assume that IQ is normally distributed.

Test: To determine if mean IQ varies among schools, rather than at any particular school, the ANOVA test of difference of means will be used. There are k = 4 samples, with $n_1 = 7$, $n_2 = 5$, $n_3 = 6$ and $n_4 = 6$.

HA: $\bar{x}_1 \neq \bar{x}_2 \neq \bar{x}_3 \neq \bar{x}_4$
H0: $\bar{x}_1 = \bar{x}_2 = \bar{x}_3 = \bar{x}_4$

b) Significance Level: $\alpha = 0.05$

c) Tailedness: Two-tailed. ANOVA is a test of difference, so it is always two-tailed. Note, however, that the F test used to run ANOVA is one-tailed, as it determines if $s_b^2 > s_w^2$.

d) Test Graphic and Results:

e) Critical Statistic:
$\alpha = 0.05$, two-tailed
$\nu_1$ (between) = k-1 = 4-1 = 3
$\nu_2$ (within) = N-k = 24-4 = 20
$F_{crit}(0.05, 3, 20) = 3.86$

f) Observed Statistic:
First, the data table is processed to provide intermediate and descriptive statistics:

Table 5.2.
Raw data and intermediate statistics for school IQ question

| j | 1 | 2 | 3 | 4 | k = 4 |
|---|---|---|---|---|---|
| i | School A | School B | School C | School D | |
| 1 | 105 | 87 | 118 | 84 | |
| 2 | 102 | 93 | 95 | 85 | |
| 3 | 113 | 102 | 88 | 87 | |
| 4 | 96 | 98 | 101 | 96 | |
| 5 | 86 | 95 | 94 | 88 | |
| 6 | 97 | - | 92 | 94 | |
| 7 | 101 | - | - | - | |
| | | | | | Grand Statistics |
| $n_j$ | 7 | 5 | 6 | 6 | $N = \sum n_j = 24$ |
| $\sum x$ | 700 | 475 | 588 | 534 | $\sum\sum x = 2297$ |
| $\sum x^2$ | 70,420 | 45,251 | 58,194 | 47,646 | $\sum\sum x^2 = 221,511$ |
| $SS_j$ | 420 | 126 | 570 | 120 | $SS_w = \sum SS_j = 1236$ |
| $\bar{x}_j$ | 100 | 95 | 98 | 89 | $\bar{x}_G = 95.7$ |
| $n_j(\bar{x}_j - \bar{x}_G)^2$ | 129.43 | 2.45 | 31.74 | 269.34 | $SS_b = 432.96$ |

Second, the intermediate statistics are processed to provide the within-sample and between-sample variance estimates:

1. Calculation of within-sample variance, $s_w^2$:

$$s_w^2 = \frac{SS_w}{N-k} = \frac{\sum SS_j}{N-k} = \frac{SS_1 + SS_2 + SS_3 + SS_4}{N-k} = \frac{420 + 126 + 570 + 120}{24 - 4} = \frac{1236}{20} = 61.8$$

2. Calculation of between-sample variance, $s_b^2$:

$$s_b^2 = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x}_G)^2}{k-1} = \frac{SS_b}{k-1}$$

$$= \frac{7 \times (100 - 95.7)^2 + 5 \times (95 - 95.7)^2 + 6 \times (98 - 95.7)^2 + 6 \times (89 - 95.7)^2}{4-1} = \frac{129.43 + 2.45 + 31.74 + 269.34}{3} = \frac{432.96}{3} = 144.3$$

3. The Total Sum of Squares, $SS_T$:
The total sum of squares can be used as a check on arithmetic, or in a hurry as a rapid method of obtaining $SS_b$:

$$SS_T = \sum\sum x^2 - \frac{\left(\sum\sum x\right)^2}{N} = 221511 - \frac{2297^2}{24} = 1669$$

$$SS_b = SS_T - SS_w = 1669 - 1236 = 433$$

This result agrees with the $SS_b$ calculated above.

4. The observed F statistic is calculated:

$$F_{obs} = \frac{s_b^2}{s_w^2} = \frac{144.32}{61.8} = 2.34$$

5. The ANOVA Table:

Table 5.3.
The ANOVA table for IQ of children at different schools

| Source of Variation | Sum of Squares | Degrees of Freedom, $\nu$ | Mean Square = $SS/\nu = s^2$ | $F_{obs}$ |
|---|---|---|---|---|
| Total | $SS_T$ = 1669 | N-1 = 23 | | |
| Within-sample | $SS_w$ = 1236 | N-k = 20 | 61.8 | |
| Between-sample | $SS_b$ = 432.96 | k-1 = 3 | 144.3 | 2.34 |

g) Test Result: $F_{obs}$ (2.34) < $F_{crit}$ (3.86), therefore fail to reject the null hypothesis.

h) Check:

i) Conclusions: There does not appear to be a significant difference in mean IQ among the schools. School D provides the only possible outlier in terms of the mean, and it might merit a two sample t test against the pooled data from the other schools. The sample size is probably too small for firm conclusions.

5.8. Shorter Reporting Methods

In general, it is not necessary to report all the details of a test; it is simply assumed that you are knowledgeable enough to have undertaken them wisely. ANOVA reports are often much shorter than the START format. In general, it is necessary to build an intermediate and descriptive table, normally in a spreadsheet. The shorter report then would consist simply of a statement of the research question, the ANOVA table with the critical value of F incorporated (Table 5.3.b) and a conclusion.

Table 5.3.b. The ANOVA table for IQ of children at different schools

| Source of Variation | Sum of Squares | Degrees of Freedom, $\nu$ | Mean Square = $SS/\nu = s^2$ | $F_{obs}$ |
|---|---|---|---|---|
| Total | $SS_T$ = 1669 | N-1 = 23 | | |
| Within-sample | $SS_w$ = 1236 | N-k = 20 | 61.8 | |
| Between-sample | $SS_b$ = 432.96 | k-1 = 3 | 144.3 | 2.34 |

$F_{crit}$ ($\alpha$ = 0.05) = 3.86

Often statistical analyses are provided in the body of text. For example, “an analysis of variance suggests that there is little difference in IQ between schools”. Statements such as this are unacceptably superficial, and would be much stronger if some supporting information were incorporated. For example, “Analysis of variance showed no significant difference of IQ between four schools at the 0.05 level.
However, the sample size in each school was very small (5 to 7) and only four schools were sampled.”

For the purposes of this course (GEOG 2210), a reasonably full accounting (i.e. START) of the method is required, as it is important to demonstrate your competence with the technique. A full report also provides the opportunity to garner marks even when the final answer may be incorrect.

1. Are waiters paid a comparable hourly wage (in dollars) in different towns? Use the following dataset to tackle this question. Data collected by random sampling from three different towns are summarized by intermediate and descriptive statistics. Identify the sample(s) most likely responsible for any significant difference. (Note: some additional descriptive statistics are provided to aid interpretation.)

Dataset for Question 1

| i ↓ / j → | 1 | 2 | 3 | k = 3 |
|---|---|---|---|---|
| | Town A | Town B | Town C | |
| 1 | 5.42 | 9.84 | 4.73 | |
| 2 | 11.76 | 6.77 | 6.33 | |
| . | … | … | … | |
| . | | | | |

Table 5.4. Summary statistics, hourly wages of waiters

| Statistics (units) | Town A | Town B | Town C | Grand Statistics |
|---|---|---|---|---|
| $n_j$ | 15 | 23 | 19 | $N = \sum n_j =$ |
| $\sum x_j$ (dollars) | 123 | 229 | 96 | $\sum\sum x =$ |
| $\sum x_j^2$ (dollars²) | 1,237 | 2,979 | 742 | $\sum\sum x^2 =$ |
| $SS_j$ (dollars²) | 228.4 | 698.95 | | $SS_w =$ |
| $\bar{X}_j$ (dollars) | 8.20 | 9.956 | | $\bar{X}_G =$ |
| $s_{xj}$ (dollars) | 4.04 | 5.636 | | $SS_T = 1,436.88$ |
| $\bar{X}_j + s_x$ | 12.24 | 15.59 | | |
| $\bar{X}_j - s_x$ | 4.16 | 4.32 | | |
| $n_j(\bar{X}_j - \bar{X}_G)^2$ | 1.737 | | | $SS_b =$ |

Table 5.5. ANOVA table for waiter income in three towns

| Source of Variation | Sum of Squares | Degrees of Freedom, $\nu$ | Mean Square = $SS/\nu = s^2$ | $F_{obs}$ |
|---|---|---|---|---|
| Total | $SS_T$ = | N-1 = | | |
| Within-sample | $SS_w$ = | N-k = | | |
| Between-sample | $SS_b$ = | k-1 = | | |

$F_{crit}$ ($\alpha$ = ) =

2. The chemistry of water flowing from a series of five springs has been analyzed over a full year. The magnesium ion concentrations in mg/l have been compiled and are summarized in the data table below (Table 5.6). Determine if there are systematic differences in the magnesium ion concentration among the springs. Examine the statistics sheet to aid in interpreting your results (see hints below). Complete your answers in Tables 5.6 and 5.7.

Table 5.6.
Summary statistics, magnesium concentrations in springs (all data are in mg/l)

| Statistics | Spring 1 | Spring 2 | Spring 3 | Spring 4 | Spring 5 | Grand Statistics |
|---|---|---|---|---|---|---|
| $n_j$ | 15 | 10 | 12 | 6 | 23 | $N = \sum n_j = 66$ |
| $\sum x_j$ | 34.13 | 38.87 | 222.0 | 358.3 | 99.86 | $\sum\sum x = 753.1627$ |
| $\sum x_j^2$ | 106.28 | 169.41 | 9181.2 | 22048.43 | 477.65 | $\sum\sum x^2 = 31982.99$ |
| $SS_j$ | 28.63 | 18.34 | 5074.1 | 651.9483 | 44.07 | $SS_w =$ |
| $\bar{X}_j$ | 2.28 | 3.89 | 18.50 | 59.71667 | 4.34 | $\bar{X}_G =$ |
| $s_{xj}$ | 1.43 | 1.43 | 21.48 | 11.41883 | 1.42 | $SS_T = 23388.23$ |
| $\bar{X}_j + s_x$ | 3.71 | 5.32 | 40.0 | 71.1 | 5.76 | |
| $\bar{X}_j - s_x$ | 0.85 | 2.46 | -2.98 | 48.3 | 2.92 | |
| CV (%) | 62.7 | 36.8 | 116.1 | 19.1 | 32.7 | |
| $n_j(\bar{X}_j - \bar{X}_G)^2$ | 1252.07 | | | | | $SS_b =$ |

Table 5.7. ANOVA table for magnesium concentrations in five springs

| Source of Variation | Sum of Squares | Degrees of Freedom, $\nu$ | Mean Square = $SS/\nu = s^2$ | $F_{obs}$ |
|---|---|---|---|---|
| Total | $SS_T$ = 23388.23 | N-1 = | | |
| Within-sample | $SS_w$ = | N-k = | | |
| Between-sample | $SS_b$ = | k-1 = | | |

$F_{crit}$ ($\alpha$ = ) =

To help interpret the patterns in the springs, classify the data in a table using the mean, median and standard deviation. Which springs are similar and which are different? Are the “different” ones similar in any way? The exercise does not require specialised knowledge about springs. The approach might be equally applicable to a study on incomes or farm production.
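Both exercises supply intermediate statistics rather than raw data, so the whole ANOVA table can be built from $n_j$, $\sum x$ and $\sum x^2$ alone. A minimal Python sketch under that assumption follows. The function name `anova_from_summary` and the dictionary keys are illustrative and not part of the course material. It is run on the worked example of Table 5.2, and the critical value 3.86 is simply the table value quoted in Section 5.7, not something the code computes.

```python
def anova_from_summary(n, sum_x, sum_x2):
    """One-way ANOVA table from per-sample n_j, sum(x) and sum(x^2)."""
    k = len(n)   # number of samples
    N = sum(n)   # total number of values

    # Corrected sum of squares per sample: SS_j = sum(x^2) - (sum x)^2 / n_j
    ss_j = [sx2 - sx ** 2 / nj for nj, sx, sx2 in zip(n, sum_x, sum_x2)]
    ss_w = sum(ss_j)                           # within-sample SS
    ss_t = sum(sum_x2) - sum(sum_x) ** 2 / N   # total corrected SS
    ss_b = ss_t - ss_w                         # short-cut: SSb = SST - SSw

    df_b, df_w = k - 1, N - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w      # between and within variances
    return {
        "ss_t": ss_t, "ss_w": ss_w, "ss_b": ss_b,
        "df_t": N - 1, "df_w": df_w, "df_b": df_b,
        "ms_w": ms_w, "ms_b": ms_b,
        "f_obs": ms_b / ms_w,
    }


# Intermediate statistics for Schools A to D, copied from Table 5.2.
table = anova_from_summary(
    n=[7, 5, 6, 6],
    sum_x=[700, 475, 588, 534],
    sum_x2=[70420, 45251, 58194, 47646],
)

F_CRIT = 3.86  # table value quoted in Section 5.7 (assumed, not computed here)
reject_h0 = table["f_obs"] > F_CRIT

print(f"SST = {table['ss_t']:.2f}  (df = {table['df_t']})")
print(f"SSw = {table['ss_w']:.2f}  (df = {table['df_w']}), s_w^2 = {table['ms_w']:.2f}")
print(f"SSb = {table['ss_b']:.2f}  (df = {table['df_b']}), s_b^2 = {table['ms_b']:.2f}")
print(f"Fobs = {table['f_obs']:.2f}, reject H0: {reject_h0}")
```

On the school data this reproduces the values in Table 5.3 ($SS_w$ = 1236, $F_{obs}$ ≈ 2.34, fail to reject H0). The same call with the rows of Table 5.4 or Table 5.6 would fill in the blank cells of the exercise tables, with $F_{crit}$ still read from the appendix table.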

Solution Preview


1. The ANOVA result shows that the average hourly wage paid to waiters is not the same in the three towns. The observed F statistic of 5.7779 is greater than the critical value at the 5% significance level, so the test rejects the null hypothesis that the average hourly wages for waiters in the three towns are the same. Analysis of variance does not itself tell us which town(s) deviate from the others, but casual inspection shows a very low average wage for Town C, while the average wages in Town A and Town B may be statistically indistinguishable....
