We are looking at regression with factors (categorical variables). In econometrics we often handle them with what we call ‘dummy variables’. You will show that you can use and interpret regressions with factors.

1. Load the LDC dataset from the CSV file given below. The dataset has two factors, Size and Area.
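A minimal sketch of the loading step, assuming R 4.x. So that it runs on its own, it reads four rows copied from the CSV listing below through `read.csv(text = ...)`; with the real file you would pass the file path instead. The name `csv_text` is only for illustration. Because the header line has one fewer field than the data rows, `read.csv()` uses the first column, the utility name, as the row names.

```r
# Four rows copied from the CSV listing below (an excerpt, not the full data).
csv_text <- "Customers,OMandA,Dist.Cost,Total.Bill,Area,Size
Algoma,11581,839.00 ,56.15 ,132.47 ,N,S
Blue Water,35772,309.00 ,40.98 ,117.81 ,S,M
Enersource,195381,238.00 ,30.88 ,107.75 ,S,L
Toronto,709323,328.00 ,39.59 ,116.73 ,S,XL"

# strip.white = TRUE drops the stray spaces before the commas.
# stringsAsFactors = TRUE turns Area and Size into factors
# (the default became FALSE in R 4.0.0).
d <- read.csv(text = csv_text, strip.white = TRUE, stringsAsFactors = TRUE)

str(d)
rownames(d)      # the utility names
levels(d$Area)   # "N" "S"
levels(d$Size)   # "L" "M" "S" "XL": alphabetical, not small-to-large
```

Since the default alphabetical order puts L before M and S, you may prefer `factor(d$Size, levels = c("S", "M", "L", "XL"))` so that plots run from small to large.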

2. Write down the number 290742 and divide it into three two-digit numbers, a, b, and c. If any of a, b, or c is greater than 72 (the number of observations in the dataset), divide it by 2. Delete the observations numbered a, b, and c from the dataset, and show the command you used.
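One way to do the arithmetic and the deletion in R, as a sketch. Here `d` is a hypothetical 72-row stand-in so the snippet runs on its own; in the assignment it would be the data frame you loaded. The task does not say how to round when an odd number above 72 is halved, so rounding down with `%/%` is an assumption.

```r
# Split 290742 into three two-digit numbers: "29", "07", "42".
s <- "290742"
parts <- as.integer(substring(s, c(1, 3, 5), c(2, 4, 6)))

# Halve any value above 72 (the number of observations).
# Integer division (rounding down) is an assumption for odd values.
parts <- ifelse(parts > 72L, parts %/% 2L, parts)
print(parts)   # a = 29, b = 7, c = 42: none exceeds 72

# Hypothetical stand-in for the 72-row LDC data frame.
d <- data.frame(obs = 1:72)

# Negative indices delete rows; drop = FALSE keeps a data frame
# even though this stand-in has a single column.
d <- d[-parts, , drop = FALSE]
nrow(d)        # 69
```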

3. Plot OMandA against each factor (Size and Area) and reproduce the graphs. What kind of graphs do you get, and what do they tell you? Are the differences between the groups significant?
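A sketch of the plots and one possible significance check. It uses eight utilities copied from the listing below (Algoma, Atikokan, Kenora and Thunder Bay in the North; Brant, CW, Guelph and Ottawa in the South), so the numbers will differ from the full data. When the variable on the x-axis is a factor, `plot()` draws side-by-side boxplots. The Welch t-test for Area and the one-way ANOVA F-test for Size are common choices, not necessarily the tests your course expects.

```r
# Excerpt of eight utilities from the CSV listing (not the full dataset).
d <- data.frame(
  OMandA = c(839, 564, 359, 238, 490, 299, 251, 191),
  Area   = factor(c("N", "N", "N", "N", "S", "S", "S", "S")),
  Size   = factor(c("S", "S", "S", "M", "S", "S", "M", "L"))
)

# A factor on the x-axis gives one boxplot per category.
plot(OMandA ~ Area, data = d)
plot(OMandA ~ Size, data = d)

# Two groups (N vs S): Welch two-sample t-test.
tt <- t.test(OMandA ~ Area, data = d)
tt$p.value

# Several groups (Size): one-way ANOVA F-test from a regression on the factor.
aov_size <- anova(lm(OMandA ~ Size, data = d))
aov_size[["Pr(>F)"]][1]
```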

4. Regress OMandA on Area. Show the results in a nice table.
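A minimal sketch of the regression and a compact coefficient table, using the same eight-utility excerpt (Algoma, Atikokan, Kenora, Thunder Bay, Brant, CW, Guelph, Ottawa). `coef(summary(...))` returns the estimate, standard error, t value and p-value for each coefficient; the names `m1` and `tab` are illustrative.

```r
# Eight utilities from the CSV listing: four in the North, four in the South.
d <- data.frame(
  OMandA = c(839, 564, 359, 238, 490, 299, 251, 191),
  Area   = factor(c("N", "N", "N", "N", "S", "S", "S", "S"))
)

m1 <- lm(OMandA ~ Area, data = d)

# Estimate, std. error, t value and p-value, rounded for display.
tab <- round(coef(summary(m1)), 3)
print(tab)
```

For this excerpt the intercept is 500 (the mean OMandA of the four North utilities) and the AreaS coefficient is -192.25.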

5. Why does only one area appear in the regression results? Explain how R chose which one to show.
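To see how R chose, you can inspect the factor's levels and the design matrix R builds from it. A sketch with a toy four-element factor (hypothetical values): under the default treatment contrasts R creates one 0/1 column for every level except the first, and `factor()` sorts levels alphabetically, so N becomes the omitted baseline and only AreaS appears.

```r
# Toy factor (hypothetical values), just to show the coding.
Area <- factor(c("S", "N", "S", "N"))

levels(Area)            # "N" "S": alphabetical, so N is the first level
getOption("contrasts")  # default: contr.treatment for unordered factors

# Design matrix: an intercept plus a 0/1 dummy for S only.
X <- model.matrix(~ Area)
print(X)
```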

6. Explain what the regression tells us.
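One way to check an interpretation: with a single dummy and no other regressors, the intercept equals the mean OMandA of the baseline group, and the AreaS coefficient equals the difference between the two group means. A sketch on the same eight-utility excerpt:

```r
d <- data.frame(
  OMandA = c(839, 564, 359, 238, 490, 299, 251, 191),
  Area   = factor(c("N", "N", "N", "N", "S", "S", "S", "S"))
)

means  <- tapply(d$OMandA, d$Area, mean)
mean_N <- as.numeric(means["N"])   # 500
mean_S <- as.numeric(means["S"])   # 307.75

b <- coef(lm(OMandA ~ Area, data = d))

# Intercept = baseline (N) mean; AreaS = S mean minus N mean.
all.equal(as.numeric(b["(Intercept)"]), mean_N)
all.equal(as.numeric(b["AreaS"]), mean_S - mean_N)
```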

7. Regress Total.Bill on Customers, OMandA, and Area. Show the results in a nice table.
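A sketch of this regression on the same eight-utility excerpt, now also carrying Customers and Total.Bill from the listing. With only eight rows the estimates are purely illustrative; `m3` is an illustrative name.

```r
# Eight utilities from the CSV listing (excerpt, not the full dataset).
d <- data.frame(
  Customers  = c(11581, 1661, 5572, 49765, 9741, 6496, 50859, 305266),
  OMandA     = c(839, 564, 359, 238, 490, 299, 251, 191),
  Total.Bill = c(132.47, 135.34, 114.45, 103.75, 114.13, 110.82, 110.54, 111.44),
  Area       = factor(c("N", "N", "N", "N", "S", "S", "S", "S"))
)

m3 <- lm(Total.Bill ~ Customers + OMandA + Area, data = d)

# Coefficient table: Area enters as a single intercept shift, AreaS.
round(coef(summary(m3)), 4)
```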

8. You have just run a dummy-variable regression and estimated an intercept dummy. Explain what it tells you. Why is only one value of Area presented in the table?
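Written out, assuming Area N is the baseline (R's default, since N comes first alphabetically), the model in step 7 is the following; the symbols β and δ are notation introduced here, and δ is the coefficient R labels AreaS:

$$
\widehat{\text{Total.Bill}}_i = \hat\beta_0 + \hat\beta_1\,\text{Customers}_i + \hat\beta_2\,\text{OMandA}_i + \hat\delta\,D_i,
\qquad
D_i = \begin{cases} 1 & \text{if Area}_i = \text{S} \\ 0 & \text{if Area}_i = \text{N} \end{cases}
$$

$$
\text{Area N: intercept } \hat\beta_0, \qquad \text{Area S: intercept } \hat\beta_0 + \hat\delta .
$$

Both areas share the slopes on Customers and OMandA, so δ shifts the whole fitted line up or down: it is the estimated difference in Total.Bill between a South and a North utility with the same Customers and OMandA.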

9. For your last trick, you will find a slope dummy. To test whether the effect of OMandA on Total.Bill differs between Areas (an interaction between OMandA and Area), add OMandA:Area to the regression. Interpret the result.
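A sketch of the slope-dummy regression on the eight-utility excerpt, assuming the intended term is `OMandA:Area` (if the interaction with Customers is meant instead, use `Customers:Area`). The coefficient `OMandA:AreaS` is the difference between the OMandA slope in the South and in the North, so the South slope is the sum of the two coefficients; its t-test tells you whether that difference is significant.

```r
d <- data.frame(
  Customers  = c(11581, 1661, 5572, 49765, 9741, 6496, 50859, 305266),
  OMandA     = c(839, 564, 359, 238, 490, 299, 251, 191),
  Total.Bill = c(132.47, 135.34, 114.45, 103.75, 114.13, 110.82, 110.54, 111.44),
  Area       = factor(c("N", "N", "N", "N", "S", "S", "S", "S"))
)

m4 <- lm(Total.Bill ~ Customers + OMandA + Area + OMandA:Area, data = d)
b  <- coef(m4)

slope_N <- b[["OMandA"]]                          # OMandA slope, baseline N
slope_S <- b[["OMandA"]] + b[["OMandA:AreaS"]]    # OMandA slope for S
c(N = slope_N, S = slope_S)

# Estimate, std. error, t value and p-value for the slope difference.
coef(summary(m4))["OMandA:AreaS", ]
```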

Bonus:

10. Load the effects package with library(effects).

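11. Define the model M <- lm(Total.Bill ~ Size:OMandA, data = d). This asks R to fit a separate regression line for each Size category: each category gets its own slope on OMandA, while all categories share one intercept. (The colon requests only the interaction term; Size*OMandA would also include the main effects, giving each category its own intercept as well.)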

12. Plot the result using the effects package, which is designed to show the different regressions implied by an interaction on one graph. There are two versions:

> plot(allEffects(M))

> plot(allEffects(M), multiline = TRUE)

Explain the resulting graphs briefly.
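A sketch of the bonus model on a nine-utility excerpt from the listing (Algoma, Atikokan, Kenora, Thunder Bay, Brant, CW, Guelph, Ottawa and Toronto, so every Size level appears). Because Size enters only through `Size:OMandA`, R gives each Size level its own slope on OMandA while all levels share one intercept; the effects plots draw one line per level from these coefficients. The effects calls are left out here so the snippet runs with base R only.

```r
# Nine utilities from the CSV listing (excerpt, not the full dataset).
d <- data.frame(
  OMandA     = c(839, 564, 359, 238, 490, 299, 251, 191, 328),
  Total.Bill = c(132.47, 135.34, 114.45, 103.75, 114.13, 110.82, 110.54, 111.44, 116.73),
  Size       = factor(c("S", "S", "S", "M", "S", "S", "M", "L", "XL"))
)

# One slope per Size level, one common intercept.
M <- lm(Total.Bill ~ Size:OMandA, data = d)
coef(M)   # (Intercept), SizeL:OMandA, SizeM:OMandA, SizeS:OMandA, SizeXL:OMandA
```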

Miscellaneous notes

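> f <- file.choose()          # choose the CSV file interactively

> d <- read.csv(f)            # the utility names become the row names

> d <- d[-c(1, 14, 21), ]     # example: delete observations 1, 14 and 21

> length(d[, 2])              # number of observations left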

When you do a regression with a dummy, one category must be left out. R does this automatically: it leaves out the first level of the factor, and by default the levels are sorted alphabetically, so take care. You may want to change the baseline category to make the results easier to explain. You will have to think carefully about the meaning of your results.
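A minimal sketch of changing the baseline with `relevel()`, on the eight-utility excerpt (Algoma, Atikokan, Kenora, Thunder Bay, Brant, CW, Guelph, Ottawa). Making S the reference level turns the dummy into AreaN: the intercept becomes the South mean and the coefficient flips sign. The name `m2` is illustrative.

```r
d <- data.frame(
  OMandA = c(839, 564, 359, 238, 490, 299, 251, 191),
  Area   = factor(c("N", "N", "N", "N", "S", "S", "S", "S"))
)

# Make S the baseline instead of the alphabetical default N.
d$Area <- relevel(d$Area, ref = "S")
levels(d$Area)   # "S" "N"

m2 <- lm(OMandA ~ Area, data = d)
coef(m2)         # intercept = S mean (307.75); AreaN = N mean minus S mean (192.25)
```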

Contents of the CSV file:

Customers,OMandA,Dist.Cost,Total.Bill,Area,Size

Algoma,11581,839.00 ,56.15 ,132.47 ,N,S

Atikokan,1661,564.00 ,55.61 ,135.34 ,N,S

Brant,9741,490.00 ,36.34 ,114.13 ,S,S

CW,6496,299.00 ,33.35 ,110.82 ,S,S

Chapleau,1293,416.00 ,34.16 ,113.20 ,N,S

Coop Embrun,1954,274.00 ,33.90 ,112.26 ,S,S

ELK,11276,214.00 ,36.00 ,115.98 ,S,S

Espanola,3299,326.00 ,40.88 ,119.99 ,N,S

Fort Frances,3775,345.00 ,22.98 ,98.32 ,N,S

Grimsby,10307,202.00 ,36.82 ,114.82 ,S,S

Hearst,2817,308.00 ,25.52 ,103.07 ,N,S

Hydro 2000,1208,264.00 ,35.16 ,114.85 ,S,S

Hawkesbury,5521,165.00 ,22.21 ,99.66 ,S,S

Kenora,5572,359.00 ,36.94 ,114.45 ,N,S

Lakefront,9976,217.00 ,31.68 ,109.94 ,S,S

Lakeland,9598,293.00 ,41.16 ,119.56 ,N,S

Midland,6951,258.00 ,34.18 ,113.04 ,S,S

NOTL,8000,238.00 ,35.13 ,112.70 ,S,S

NOW,6059,353.00 ,34.23 ,111.70 ,N,S

Orangeville,11248,263.00 ,34.43 ,112.03 ,S,S

Ottawa River,10555,253.00 ,12.73 ,87.30 ,N,S

Parry Sound,3441,383.00 ,41.23 ,120.77 ,N,S

Renfrew,4183,269.00 ,27.07 ,106.29 ,S,S

Rideau SL,4185,275.00 ,37.60 ,117.46 ,S,S

Sioux Lookout,2755,425.00 ,45.00 ,123.79 ,N,S

Tillsonburg,6745,330.00 ,32.07 ,109.34 ,S,S

Wasaga,12324,180.00 ,30.96 ,110.59 ,S,S

Wellington North,3626,432.00 ,38.07 ,117.37 ,S,S

Blue Water,35772,309.00 ,40.98 ,117.81 ,S,M

Brantford,37964,176.00 ,28.79 ,106.07 ,S,M

Burlington,64329,225.00 ,34.33 ,111.50 ,S,M

Cambridge,51584,209.00 ,33.94 ,110.30 ,S,M

Chatham kent,32132,209.00 ,26.73 ,106.26 ,S,M

Collus,15723,259.00 ,33.48 ,110.80 ,S,M

CNP,15708,279.00 ,39.68 ,111.15 ,S,M

Enwin,85083,268.00 ,41.14 ,118.12 ,S,M

ErieThames,18090,315.00 ,40.56 ,118.16 ,S,M

Essex,28094,197.00 ,36.91 ,115.43 ,S,M

Festival,19885,200.00 ,38.25 ,114.75 ,S,M

Greater Sudbury,46748,280.00 ,33.90 ,111.91 ,N,M

Guelph,50859,251.00 ,34.71 ,110.54 ,S,M

Haldimand,21070,346.00 ,46.73 ,125.78 ,S,M

Halton,21232,227.00 ,32.40 ,110.92 ,S,M

Innisfil,14826,281.00 ,43.59 ,123.09 ,S,M

Kingston,26844,224.00 ,34.40 ,111.16 ,S,M

KitchenerWilmot,87964,155.00 ,29.60 ,106.19 ,S,M

Milton,30485,210.00 ,35.76 ,112.63 ,S,M

Newmarket,33338,198.00 ,37.98 ,115.17 ,S,M

NPEI,51162,275.00 ,36.75 ,114.98 ,S,M

Norfolk,19032,251.00 ,49.50 ,127.73 ,S,M

North Bay,23850,224.00 ,36.29 ,113.97 ,N,M

Oakville,63614,206.00 ,32.75 ,109.72 ,S,M

Orillia,13035,345.00 ,31.57 ,108.13 ,S,M

Oshawa,53083,191.00 ,26.63 ,103.97 ,S,M

Peterborough,35270,199.00 ,30.75 ,108.24 ,S,M

Sault Ste Marie,32998,260.00 ,26.65 ,100.16 ,N,M

St Thomas,16436,225.00 ,28.85 ,105.64 ,S,M

Thunder Bay,49765,238.00 ,26.29 ,103.75 ,N,M

Waterloo North,53611,182.00 ,35.30 ,112.47 ,S,M

Welland,21768,242.00 ,37.77 ,115.81 ,S,M

Westario,22257,207.00 ,28.91 ,108.70 ,S,M

Whitby,40337,214.00 ,38.19 ,115.69 ,S,M

Woodstock,15181,251.00 ,41.05 ,118.40 ,S,M

Enersource,195381,238.00 ,30.88 ,107.75 ,S,L

Horizon,235327,175.00 ,36.72 ,113.90 ,S,L

Hydro One Brampton,137856,148.00 ,30.67 ,107.46 ,S,L

Ottawa,305266,191.00 ,34.67 ,111.44 ,S,L

London,148331,209.00 ,34.83 ,112.03 ,S,L

Powerstream,332993,184.00 ,32.21 ,108.65 ,S,L

Veridian,113709,181.00 ,31.38 ,108.81 ,S,L

Hydro One Networks,1210695,454.00 ,51.42 ,131.34 ,N,XL

Toronto,709323,328.00 ,39.59 ,116.73 ,S,XL
