Question

This assignment looks at regression with factors (categorical variables), which econometrics usually handles with ‘dummy variables’. You will show that you can run and interpret regressions with factors.
1. Load the LDC dataset in csv file given below. The dataset has two factors, size and area.
2. Write down the number 290742. Split it into three two-digit numbers, a, b, and c. If any of a, b, and c is greater than 72, divide it by 2. Delete the observations numbered a, b, and c from the dataset. Show the command you used.
3. Plot OMandA against the factors and reproduce the graphs. Explain what you get. What kind of graphs do you get? What do they tell you? Are the differences significant?
4. Regress OMandA on Area. Show the results in a nice table.
5. Why does only one area appear in the regression results? Explain how R chose which one to show.
6. Explain what the regression tells us.
7. Regress Total.Bill on Customers, OMandA, and Area. Show the results in a nice table.
8. You have just done a dummy-variable regression and found an intercept dummy. Explain what it tells you. Why is only one value of Area presented in the table?
9. For your last trick, you will find a slope dummy. To test whether there is an interaction between the number of Customers and the Area, add OMand:Area to the regression. Interpret the result.
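Steps 4 to 9 can be sketched on a handful of rows copied from the CSV listing further down. This is only an illustration, not the assignment's solution: the data frame name `mini` and the choice of rows are assumptions made here, and with so few rows the estimates mean little. The interaction term in step 9 is ambiguous as printed, so the sketch assumes `Customers:Area`, following the wording of that step.

```r
# Eight rows copied from the CSV listing (4 northern, 4 southern LDCs).
# `mini` is a hypothetical name used only for this sketch.
mini <- data.frame(
  Customers  = c(11581, 1661, 1293, 3299, 9741, 6496, 1954, 11276),
  OMandA     = c(839, 564, 416, 326, 490, 299, 274, 214),
  Total.Bill = c(132.47, 135.34, 113.20, 119.99, 114.13, 110.82, 112.26, 115.98),
  Area       = factor(c("N", "N", "N", "N", "S", "S", "S", "S"))
)

# Step 4: OMandA on Area. Only "AreaS" is reported; "N" is the baseline.
m1 <- lm(OMandA ~ Area, data = mini)

# With a single factor, the intercept is the baseline group mean and
# the AreaS coefficient is the difference between the two group means.
north_mean <- mean(mini$OMandA[mini$Area == "N"])
south_mean <- mean(mini$OMandA[mini$Area == "S"])

# Step 7: intercept dummy (Area shifts the regression line up or down).
m2 <- lm(Total.Bill ~ Customers + OMandA + Area, data = mini)

# Step 9 (assumed term): slope dummy, letting the Customers slope differ by Area.
m3 <- lm(Total.Bill ~ Customers + OMandA + Area + Customers:Area, data = mini)

print(coef(m1))
print(names(coef(m3)))
```

Comparing `coef(m1)` with `north_mean` and `south_mean - north_mean` shows why only one area appears: the omitted level is absorbed into the intercept.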

Bonus:
10. Load library(effects)
11. Define the model M <- lm(Total.Bill ~ Size:OMandA). (This really just asks R to come up with a regression line for each of the categories. The colon is a shorthand way to indicate all the interactions.)
12. Plot the result using the effects package (it is designed to show the different regressions, when there are interactions, all on one graph). There are two versions:
> plot(allEffects(M))
> plot(allEffects(M), multiline=T)
Explain the resulting graphs briefly.
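As a base-R companion to the bonus steps, the sketch below fits the same kind of model on nine rows copied from the CSV listing and prints the coefficients. It does not use the effects package or reproduce its plots; the data frame name `mini` and the row selection are assumptions made for illustration only.

```r
# Nine rows copied from the CSV listing: three each of Size S, M and L.
# `mini` is a hypothetical name used only for this sketch.
mini <- data.frame(
  OMandA     = c(839, 490, 299, 309, 176, 225, 238, 175, 148),
  Total.Bill = c(132.47, 114.13, 110.82, 117.81, 106.07, 111.50,
                 107.75, 113.90, 107.46),
  Size       = factor(c("S", "S", "S", "M", "M", "M", "L", "L", "L"))
)

# Size:OMandA without main effects: a common intercept plus
# one OMandA slope for each Size level.
M <- lm(Total.Bill ~ Size:OMandA, data = mini)
print(coef(M))
```

Each slope coefficient corresponds to one of the lines that the effects plots draw for a Size category.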
Miscellaneous notes
f<-file.choose()
d<-read.csv(f)
d<-d[-c(1,14,21),]
length(d[,2])
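The deletion in step 2 follows the same negative-indexing pattern. A minimal sketch, assuming an arithmetic split with `%/%` and `%%` (one possible approach, not necessarily the intended one); `parts` and `drop_rows` are hypothetical names, and a stand-in data frame with 72 rows, the number of data rows in the listing below, replaces the real file so the snippet runs on its own.

```r
n <- 290742

# Split the six-digit number into three two-digit pieces.
parts <- c(a = n %/% 10000,          # 29
           b = (n %/% 100) %% 100,   # 7 (the leading zero of "07" is lost)
           c = n %% 100)             # 42

# Halve (rounding down) any piece greater than 72; none qualifies here.
drop_rows <- ifelse(parts > 72, parts %/% 2, parts)

# Stand-in for the dataset: 72 rows, one per LDC in the listing.
d <- data.frame(id = 1:72)

# Negative indices drop the listed rows.
d <- d[-drop_rows, , drop = FALSE]

print(drop_rows)
print(nrow(d))   # 69
```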

When you do a regression with a dummy, one category must be left out. R does this automatically: it leaves out the first level of the factor in alphabetical order, so take care. You may want to change the baseline case to make the results easier to explain. You will have to think carefully about the meaning of your results.
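Changing the baseline can be sketched with `relevel()` from base R. The vector below is a made-up toy example, not the dataset. In R 4.0 and later `read.csv` returns character columns by default, so the real Area column would need `factor()` first.

```r
# Toy factor with the same two levels as Area.
area <- factor(c("N", "S", "S", "N", "S"))
print(levels(area))          # "N" "S": "N" comes first, so it is the baseline

# Make "S" the reference level instead.
area_s_base <- relevel(area, ref = "S")
print(levels(area_s_base))   # "S" "N": now "S" is the baseline
```

A regression on the releveled factor would then report a coefficient for N measured against S.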

Contents of CSV file:
Customers,OMandA,Dist.Cost,Total.Bill,Area,Size
Algoma,11581,839.00 ,56.15 ,132.47 ,N,S
Atikokan,1661,564.00 ,55.61 ,135.34 ,N,S
Brant,9741,490.00 ,36.34 ,114.13 ,S,S
CW,6496,299.00 ,33.35 ,110.82 ,S,S
Chapleau,1293,416.00 ,34.16 ,113.20 ,N,S
Coop Embrun,1954,274.00 ,33.90 ,112.26 ,S,S
ELK,11276,214.00 ,36.00 ,115.98 ,S,S
Espanola,3299,326.00 ,40.88 ,119.99 ,N,S
Fort Frances,3775,345.00 ,22.98 ,98.32 ,N,S
Grimsby,10307,202.00 ,36.82 ,114.82 ,S,S
Hearst,2817,308.00 ,25.52 ,103.07 ,N,S
Hydro 2000,1208,264.00 ,35.16 ,114.85 ,S,S
Hawkesbury,5521,165.00 ,22.21 ,99.66 ,S,S
Kenora,5572,359.00 ,36.94 ,114.45 ,N,S
Lakefront,9976,217.00 ,31.68 ,109.94 ,S,S
Lakeland,9598,293.00 ,41.16 ,119.56 ,N,S
Midland,6951,258.00 ,34.18 ,113.04 ,S,S
NOTL,8000,238.00 ,35.13 ,112.70 ,S,S
NOW,6059,353.00 ,34.23 ,111.70 ,N,S
Orangeville,11248,263.00 ,34.43 ,112.03 ,S,S
Ottawa River,10555,253.00 ,12.73 ,87.30 ,N,S
Parry Sound,3441,383.00 ,41.23 ,120.77 ,N,S
Renfrew,4183,269.00 ,27.07 ,106.29 ,S,S
Rideau SL,4185,275.00 ,37.60 ,117.46 ,S,S
Sioux Lookout,2755,425.00 ,45.00 ,123.79 ,N,S
Tillsonburg,6745,330.00 ,32.07 ,109.34 ,S,S
Wasaga,12324,180.00 ,30.96 ,110.59 ,S,S
Wellington North,3626,432.00 ,38.07 ,117.37 ,S,S
Blue Water,35772,309.00 ,40.98 ,117.81 ,S,M
Brantford,37964,176.00 ,28.79 ,106.07 ,S,M
Burlington,64329,225.00 ,34.33 ,111.50 ,S,M
Cambridge,51584,209.00 ,33.94 ,110.30 ,S,M
Chatham kent,32132,209.00 ,26.73 ,106.26 ,S,M
Collus,15723,259.00 ,33.48 ,110.80 ,S,M
CNP,15708,279.00 ,39.68 ,111.15 ,S,M
Enwin,85083,268.00 ,41.14 ,118.12 ,S,M
ErieThames,18090,315.00 ,40.56 ,118.16 ,S,M
Essex,28094,197.00 ,36.91 ,115.43 ,S,M
Festival,19885,200.00 ,38.25 ,114.75 ,S,M
Greater Sudbury,46748,280.00 ,33.90 ,111.91 ,N,M
Guelph,50859,251.00 ,34.71 ,110.54 ,S,M
Haldimand,21070,346.00 ,46.73 ,125.78 ,S,M
Halton,21232,227.00 ,32.40 ,110.92 ,S,M
Innisfil,14826,281.00 ,43.59 ,123.09 ,S,M
Kingston,26844,224.00 ,34.40 ,111.16 ,S,M
KitchenerWilmot,87964,155.00 ,29.60 ,106.19 ,S,M
Milton,30485,210.00 ,35.76 ,112.63 ,S,M
Newmarket,33338,198.00 ,37.98 ,115.17 ,S,M
NPEI,51162,275.00 ,36.75 ,114.98 ,S,M
Norfolk,19032,251.00 ,49.50 ,127.73 ,S,M
North Bay,23850,224.00 ,36.29 ,113.97 ,N,M
Oakville,63614,206.00 ,32.75 ,109.72 ,S,M
Orillia,13035,345.00 ,31.57 ,108.13 ,S,M
Oshawa,53083,191.00 ,26.63 ,103.97 ,S,M
Peterborough,35270,199.00 ,30.75 ,108.24 ,S,M
Sault Ste Marie,32998,260.00 ,26.65 ,100.16 ,N,M
St Thomas,16436,225.00 ,28.85 ,105.64 ,S,M
Thunder Bay,49765,238.00 ,26.29 ,103.75 ,N,M
Waterloo North,53611,182.00 ,35.30 ,112.47 ,S,M
Welland,21768,242.00 ,37.77 ,115.81 ,S,M
Westario,22257,207.00 ,28.91 ,108.70 ,S,M
Whitby,40337,214.00 ,38.19 ,115.69 ,S,M
Woodstock,15181,251.00 ,41.05 ,118.40 ,S,M
Enersource,195381,238.00 ,30.88 ,107.75 ,S,L
Horizon,235327,175.00 ,36.72 ,113.90 ,S,L
Hydro One Brampton,137856,148.00 ,30.67 ,107.46 ,S,L
Ottawa,305266,191.00 ,34.67 ,111.44 ,S,L
London,148331,209.00 ,34.83 ,112.03 ,S,L
Powerstream,332993,184.00 ,32.21 ,108.65 ,S,L
Veridian,113709,181.00 ,31.38 ,108.81 ,S,L
Hydro One Networks,1210695,454.00 ,51.42 ,131.34 ,N,XL
Toronto,709323,328.00 ,39.59 ,116.73 ,S,XL

Solution Preview


Code with explanations

# Set the working directory to the folder where the file is saved.
#
# Setting up
#
# NO 1
LDC <- read.csv("LDC.csv")
#
# NO 2
student_number <- 290742
#
# This one is a bit tricky, because the task asks for 'TWO DIGIT NUMBERS'.
# Each substring is converted from string to integer, so a leading zero
# is dropped: '07' is stored as just 7, and b holds a single digit.
#
a <- strtoi(substr(student_number, 1, 2))
b <- strtoi(substr(student_number, 3, 4))
c <- strtoi(substr(student_number, 5, 6))
#
# Helper that checks a number and halves it if it is greater than 72.
# floor() rounds the result down to a whole number, because the value
# is used as an observation (row) number.
# (Not important in our case: a, b, and c are all under 72.)
#
numberDivide <- function(num) {
  if (num > 72) {
    floor(num / 2)
  } else {
    num
  }
}
#...