Question

We are looking at regression with factors (categorical variables). In econometrics we usually handle them with what we call 'dummy variables'. In this exercise you will show that you can run and interpret regressions with factors.
1. Load the LDC dataset from the CSV file given below. The dataset has two factors, Size and Area.
2. Write down the number 290742. Split it into three two-digit numbers, a, b, and c. If any of a, b, or c is greater than 72, divide it by 2. Delete the observations numbered a, b, and c from the dataset. Show the command you used.
The Assignment
3. Plot OMandA against the factors and reproduce the graphs. Explain what you get. What kind of graphs do you get? What do they tell you? Are the differences significant?
4. Regress OMandA on Area. Show the results in a nice table.
5. Why does only one area appear in the regression results? Explain how R chose which one to show.
6. Explain what the regression tells us.
7. Regress Total.Bill on Customers, OMandA, and Area. Show the results in a nice table.
8. You have just done a dummy-variable regression and found an intercept dummy. Explain what it tells you. Why is only one value of Area presented in the table?
9. For your last trick, you will find a slope dummy. To test whether there is an interaction between the number of Customers and the Area, add OMand:Area to the regression. Interpret the result.
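The regressions in steps 4 to 9 can be sketched as follows. This is a minimal illustration, not the assignment's answer: it uses only the first eight rows of the CSV below, so the coefficients differ from those of the full dataset. The names csv_text, ldc, m_area, m_bill and m_slope are chosen for this sketch, and it assumes that "OMand:Area" in step 9 refers to the column OMandA.

```r
# Illustrative subset: the first eight rows of the CSV file.
# The header has one field fewer than the data rows, so read.csv()
# uses the first column (the LDC name) as row names.
csv_text <- "Customers,OMandA,Dist.Cost,Total.Bill,Area,Size
Algoma,11581,839.00,56.15,132.47,N,S
Atikokan,1661,564.00,55.61,135.34,N,S
Brant,9741,490.00,36.34,114.13,S,S
CW,6496,299.00,33.35,110.82,S,S
Chapleau,1293,416.00,34.16,113.20,N,S
Coop Embrun,1954,274.00,33.90,112.26,S,S
ELK,11276,214.00,36.00,115.98,S,S
Espanola,3299,326.00,40.88,119.99,N,S"
ldc <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Steps 4-6: OMandA on Area. Area has levels N and S; N comes first
# alphabetically, so it is the baseline and only AreaS is reported.
# The intercept is the mean OMandA for N; AreaS is the S-minus-N difference.
m_area <- lm(OMandA ~ Area, data = ldc)
print(round(summary(m_area)$coefficients, 3))

# Steps 7-8: the Area dummy shifts the intercept of the Total.Bill line.
m_bill <- lm(Total.Bill ~ Customers + OMandA + Area, data = ldc)
print(round(summary(m_bill)$coefficients, 3))

# Step 9 (assumption: the interaction term is OMandA:Area).
# The interaction lets the OMandA slope differ between the two areas.
m_slope <- update(m_bill, . ~ . + OMandA:Area)
print(names(coef(m_slope)))
```

With the full dataset, replace the read.csv(text = ...) call with the file read from step 1; the model formulas stay the same.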

Bonus:
10. Load library(effects)
11. Define the model M <- lm(Total.Bill ~ Size:OMandA). (This really just asks R to come up with a regression line for each of the categories. The colon is a shorthand way to indicate all the interactions.)
12. Plot the result using the effects package (it is designed to show the different regressions on one graph when there are interactions). There are two versions:
> plot(allEffects(M))
> plot(allEffects(M), multiline=T)
Explain the resulting graphs briefly.
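To see what the Size:OMandA term in the bonus model asks R to estimate, it can help to look at the design matrix. The sketch below uses base R only and an eight-row subset of the CSV (two rows per size), so the fitted numbers are illustrative; csv_text, ldc and X are names chosen for this sketch. Because OMandA has no main effect in the formula, R builds one OMandA column per level of Size: a common intercept with a separate slope for each size class.

```r
# Illustrative subset: two rows from each Size class in the CSV file.
csv_text <- "Customers,OMandA,Dist.Cost,Total.Bill,Area,Size
Algoma,11581,839.00,56.15,132.47,N,S
Brant,9741,490.00,36.34,114.13,S,S
Blue Water,35772,309.00,40.98,117.81,S,M
Brantford,37964,176.00,28.79,106.07,S,M
Enersource,195381,238.00,30.88,107.75,S,L
Horizon,235327,175.00,36.72,113.90,S,L
Hydro One Networks,1210695,454.00,51.42,131.34,N,XL
Toronto,709323,328.00,39.59,116.73,S,XL"
ldc <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Design matrix for the interaction-only formula: an intercept column plus
# one column per Size level, each holding OMandA for rows in that level
# and 0 elsewhere.
X <- model.matrix(~ Size:OMandA, data = ldc)
print(colnames(X))

# The same model fitted with lm(): one slope on OMandA per Size level.
M <- lm(Total.Bill ~ Size:OMandA, data = ldc)
print(coef(M))
```

The two plot(allEffects(M)) calls then draw these per-size lines, either in separate panels or, with multiline=T, on a single set of axes.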
Miscellaneous notes
f<-file.choose()
d<-read.csv(f)
d<-d[-c(1,14,21),]
length(d[,2])

When you do a regression with a dummy, one category must be left out. R does this automatically: it leaves out the first level of the factor in alphabetical order, so take care. You may want to change the baseline category to make the results easier to explain. You will have to think carefully about the meaning of your results.
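Changing the baseline can be sketched with relevel(). The example below uses the OMandA and Area values of the first four rows of the CSV (Algoma, Atikokan, Brant, CW); the names Area_S, m_N and m_S are chosen for this sketch.

```r
# OMandA and Area for the first four LDCs in the CSV file.
OMandA <- c(839, 564, 490, 299)
Area   <- factor(c("N", "N", "S", "S"))

# Default baseline: "N", the first level alphabetically.
# The reported dummy is AreaS (S relative to N).
m_N <- lm(OMandA ~ Area)
print(coef(m_N))

# Make "S" the baseline instead; the reported dummy is now N relative to S.
Area_S <- relevel(Area, ref = "S")
m_S <- lm(OMandA ~ Area_S)
print(coef(m_S))
```

The fit is the same in both cases. Only the reference category changes, so the dummy coefficient flips sign and the intercept becomes the mean of the new baseline group.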

Contents of CSV file:
Customers,OMandA,Dist.Cost,Total.Bill,Area,Size
Algoma,11581,839.00 ,56.15 ,132.47 ,N,S
Atikokan,1661,564.00 ,55.61 ,135.34 ,N,S
Brant,9741,490.00 ,36.34 ,114.13 ,S,S
CW,6496,299.00 ,33.35 ,110.82 ,S,S
Chapleau,1293,416.00 ,34.16 ,113.20 ,N,S
Coop Embrun,1954,274.00 ,33.90 ,112.26 ,S,S
ELK,11276,214.00 ,36.00 ,115.98 ,S,S
Espanola,3299,326.00 ,40.88 ,119.99 ,N,S
Fort Frances,3775,345.00 ,22.98 ,98.32 ,N,S
Grimsby,10307,202.00 ,36.82 ,114.82 ,S,S
Hearst,2817,308.00 ,25.52 ,103.07 ,N,S
Hydro 2000,1208,264.00 ,35.16 ,114.85 ,S,S
Hawkesbury,5521,165.00 ,22.21 ,99.66 ,S,S
Kenora,5572,359.00 ,36.94 ,114.45 ,N,S
Lakefront,9976,217.00 ,31.68 ,109.94 ,S,S
Lakeland,9598,293.00 ,41.16 ,119.56 ,N,S
Midland,6951,258.00 ,34.18 ,113.04 ,S,S
NOTL,8000,238.00 ,35.13 ,112.70 ,S,S
NOW,6059,353.00 ,34.23 ,111.70 ,N,S
Orangeville,11248,263.00 ,34.43 ,112.03 ,S,S
Ottawa River,10555,253.00 ,12.73 ,87.30 ,N,S
Parry Sound,3441,383.00 ,41.23 ,120.77 ,N,S
Renfrew,4183,269.00 ,27.07 ,106.29 ,S,S
Rideau SL,4185,275.00 ,37.60 ,117.46 ,S,S
Sioux Lookout,2755,425.00 ,45.00 ,123.79 ,N,S
Tillsonburg,6745,330.00 ,32.07 ,109.34 ,S,S
Wasaga,12324,180.00 ,30.96 ,110.59 ,S,S
Wellington North,3626,432.00 ,38.07 ,117.37 ,S,S
Blue Water,35772,309.00 ,40.98 ,117.81 ,S,M
Brantford,37964,176.00 ,28.79 ,106.07 ,S,M
Burlington,64329,225.00 ,34.33 ,111.50 ,S,M
Cambridge,51584,209.00 ,33.94 ,110.30 ,S,M
Chatham kent,32132,209.00 ,26.73 ,106.26 ,S,M
Collus,15723,259.00 ,33.48 ,110.80 ,S,M
CNP,15708,279.00 ,39.68 ,111.15 ,S,M
Enwin,85083,268.00 ,41.14 ,118.12 ,S,M
ErieThames,18090,315.00 ,40.56 ,118.16 ,S,M
Essex,28094,197.00 ,36.91 ,115.43 ,S,M
Festival,19885,200.00 ,38.25 ,114.75 ,S,M
Greater Sudbury,46748,280.00 ,33.90 ,111.91 ,N,M
Guelph,50859,251.00 ,34.71 ,110.54 ,S,M
Haldimand,21070,346.00 ,46.73 ,125.78 ,S,M
Halton,21232,227.00 ,32.40 ,110.92 ,S,M
Innisfil,14826,281.00 ,43.59 ,123.09 ,S,M
Kingston,26844,224.00 ,34.40 ,111.16 ,S,M
KitchenerWilmot,87964,155.00 ,29.60 ,106.19 ,S,M
Milton,30485,210.00 ,35.76 ,112.63 ,S,M
Newmarket,33338,198.00 ,37.98 ,115.17 ,S,M
NPEI,51162,275.00 ,36.75 ,114.98 ,S,M
Norfolk,19032,251.00 ,49.50 ,127.73 ,S,M
North Bay,23850,224.00 ,36.29 ,113.97 ,N,M
Oakville,63614,206.00 ,32.75 ,109.72 ,S,M
Orillia,13035,345.00 ,31.57 ,108.13 ,S,M
Oshawa,53083,191.00 ,26.63 ,103.97 ,S,M
Peterborough,35270,199.00 ,30.75 ,108.24 ,S,M
Sault Ste Marie,32998,260.00 ,26.65 ,100.16 ,N,M
St Thomas,16436,225.00 ,28.85 ,105.64 ,S,M
Thunder Bay,49765,238.00 ,26.29 ,103.75 ,N,M
Waterloo North,53611,182.00 ,35.30 ,112.47 ,S,M
Welland,21768,242.00 ,37.77 ,115.81 ,S,M
Westario,22257,207.00 ,28.91 ,108.70 ,S,M
Whitby,40337,214.00 ,38.19 ,115.69 ,S,M
Woodstock,15181,251.00 ,41.05 ,118.40 ,S,M
Enersource,195381,238.00 ,30.88 ,107.75 ,S,L
Horizon,235327,175.00 ,36.72 ,113.90 ,S,L
Hydro One Brampton,137856,148.00 ,30.67 ,107.46 ,S,L
Ottawa,305266,191.00 ,34.67 ,111.44 ,S,L
London,148331,209.00 ,34.83 ,112.03 ,S,L
Powerstream,332993,184.00 ,32.21 ,108.65 ,S,L
Veridian,113709,181.00 ,31.38 ,108.81 ,S,L
Hydro One Networks,1210695,454.00 ,51.42 ,131.34 ,N,XL
Toronto,709323,328.00 ,39.59 ,116.73 ,S,XL

Solution Preview

Code with explanations

# Set the working directory to the folder where LDC.csv is saved.
#
# Setting up
#
# No. 1: load the data
LDC <- read.csv("LDC.csv")
#
#NO 2
student_number <- 290742
#
# This one is a bit tricky, because the question asks for TWO-DIGIT NUMBERS.
# Each substring is converted from text to an integer, so a leading zero is
# dropped: "07" is stored as 7 and b ends up with a single digit.
#
a <- strtoi(substr(student_number, 1, 2), base = 10L)
b <- strtoi(substr(student_number, 3, 4), base = 10L)
c <- strtoi(substr(student_number, 5, 6), base = 10L)
#
# Helper that checks a number and halves it if it is greater than 72.
# floor() rounds the result down to the nearest integer, because the value
# is used as an observation number and must be a whole number.
# (Not needed in our case: 29, 7 and 42 are all below 72.)
#
numberDivide <- function(num) {
  if (num > 72) {
    floor(num / 2)
  } else {
    num
  }
}
#...
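
The preview stops before the observations are deleted. One way the remaining part of step 2 could be done is sketched below. It is not the original author's code: parts, halve_if_large, rows_to_drop and LDC_demo are hypothetical names, and a 72-row stand-in data frame replaces the real LDC data so that the sketch runs on its own.

```r
student_number <- 290742

# Split the six digits into three two-digit pieces: "29", "07", "42".
digits <- as.character(student_number)
parts  <- as.integer(substring(digits, c(1, 3, 5), c(2, 4, 6)))

# Halve (rounding down) any piece greater than 72; vectorised with ifelse().
halve_if_large <- function(num) ifelse(num > 72, floor(num / 2), num)
rows_to_drop <- halve_if_large(parts)

# Stand-in for the real data: the CSV file has 72 observations.
LDC_demo <- data.frame(obs = 1:72)

# Negative indices drop the listed rows; the comma keeps all columns.
LDC_demo <- LDC_demo[-rows_to_drop, , drop = FALSE]
print(nrow(LDC_demo))
```

With the real data, the same command would be LDC <- LDC[-rows_to_drop, ], which matches the d <- d[-c(1,14,21),] pattern shown in the miscellaneous notes.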
