## Transcribed Text

1. Use of wage data in STATA to study wage determination.
a. The data set cps04.dta contains 200 observations on randomly selected workers from the March 2004 Current Population Survey. This is a survey of over 50,000 households conducted monthly, and it serves as the basis for the national employment and unemployment statistics. Data are collected on a number of individual characteristics as well as employment status. This data extract contains information on four variables, in the following order:
1) years of education
2) 1 if person is female, 0 if male
3) years of labor market experience
4) natural logarithm of average hourly earnings
Download this data set and load it into STATA.
b. Compare the simple regressions:
$$\text{earnings} = \beta_0 + \beta_1(\text{years of education})$$
$$\text{earnings} = \beta_0 + \beta_2(\text{years of experience})$$
Interpret the coefficients $\beta_1$ and $\beta_2$ (be careful about the units you report).
Comment on the R-squared in each regression; that is, describe what it tells us in each case.
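The fourth variable in cps04.dta is the natural logarithm of average hourly earnings. A slope in these regressions therefore measures a proportional change in earnings, not a change in dollars. The sketch below converts a log-point slope into a percentage in two ways: the usual approximation and the exact value. The slope of 0.10 is a hypothetical placeholder, not an estimate from the data.

```python
import math


def pct_change_from_log_slope(beta: float, delta_x: float = 1.0) -> tuple[float, float]:
    """Percent change in earnings when x rises by delta_x in ln(earnings) = b0 + beta*x.

    Returns (approximate, exact): the approximation is 100*beta*delta_x,
    the exact value is 100*(exp(beta*delta_x) - 1).
    """
    approx = 100.0 * beta * delta_x
    exact = 100.0 * (math.exp(beta * delta_x) - 1.0)
    return approx, exact


# Hypothetical slope on years of education (NOT estimated from cps04.dta).
approx, exact = pct_change_from_log_slope(0.10)
print(f"approx {approx:.1f}%, exact {exact:.2f}%")
```

For small slopes the two numbers are close. The approximation deteriorates as `beta * delta_x` grows.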
c. Now estimate the multiple regression:
$$\text{earnings} = \beta_0 + \beta_1(\text{years of education}) + \beta_2(\text{years of experience})$$
Interpret the coefficients $\beta_1$ and $\beta_2$.
Is this regression "better" than either or both of the previous simple regressions? Explain your reasoning.
d. Now create two new variables: the square of education and the square of experience. (You can use the command `gen education_sq=education^2`, for example.) Estimate the multiple regression:
$$\text{earnings} = \beta_0 + \beta_1(\text{years of education}) + \beta_2(\text{years of experience}) + \beta_3(\text{years of education})^2 + \beta_4(\text{years of experience})^2$$
Compare the estimated payoff in terms of higher hourly earnings from an additional year of education from this equation to the payoff from the equation in part (c). Do the same for experience. Is there evidence that this regression fits the data "better" than the one in part (c)? Explain your reasoning.
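Once a squared term is included, the payoff to one more year of education is no longer a single coefficient, because it depends on the education level. The sketch below computes two versions of that payoff for a generic pair of linear and squared coefficients: the derivative at a given level, and the exact change from one year to the next. The coefficient values are hypothetical, chosen only for illustration.

```python
def marginal_effect(b_lin: float, b_sq: float, x: float) -> float:
    """Slope of b_lin*x + b_sq*x**2 with respect to x, evaluated at x."""
    return b_lin + 2.0 * b_sq * x


def one_unit_change(b_lin: float, b_sq: float, x: float) -> float:
    """Exact change in b_lin*x + b_sq*x**2 when x goes from x to x + 1."""
    return b_lin + b_sq * (2.0 * x + 1.0)


# Hypothetical coefficients on education and education squared (NOT estimates).
b1, b3 = 0.05, 0.002
for educ in (8, 12, 16):
    print(educ, round(marginal_effect(b1, b3, educ), 4), round(one_unit_change(b1, b3, educ), 4))
```

In part (c) the payoff is the same at every education level. With a squared term it changes with education, rising if the squared coefficient is positive and falling if it is negative. It should therefore be reported at a stated value, such as the sample mean.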
e. Estimate the multiple regression:
$$\text{earnings} = \beta_0 + \beta_1(\text{years of education}) + \beta_2(\text{years of experience}) + \beta_3(\text{years of education})^2 + \beta_4(\text{years of experience})^2 + \beta_5\,\text{Female}$$
How do we interpret the coefficient on female?
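Consider a specification with education, experience, their squares, and a female dummy. The female coefficient only shifts the intercept. The sketch below checks this numerically. For any fixed education and experience, the predicted gap in log earnings between a woman and a man equals that coefficient. The coefficient dictionary and its keys are hypothetical illustration values, not output from STATA.

```python
def predicted_log_earnings(coefs: dict[str, float], educ: float, exper: float, female: int) -> float:
    """Fitted log earnings from a quadratic-plus-dummy specification (hypothetical coefficients)."""
    return (coefs["const"] + coefs["educ"] * educ + coefs["exper"] * exper
            + coefs["educ_sq"] * educ ** 2 + coefs["exper_sq"] * exper ** 2
            + coefs["female"] * female)


# Hypothetical coefficient values, chosen only for illustration.
coefs = {"const": 1.0, "educ": 0.05, "exper": 0.03,
         "educ_sq": 0.002, "exper_sq": -0.0005, "female": -0.20}

for educ, exper in [(12, 5), (16, 20)]:
    gap = (predicted_log_earnings(coefs, educ, exper, 1)
           - predicted_log_earnings(coefs, educ, exper, 0))
    print(educ, exper, round(gap, 6))
```

The gap is identical at every education and experience level. That is a restriction built into this specification, which is worth keeping in mind for part (f).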
f. Can you think of an alternative specification to consider the effects of gender on hourly earnings? If so, then run the new regression and interpret its results.
g. What other variables might you want to include in an earnings equation besides these? Explain your reasoning.
2. Use the data in kielmc.dta to answer the following questions. The data are for houses that sold during 1981 in North Andover, Massachusetts; 1981 was the year construction began on a local garbage incinerator.
a. To study the effects of the incinerator location on housing price, consider the simple regression model
$$\log(\text{price}) = \beta_0 + \beta_1\log(\text{dist}) + u$$
where price is housing price in dollars and dist is distance from the house to the incinerator measured in feet. Interpreting this equation causally, what sign do you expect for $\beta_1$ if the presence of the incinerator depresses housing prices? Estimate this equation and interpret the results.
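When both price and dist are in logs, $\beta_1$ is an elasticity: a 1 percent increase in distance is associated with roughly a $\beta_1$ percent change in price. For larger changes, the model implies that multiplying dist by a factor k multiplies price by $k^{\beta_1}$. The sketch below computes this for a hypothetical elasticity of 0.3, which is not an estimate from kielmc.dta.

```python
def price_ratio(beta1: float, dist_ratio: float) -> float:
    """Implied price ratio in log(price) = b0 + beta1*log(dist) when dist is
    multiplied by dist_ratio, other things equal: dist_ratio ** beta1."""
    return dist_ratio ** beta1


beta1 = 0.3  # hypothetical elasticity, NOT estimated from kielmc.dta
print(round(100 * (price_ratio(beta1, 1.01) - 1), 4))  # percent change, 1% farther away
print(round(100 * (price_ratio(beta1, 2.0) - 1), 2))   # percent change, twice as far away
```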
b. To the simple regression in part (a), add the variables log(intst), log(area), log(land), rooms, baths, and age, where intst is distance from the home to the interstate, area is square footage of the house, land is the lot size in square feet, rooms is the total number of rooms, baths is the number of bathrooms, and age is the age of the house in years. Now, what do you conclude about the effects of the incinerator? Explain why (a) and (b) give conflicting results.
c. Add [log(intst)]² to the model from part (b). Now what happens? What do you conclude about the importance of functional form?
d. Is the square of log(dist) significant when you add it to the model from part (c)? What does this imply about the effect of log(dist) on log(price)?
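Suppose [log(dist)]² enters alongside log(dist). The elasticity of price with respect to distance then becomes $b_{lin} + 2 b_{sq}\log(\text{dist})$, so it varies with distance and may change sign at a turning point. The sketch below computes that elasticity and the turning point, both in log feet and in feet. The coefficient names and values are hypothetical, not results from kielmc.dta.

```python
import math


def turning_point(b_lin: float, b_sq: float) -> float:
    """Value of z where b_lin*z + b_sq*z**2 reaches its maximum or minimum."""
    if b_sq == 0:
        raise ValueError("no turning point when the squared coefficient is zero")
    return -b_lin / (2.0 * b_sq)


def elasticity_at(b_lin: float, b_sq: float, z: float) -> float:
    """d log(price) / d log(dist) evaluated at z = log(dist)."""
    return b_lin + 2.0 * b_sq * z


# Hypothetical coefficients on log(dist) and [log(dist)]^2 (NOT estimates).
b_lin, b_sq = 1.5, -0.07
z_star = turning_point(b_lin, b_sq)
print(round(z_star, 3), round(math.exp(z_star)))  # turning point in log feet and in feet
```

A turning point is only meaningful if it falls inside the observed range of dist. Otherwise the quadratic simply describes a diminishing (or increasing) effect over the sample.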
3. Use the data vote1.dta for this exercise. The following model can be used to study whether campaign expenditures affect election outcomes:
$$\text{voteA} = \beta_0 + \beta_1\log(\text{expendA}) + \beta_2\log(\text{expendB}) + \beta_3\,\text{prtystrA} + u$$
where voteA is the percentage of the vote received by Candidate A, expendA and expendB are campaign expenditures by Candidates A and B, and prtystrA is a measure of party strength for Candidate A (the percentage of the most recent presidential vote that went to A's party).
a. What is the interpretation of $\beta_1$?
b. Estimate the given model using the data and report the results (that is, interpret the coefficients and explain what the adjusted R² tells us).
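Adjusted R² penalizes each additional regressor. Its formula is $\bar R^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$, where n is the number of observations and k is the number of slope coefficients, excluding the intercept. The sketch below implements that formula. R² = 0.80 and n = 100 are hypothetical inputs, not results from vote1.dta; k = 3 matches the three regressors in the model above.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a regression with n observations and k slope
    coefficients plus an intercept."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical inputs (NOT from vote1.dta).
print(round(adjusted_r2(0.80, 100, 3), 5))
```

Unlike R², the adjusted version can fall when a regressor with little explanatory power is added. This makes it the more useful of the two for comparing specifications with different numbers of regressors.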
c. Is there statistical evidence that A's expenditures affect the outcome? What about B's expenditures?
d. Look at the size of the estimated coefficients $\beta_1$ and $\beta_2$. What does this suggest about the efficacy of candidate A increasing expenditures by 1% if, at the same time, candidate B also increases expenditures by 1%?
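Expenditures enter in logs, while voteA is measured in percentage points. A 1 percent increase in expendA therefore changes voteA by about $\beta_1/100$ percentage points. If both candidates raise spending by 1 percent, the combined change is about $(\beta_1 + \beta_2)/100$ percentage points. The sketch below computes both the exact and the approximate change for hypothetical coefficients, which are not estimates from vote1.dta.

```python
import math


def vote_change(beta1: float, beta2: float, pct_a: float, pct_b: float) -> float:
    """Exact change in voteA (percentage points) when expendA rises by pct_a percent
    and expendB by pct_b percent, other regressors fixed."""
    return beta1 * math.log(1 + pct_a / 100) + beta2 * math.log(1 + pct_b / 100)


def vote_change_approx(beta1: float, beta2: float, pct_a: float, pct_b: float) -> float:
    """First-order approximation: (beta1*pct_a + beta2*pct_b) / 100."""
    return (beta1 * pct_a + beta2 * pct_b) / 100


# Hypothetical coefficients, NOT estimates from vote1.dta.
b1, b2 = 6.0, -6.5
print(round(vote_change(b1, b2, 1, 1), 4), round(vote_change_approx(b1, b2, 1, 1), 4))
```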
4. Use the data discrim.dta to answer this question. These are ZIP code-level data on prices for various items at fast-food restaurants, along with characteristics of the ZIP code population, in New Jersey and Pennsylvania. The idea is to see whether fast-food restaurants charge higher prices in areas with a larger concentration of blacks.
a. Consider a model to explain the price of soda, psoda, in terms of the proportion of the population that is black, prpblck, and median income, income:
$$\text{psoda} = \beta_0 + \beta_1\,\text{prpblck} + \beta_2\,\text{income} + u$$
Estimate this model by OLS and report the results in equation form, including the sample size and R-squared. Interpret the coefficient on prpblck. Do you think it is economically large? In other words, are small changes in the proportion of blacks in a ZIP code associated with large changes in price, or relatively small changes in price?
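prpblck is a proportion between 0 and 1, so a one-unit change (from no black residents to all black residents) is not a realistic comparison. A change of 0.10 corresponds to 10 percentage points and changes the predicted price by $0.10\,\beta_1$ dollars. The sketch below computes that change and expresses it relative to an average price. Both the slope and the average price are hypothetical placeholders, not values from discrim.dta.

```python
def price_change(beta1: float, delta_prpblck: float) -> float:
    """Predicted change in psoda (in dollars) when prpblck changes by delta_prpblck,
    holding income fixed. prpblck is a proportion, so 0.10 means 10 percentage points."""
    return beta1 * delta_prpblck


# Hypothetical slope on prpblck and hypothetical average soda price (NOT estimates).
beta1 = 0.115
avg_price = 1.00
change = price_change(beta1, 0.10)
print(f"{change:.4f} dollars, about {100 * change / avg_price:.2f}% of the average price")
```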
b. Compare the estimate from part (a) with the simple regression estimate obtained by regressing psoda on prpblck alone. Is the discrimination effect larger or smaller when you control for income? Why? [It might help to think in terms of omitted variable bias.]
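Omitted variable bias has an exact in-sample counterpart. The simple-regression slope on prpblck equals the multiple-regression slope on prpblck, plus the income coefficient times the slope from regressing income on prpblck. The sketch below checks that identity with plain OLS formulas on **simulated** data; the data-generating process is invented for illustration and has nothing to do with discrim.dta. The sign of the gap is the sign of the income coefficient times the income-on-prpblck slope.

```python
import random


def mean(v):
    return sum(v) / len(v)


def simple_slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_regressor_slopes(y, x1, x2):
    """OLS slopes on x1 and x2 (with an intercept), from the demeaned normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Simulated data (invented process, NOT discrim.dta); income is in $1,000s.
random.seed(0)
n = 400
prp = [random.random() for _ in range(n)]
inc = [60 - 30 * p + random.gauss(0, 8) for p in prp]
soda = [0.9 + 0.05 * p + 0.002 * i + random.gauss(0, 0.05) for p, i in zip(prp, inc)]

b_short = simple_slope(soda, prp)           # psoda on prpblck only
b1, b2 = two_regressor_slopes(soda, prp, inc)  # psoda on prpblck and income
delta = simple_slope(inc, prp)              # income on prpblck
print(round(b_short, 4), round(b1 + b2 * delta, 4))
```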
c. A model with prices and income both logged might be more appropriate. Report estimates of the model:
$$\log(\text{psoda}) = \beta_0 + \beta_1\,\text{prpblck} + \beta_2\log(\text{income}) + u$$
Interpret the coefficients on prpblck and log(income), being careful about the units in which each is measured.
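In this model the two coefficients are read differently. The coefficient on prpblck is a semi-elasticity with respect to a proportion: a change of 0.01 in prpblck (one percentage point) changes psoda by about $100 \times \beta_1 \times 0.01 = \beta_1$ percent. The coefficient on log(income) is an elasticity: a 1 percent rise in income changes psoda by about $\beta_2$ percent. The sketch below encodes both readings. The coefficient values are hypothetical, not estimates from discrim.dta.

```python
def pct_price_change_prpblck(beta1: float, delta_prpblck: float) -> float:
    """Approximate percent change in psoda in log(psoda) = b0 + beta1*prpblck + ...,
    where prpblck is a proportion (0.01 = one percentage point)."""
    return 100.0 * beta1 * delta_prpblck


def pct_price_change_income(beta2: float, pct_income: float) -> float:
    """Approximate percent change in psoda for a pct_income percent rise in income;
    beta2 is the elasticity on log(income)."""
    return beta2 * pct_income


# Hypothetical coefficients (NOT estimates).
print(pct_price_change_prpblck(0.12, 0.20))  # prpblck up by 20 percentage points
print(pct_price_change_income(0.07, 10))     # income up by 10 percent
```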
5. Use the data beauty.dta to answer this question.
a. Separately for men and women, estimate the model:
$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{belavg} + \beta_2\,\text{abvavg} + u$$
where belavg is a dummy variable equal to 1 if the person has below average looks and is equal to 0 otherwise, and abvavg is another dummy variable equal to 1 if the person has above average looks and is equal to 0 otherwise.
b. If we have included people with below average good looks and people with above average good looks, what category of people has been omitted?
c. Report the results in equation form, and comment on the statistical significance of the coefficients on belavg and abvavg. Is there convincing evidence that women with above average looks earn more than women with average looks?
d. Now, for both belavg and abvavg, create interaction terms with the variable female. [In STATA you can type a command such as: `gen belavg_f=belavg*female`.] Estimate the model:
$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{belavg} + \beta_2\,\text{abvavg} + \beta_3\,\text{female} + \beta_4(\text{belavg}\times\text{female}) + \beta_5(\text{abvavg}\times\text{female}) + u$$
Report the results in equation form. Explain in words what the coefficients $\beta_4$ and $\beta_5$ are telling you. Is there any statistical evidence that the effect of being of below average looks is less bad for women than for men?
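With the interaction terms, the effect of belavg on log(wage) is $\beta_1$ for men (female = 0) and $\beta_1 + \beta_4$ for women (female = 1). "Less bad for women" therefore corresponds to $H_0: \beta_4 = 0$ against the one-sided alternative $H_1: \beta_4 > 0$. The sketch below computes the two group effects and a one-sided p-value. It uses the standard normal as a large-sample stand-in for the t distribution, which is an assumption; STATA reports t-based two-sided p-values. The coefficient and standard error are hypothetical, not estimates from beauty.dta.

```python
import math


def group_effects(b_belavg: float, b_belavg_f: float) -> dict[str, float]:
    """Effect of belavg on log(wage) for men (female = 0) and women (female = 1)
    in a model with belavg, female, and belavg*female."""
    return {"men": b_belavg, "women": b_belavg + b_belavg_f}


def one_sided_p_upper(coef: float, se: float) -> float:
    """p-value for H0: coef = 0 against H1: coef > 0, using the standard normal
    as a large-sample approximation to the t distribution."""
    t = coef / se
    return 0.5 * math.erfc(t / math.sqrt(2.0))


# Hypothetical estimates (NOT from beauty.dta).
print(group_effects(-0.20, 0.10))
print(round(one_sided_p_upper(0.10, 0.08), 4))
```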
