Consider the following regression model:
cigs = β₁ + β₂cigpric + β₃educ + β₄age + ε
where 'cigs' is the number of cigarettes smoked per day and 'cigpric' is the cigarette price per pack.

a) Using the 'SMOKE.csv' dataset from Wooldridge, estimate the parameters of the model by OLS.
b) Suppose you suspect that 'cigpric' may be endogenous and worry that your estimates in a) could be inconsistent. Can you find IV or 2SLS estimates instead?
c) Conduct the Hausman-Wu test and explain which set of estimates (OLS or IV) you would prefer.
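
For reference, parts a) to c) rest on the standard textbook formulas sketched below. The notation is an assumption and is not taken from the exercise: X collects the constant and the three regressors, Z is an instrument matrix (the exercise does not name the instruments), the V̂ terms are estimated covariance matrices, the superscript minus denotes a generalized inverse, and q is the number of regressors treated as endogenous (here one, cigpric).

```latex
\begin{align}
\hat\beta_{OLS}  &= (X'X)^{-1} X'y \\
\hat X           &= Z (Z'Z)^{-1} Z'X, \qquad
\hat\beta_{2SLS}  = (\hat X'\hat X)^{-1} \hat X'y \\
H                &= (\hat\beta_{2SLS} - \hat\beta_{OLS})'
                    \bigl[\hat V_{2SLS} - \hat V_{OLS}\bigr]^{-}
                    (\hat\beta_{2SLS} - \hat\beta_{OLS})
                    \;\overset{a}{\sim}\; \chi^2(q) \quad \text{under } H_0
\end{align}
```

Under the null hypothesis that cigpric is exogenous, both estimators are consistent and OLS is the more efficient one, so a small H favors OLS. A large H points to endogeneity and favors the IV estimates.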

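from numpy import column_stack, dot, eye, diagonal, linalg, matrix, mean, ones, shape
from numpy import sqrt, zeros
import pandas as pd
# For critical values of the chi-squared distribution
from scipy.stats import chi2

# Read data from a CSV file with a header for each variable
# (assumes SMOKE.csv is in the working directory and its header
# matches the column names used below)
f = pd.read_csv('SMOKE.csv')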
# Organize the data

y = f.cigs
n = y.shape[0]

cigprice = f.cigprice
educ = f.educ
age = f.age

# Regressors without the intercept
X_ = column_stack([cigprice, educ, age])
# Prepend a column of ones for the intercept β₁
one = ones(n)
X = column_stack([one, X_])
k = X.shape[1]...
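
The script is cut off at this point, so the following is only a minimal sketch of how part a) might proceed once X and y are built. It is not the original continuation. The helper name `ols` and the synthetic stand-in data are assumptions made for illustration; with the real data, the X and y constructed above would take their place.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients, standard errors, and residuals."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                # (X'X)^{-1} X'y
    e = y - X @ b                        # residuals
    s2 = (e @ e) / (n - k)               # unbiased error-variance estimate
    se = np.sqrt(np.diag(s2 * XtX_inv))  # conventional standard errors
    return b, se, e


# Synthetic stand-in data (NOT the SMOKE data): a constant plus three regressors.
rng = np.random.default_rng(0)
n = 500
X_ = rng.normal(size=(n, 3))
X = np.column_stack([np.ones(n), X_])
beta_true = np.array([1.0, -0.5, 0.25, 2.0])
y = X @ beta_true + rng.normal(size=n)

b, se, e = ols(X, y)
print("coefficients:   ", b)
print("standard errors:", se)
```

The standard errors here assume homoskedastic errors. That is a simplification of this sketch, not something the exercise states.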


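Parts b) and c) could be approached along the lines of the sketch below. Because the exercise does not say which variables serve as instruments, the example runs on synthetic data with a hypothetical instrument `z`. The names `x`, `w`, `z`, and `lstsq_coef` are illustrative assumptions, not columns of SMOKE.csv. The Hausman contrast is computed only on the coefficient of the suspect regressor, using one error-variance estimate for both covariance matrices. That is one of several conventions and is chosen here so that the variance difference stays positive.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 2000

# Synthetic data (NOT the SMOKE data): z is a hypothetical instrument,
# w an exogenous regressor, and x is endogenous because it depends on u.
z = rng.normal(size=n)
w = rng.normal(size=n)
u = rng.normal(size=n)                       # structural error
x = z + 0.5 * w + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * w + u

one = np.ones(n)
X = np.column_stack([one, x, w])   # regressors
Z = np.column_stack([one, z, w])   # instruments; constant and w instrument themselves
k = X.shape[1]


def lstsq_coef(A, b):
    """Least-squares coefficients from the normal equations A'A c = A'b."""
    return np.linalg.solve(A.T @ A, A.T @ b)


# OLS (inconsistent here because x is correlated with u)
b_ols = lstsq_coef(X, y)

# 2SLS: project X on Z, then regress y on the fitted values
X_hat = Z @ lstsq_coef(Z, X)
b_iv = lstsq_coef(X_hat, y)

# Error variance from the 2SLS residuals, computed with the original X
e_iv = y - X @ b_iv
s2 = (e_iv @ e_iv) / (n - k)

# Covariance matrices built with the same s2 (an assumption of this sketch)
V_ols = s2 * np.linalg.inv(X.T @ X)
V_iv = s2 * np.linalg.inv(X_hat.T @ X_hat)

# Hausman contrast on the coefficient of x (index 1), chi-squared with 1 df
d = b_iv[1] - b_ols[1]
H = d**2 / (V_iv[1, 1] - V_ols[1, 1])
p_value = chi2.sf(H, df=1)
crit = chi2.ppf(0.95, df=1)

print("OLS :", b_ols)
print("2SLS:", b_iv)
print("H =", H, " 5% critical value =", crit, " p-value =", p_value)
```

In this synthetic setup the regressor is endogenous by construction, so H should exceed the critical value and the IV estimates would be preferred. With the real data the outcome depends on the instruments chosen.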