Week 6 - Multiple Linear Regression

Content for week of Monday, October 4, 2021–Friday, October 8, 2021


Overview

Let’s model!

Now we can build powerful models with heaps of independent variables. Want to predict wages? Let’s control for education, for experience, for gender, for age, for age squared (yes!). YES. Only our degrees of freedom can hold us back.

Reading Guide

Chapter 6: Linear Regression with Multiple Regressors

SW 6.1 Omitted Variable Bias

This section connects nicely with our earlier discussion of the zero conditional mean assumption and causal inference.
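To see omitted variable bias in action, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption made for illustration: the variable `ability`, the coefficients 40 and 30, and the sample size are invented, not taken from the textbook or the course data. Because `educ` and `ability` are positively correlated and `ability` raises wages, leaving `ability` out should push the `educ` coefficient up (in expectation to 40 + 30 × 2/5 = 52 under this setup).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process (all numbers are made up).
ability = rng.normal(size=n)
educ = 12 + 2 * ability + rng.normal(size=n)      # educ correlated with ability
wage = 100 + 40 * educ + 30 * ability + rng.normal(scale=50, size=n)

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short_reg = ols(wage, educ)            # ability omitted
long_reg = ols(wage, educ, ability)    # ability included

print("educ coefficient, ability omitted: ", round(short_reg[1], 2))
print("educ coefficient, ability included:", round(long_reg[1], 2))
```

The gap between the two printed coefficients is the bias from omitting a variable that is correlated with the regressor and also affects the outcome.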

SW 6.2 The Multiple Regression Model

Hooray!

SW 6.3 The OLS Estimator in Multiple Regression

This section doesn’t get into derivation, and neither do we!

SW 6.4 Measures of Fit in Multiple Regression

The only new things here are a revised $SER$ formula and the introduction of the adjusted $R^{2}$. Note that the lecture video also discusses the root mean squared error, $RMSE$, which is a lot like the $SER$ except that it uses $n$ rather than the degrees of freedom as the denominator.
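As a rough sketch of how these measures relate, the Python function below computes them from actual and fitted values, using $SER = \sqrt{SSR/(n-k-1)}$, $RMSE = \sqrt{SSR/n}$, and $\bar{R}^{2} = 1 - \frac{n-1}{n-k-1}\frac{SSR}{TSS}$, where $k$ is the number of regressors excluding the intercept. The function name `fit_measures` and the five data points are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def fit_measures(y, y_hat, k):
    """Fit measures for a regression with k regressors (intercept not counted)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    ssr = np.sum((y - y_hat) ** 2)       # sum of squared residuals
    tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return {
        "SER": np.sqrt(ssr / (n - k - 1)),   # degrees-of-freedom denominator
        "RMSE": np.sqrt(ssr / n),            # n as the denominator
        "R2": 1 - ssr / tss,
        "adj_R2": 1 - (n - 1) / (n - k - 1) * ssr / tss,
    }

# Made-up outcomes and fitted values, treated as coming from a model with k = 2.
y = [10.0, 12.0, 15.0, 19.0, 24.0]
y_hat = [9.0, 13.0, 15.0, 20.0, 23.0]
m = fit_measures(y, y_hat, k=2)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With the same residuals, the $RMSE$ is never larger than the $SER$, and the adjusted $R^{2}$ is never larger than the $R^{2}$; the gaps shrink as $n$ grows relative to $k$.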

SW 6.5 The Least Squares Assumptions in Multiple Regression

Take the three from univariate regression and add … no perfect multicollinearity. Sorted.

SW 6.6 Distribution of the OLS Estimators in Multiple Regression

Just the intuition, don’t worry about the appendix.

SW 6.7 Multicollinearity

Make sure you understand the examples, but remember that in practice, most statistical packages will flag or resolve perfect multicollinearity on their own. Imperfect multicollinearity, on the other hand, is something to think about when crafting your models.
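The difference between the two cases can be sketched in Python with NumPy. The data are simulated and the variable names (`female`, `male`, `educ`, `age`, `exper`) are illustrative assumptions. Perfect multicollinearity shows up as a design matrix that lacks full column rank (here the dummy variable trap: `female + male` equals the intercept column), while imperfect multicollinearity shows up as a high but not perfect correlation between regressors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated regressors (hypothetical values, for illustration only).
female = rng.integers(0, 2, size=n)
male = 1 - female                      # exact linear function of female
educ = rng.normal(12, 2, size=n)
age = rng.normal(40, 10, size=n)
exper = age - 20 + rng.normal(0, 2, size=n)   # closely tracks age, not exactly

# Perfect multicollinearity: intercept = female + male, so one column is redundant.
X_trap = np.column_stack([np.ones(n), female, male, educ])
X_ok = np.column_stack([np.ones(n), female, educ])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("dummy trap: rank", rank_trap, "with", X_trap.shape[1], "columns")
print("one dummy:  rank", rank_ok, "with", X_ok.shape[1], "columns")

# Imperfect multicollinearity: OLS can still be computed, but the two
# coefficients are hard to pin down separately.
corr_age_exper = np.corrcoef(age, exper)[0, 1]
print("corr(age, exper):", round(corr_age_exper, 3))
```

Dropping one of the two dummies restores full column rank, which is the usual fix for the dummy variable trap.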

SW 6.8 Conclusion

Treat yourself.

Slides

Download this week's slides.

Videos

Video: Multiple Linear Regression

In-class exercises

Download the PDF here, which contains regression output for questions (2)-(4).

Exercises

Consider a dataset on earnings in the United States. We are interested in the returns to education: how much an extra year of schooling “buys” you in terms of weekly wages (...as of 1980). You’re also worried about whether the estimated coefficient on education suffers from omitted variable bias.

You estimate two equations: $\begin{aligned} \widehat{wage} & = 146.95 + 60.21\, educ \\ \widehat{educ} & = 5.84 + 0.075\, IQ \end{aligned}$
Based on these results, is 60.21 an overestimate or underestimate of the returns to education? How do you know?
You estimate another equation: $\widehat{wage} = -128.89 + 42.06\, educ + 5.14\, IQ$
What is the interpretation of the coefficient on $educ$? What is the interpretation of the constant?
Now, you control for experience and age and estimate the following population regression model:
$wage_{i} = \beta_{0} + \beta_{1} educ_{i} + \beta_{2} IQ_{i} + \beta_{3} exper_{i} + \beta_{4} age_{i} + \beta_{5} age_{i}^{2} + u_{i}$
A one-year increase in age is associated with what change in wages? (mind the squared term)

Finally, because you are worried about omitted variable bias, you include father’s and mother’s education.
1. Why might parents’ education directly affect wages?
2. Which other independent variables do you think parents’ education might affect? Explain.
3. How did controlling for parents’ education affect the returns to education? The returns to IQ?
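For the question about the squared age term, one way to organize the arithmetic is sketched below in Python. The coefficient values (20 and -0.25) are placeholders invented for illustration, not the estimates from the regression output, and the function names are hypothetical. The point is that with both $age$ and $age^{2}$ in the model, the change associated with one more year depends on the starting age: the exact one-year change is $\beta_{4} + \beta_{5}(2 \cdot age + 1)$, and the slope at a given age is $\beta_{4} + 2\beta_{5} \cdot age$.

```python
def one_year_change(beta_age, beta_age_sq, age):
    """Change in predicted wage from age to age + 1, other regressors held fixed."""
    return beta_age * 1 + beta_age_sq * ((age + 1) ** 2 - age ** 2)

def slope_at(beta_age, beta_age_sq, age):
    """Derivative of predicted wage with respect to age at a given age."""
    return beta_age + 2 * beta_age_sq * age

# Placeholder coefficients (assumed for illustration, not real estimates).
b_age, b_age_sq = 20.0, -0.25

for a in (30, 50):
    print(a, one_year_change(b_age, b_age_sq, a), slope_at(b_age, b_age_sq, a))
```

With a negative coefficient on the squared term, the association between age and wages weakens as age rises and can eventually turn negative.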

Other resources

As requested, slower graphs! Also added a graph on collider bias, the webpage explanation helps there.

These graphs are intended to show what standard causal inference methods actually do to data, and how they work.

This is what controlling for a binary variable looks like: pic.twitter.com/dTZxqY5JxA
— Nick HK (@nickchk) November 29, 2018

Last updated on September 29, 2021
