Week 4 - Linear Regression with One Regressor
Overview
Welcome to Week 4! We are proceeding boldly into the world of linear regression. We’re starting by looking into the linear relationship between two variables.
What we won’t be doing is controlling for other factors, nor conducting statistical inference. We won’t be looking at non-linear relationships yet either! Bah.
Rather, we’re going to dive deep into what it means to look at how we can find a good estimate - nay, the best estimate! - of the relatonship between some \(X\) and some \(Y\).
This is where Stock and Watson really start to shine. I highly recommend basking in their expertise and conversational style.
Reading Guide
Chapter 4: Linear Regression with One Regressor
SW 4.1 - The Linear Regression Model
This section is packed w/ good intuition and bolded vocabulary. Make sure you know it!
SW 4.2 - Estimating the Coefficient of the Linear Regression Model?
You should be able to estimate linear regression coefficients by hand 😪.
SW 4.3 Measures of Fit
Know how to use and interpret \(R^2\), \(ESS\), \(TSS\), \(SSR\) and \(SER\). You will also need to know how to find these from raw Stata output as well.
SW 4.4 Least Squares Assumptions
Known and understand the three least squares assumptions
SW 4.5 Sampling Distribution of the OLS
Discuss unbiasedness of estimators and effects of larger vs. smaller sample sizes on standard errors. We won’t calculate standard errors by hand.
Note on causality
At this stage, we are thinking about making good model of data, but not necessarily the data generating process behind that data. When we use the framing of an independent and dependent variables, it’s tempting to think that we’re examining whether the independent variable causes the dependent variable.
At this point in the course, we’re looking at associations which could be causal … or they could not be!
If you want to dig deeper, check out this great guide from EGAP:10 things to know about causal inference.
Causality vs correlation explained... #EconTwitter pic.twitter.com/4ILPdgD9yD
— Daniele Bianchi (@WhitesPhD) September 21, 2020
Slides
Videos
Video: Derivation of \(\hat{\beta_0}\) and \(\hat{\beta_1}\)
Here, I work through how we derive estimates of \(\hat{\beta_0}\) and \(\hat{\beta_1}\) using our good friend, calculus.
Video: Building intuition around the OLS model
You can play along with the same simulator!
In-class exercise
Link to pdf here
Consider a dataset on births to women in the United States. Two
variables of interest are infant birth weight in ounces (bwght
), and
the average number of cigarettes the mother smoked per day during
pregnancy (cigs
). The following simple regression was estimated using
data on 1,388 births.
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(1, 1386) = 32.24
Model | 13060.4194 1 13060.4194 Prob > F = 0.0000
Residual | 561551.3 1,386 405.159668 R-squared = 0.0227
-------------+---------------------------------- Adj R-squared = 0.0220
Total | 574611.72 1,387 414.283864 Root MSE = 20.129
------------------------------------------------------------------------------
bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.5137721 .0904909 -5.68 0.000 -.6912861 -.3362581
_cons | 119.7719 .5723407 209.27 0.000 118.6492 120.8946
------------------------------------------------------------------------------
These results can also be written in the following way:
\[\widehat{bwght} = 119.77 - 0.514 cigs\]
What is the dependent variable? What is the independent variable?
Write, in words, what the interpretation of \(0.514\) is.
What is the predicted birth weight among mothers who do not smoke? What about when \(cigs=20\) (one pack per day)? Comment on the difference.
Consider Prof. Beam, whose mother “cut back” to 10 cigarettes per day (it was the 80s) and was born weighing 9lb, 15 oz. What is her residual?
Find \(R^2\) in the raw regression output. What does it tell us?
Are any least squares assumptions likely to be violated? Explain.
Does this simple regression necessarily capture a causal relationship between the child’s birth weight and the mother’s smoking habits? Explain.