I've been a little concerned that my last couple of posts have been confusing for some people, based on some comments from Ryan Long about all the variables DLR are putting in, so I want to zoom out to the big picture a little. First, Ryan expressed concern that we're adding too many variables to a fixed effects model and that this is costing us significance.* More recently he expressed concern that we were just adding variables on the idea that adding more variables reduces bias.
This one concerns me a lot more, and now I'm worried more people have missed the whole point of these posts. We are not just chucking things into the model and watching the significance disappear. We have a treatment effect we're trying to estimate, but we have non-experimental data, so we need to figure out a way to mimic an experiment and get at least a good sense of what the treatment effect is. DLR have chosen to do that with what is, at its core, a difference-in-differences (DID) set-up.
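To make the DID logic concrete, here's a minimal sketch of the basic comparison. The numbers are entirely made up for illustration - this is not DLR's data or specification, just the arithmetic at the heart of any DID estimate:

```python
# Hypothetical DID illustration (numbers invented, not DLR's data):
# compare the before/after change in the treated group to the same
# change in the comparison group. The comparison group's change stands
# in for what would have happened to the treated group absent treatment.
emp = {
    ("treated", "pre"): 100.0, ("treated", "post"): 97.0,
    ("control", "pre"): 100.0, ("control", "post"): 95.0,
}

treated_change = emp[("treated", "post")] - emp[("treated", "pre")]  # -3.0
control_change = emp[("control", "post")] - emp[("control", "pre")]  # -5.0

# The DID estimate is the difference of those two differences.
did_estimate = treated_change - control_change

print(did_estimate)  # → 2.0
```

Everything that follows in the post is about whether that comparison group really delivers a good counterfactual, and what extra variables are needed when it doesn't.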
But once you do that, the comparison group you have can still run into certain problems that bias the result. We've worked a lot with these models, though, so we know ways around those problems, and usually that involves adding other variables. We're not just adding them for the hell of it - we're adding them because when you add a variable, it changes which bit of variance in the data you are using to estimate the effect.
That last sentence is the key here.
And that's been the point of my last several posts. Bob Murphy raised concerns (not Ryan's concerns - I think Bob understands the big picture about non-experimental estimation I'm laying out here) about certain variables that were added. My view is that all of these were essential to get unbiased results and represent an improvement on earlier estimates.
So that has been the point. I've been trying to explain why changing the model in X way gets you a better estimate than refraining from changing it in X way. It's not just a matter of adding any ol' variable.
* Don't worry - that's not the case - there are a tremendous number of degrees of freedom, so there's no concern about that. In fact, DLR's models should have (I'd have to double-check) many orders of magnitude more degrees of freedom than Neumark and Wascher's, which was a state-level study. Moreover, only the significance of the minimum wage variable in the employment model dropped, not in the others. If it were a degrees-of-freedom problem they'd all be mush - there'd be no reason for one model to be unaffected and another to lose significance if that were the problem. Finally, Ryan can easily look at the standard errors - they haven't exploded or anything like that. It's just a regular old insignificant effect - no funny business. That would not have gotten past the editors and the referees of RESTAT.