Sunday, November 3, 2013

Instrumental variables get a pass on rigor because economists essentially use them for personal entertainment...

That last line isn't entirely fair. There are some good IV studies out there, but IMO they are the ones that use a bazillion different robustness checks to assure you they're not full of shit.

So take that as poetic license and shift attention to the first part of the title.

I'm doing some work right now for a class paper (and hopefully a publication some time in the future) on within-study tests of propensity score matching. Similar papers are out there for other methods, but I'm working with PSM. As anyone who's worked with the method knows, it's decent enough but very dicey. You have to really know what kinds of selection mechanisms are in play to make it useful, because it's simply not a solution to the problem we worry about most: selection on unobservables. It ONLY works for selection on observables that we're not good at modeling. In a panel framework you can make the case that selection on unobservables is accounted for by using past data (i.e., I don't observe my unobservables, but I match on wages at t-1, and since I'm interested in wages at t+1, all the unobservables I think affect wages at t+1 are also affecting wages at t-1 in the same way, so by matching on t-1 I'm OK).
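That "match on the observable driver of selection" logic can be sketched in a quick simulation (hypothetical numbers throughout; I match directly on the single observable rather than on a fitted propensity score, which here would just be a monotone function of it anyway):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: selection into treatment depends only on the
# observable lagged wage w (wage at t-1); the true treatment effect is 2.
n = 10_000
w = rng.normal(10, 2, size=n)                      # wage at t-1
treat = rng.random(n) < 1 / (1 + np.exp(-(w - 10)))  # selection on w only
y = w + 2.0 * treat + rng.normal(0, 1, size=n)     # wage at t+1

# Naive comparison is biased upward: treated units had higher w to start.
naive = y[treat].mean() - y[~treat].mean()

# One-nearest-neighbor matching on the lagged wage.
w_c, y_c = w[~treat], y[~treat]
order = np.argsort(w_c)
w_c, y_c = w_c[order], y_c[order]
idx = np.clip(np.searchsorted(w_c, w[treat]), 1, len(w_c) - 1)
nearest = np.where(np.abs(w_c[idx] - w[treat]) < np.abs(w_c[idx - 1] - w[treat]),
                   idx, idx - 1)
matched = (y[treat] - y_c[nearest]).mean()

print(f"naive diff:   {naive:.2f}")    # well above 2
print(f"matched diff: {matched:.2f}")  # close to the true effect of 2
```

The catch, of course, is that this only works because I *built* the simulation so that selection runs entirely through w. Make selection depend on something unobserved and matching recovers nothing, which is exactly the point of the paragraph above.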

We've learned much of this about PSM from random assignment studies that use PSM to test a result with real world data (lots of this can be shown with simulation, but simulation isn't necessarily relevant to outcomes in the real world).

This is true of most quasi-experimental methods. There's usually a literature of within-study tests comparing the method to random assignment. This is a great way to learn about the limits of our quasi-experimental options, and that's always good to know, especially when quasi-experimental options are all we have.

As far as I know, though, there's no such within-study test of IV estimators. There's lots of within-study comparisons to OLS but that's a different exercise entirely. If they're both biased then who cares if it's close to OLS or not?

I'm not entirely sure why this is, except that IVs aren't generally used for evaluations where you might have random assignment. Instead they're used for more fundamental scientific questions. But at least in the returns to education literature you'd think this could be possible.

One more reason to have a healthy skepticism of IVs.


  1. My stance is generally to have a healthy skepticism of econometrics writ large, as well as a healthy skepticism for the specific failings of specific methods; on the other hand, to accept that true human knowledge, especially knowledge based on something other than direct perception and anecdote, is very hard to come by, and so to take what we can get.

    In the case of almost all these kinds of econometric tools - PSM, RD, IV - the thing to remember is that you're looking at the LATE (local average treatment effect), not the ATE (average treatment effect).

    So here's a classic example - does a job training program increase employment? Well, it's hard to say because of selection bias! But if we randomly divide a hopefully-random sample of the unemployed people in a certain city into treatment and control groups, and then send somebody to the doors of the treatment group to pitch them on the job training program, then we can take a look at the difference in the proportions of the two groups that show up, and the difference in outcomes, and use basic IV to say "aha, we can thereby estimate the impact of the program." BUT! You're only estimating the impact of the program on the SUBSET of the population that wouldn't come to the program if not nudged but would come if nudged hard enough. That doesn't tell us anything about the impact of the program on the kind of people who would definitely come, or the kind of people who would have to be dragged kicking and screaming. So you're only estimating the LATE, not the ATE. For many, even most, people, job training is going to have a minimal impact, because they're either unreceptive or challenged in ways job training can't fix, or they're highly motivated or quick studies and thus probably would have found jobs soon enough anyway. So you can't really judge the program as a whole from its effect on the nudged.
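The door-knocking example is the classic Wald/encouragement-design setup, and it's easy to simulate (all numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical encouragement design: Z = got the door-knock, D = attended
# training, Y = outcome. Compliers attend only if nudged; always-takers
# attend regardless; never-takers never attend. Training helps compliers
# by 0.3 but always-takers by only 0.05, so the ATE differs from the LATE.
n = 200_000
z = rng.random(n) < 0.5
group = rng.choice(["complier", "always", "never"], size=n, p=[0.3, 0.2, 0.5])
d = (group == "always") | ((group == "complier") & z)
effect = np.where(group == "complier", 0.3,
                  np.where(group == "always", 0.05, 0.0))
y = 0.4 + effect * d + rng.normal(0, 0.1, n)

# Wald / IV estimate: ratio of the intent-to-treat effects on Y and on D.
itt_y = y[z].mean() - y[~z].mean()
itt_d = d[z].mean() - d[~z].mean()
late = itt_y / itt_d

print(f"IV (Wald) estimate: {late:.2f}")  # recovers the COMPLIER effect, ~0.3
```

The always-takers' smaller effect (and the never-takers entirely) simply never enter the estimate: the instrument only moves the compliers, so the ratio only averages over them. That's the LATE-not-ATE point in three lines of arithmetic.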

    A famous example:

    Even if you think the methods in that paper are sound, you can't then conclusively generalize their estimates to women with 0 children, 1 child, 3 children, 4 children, etc.

    1. Squarely Rooted: It sounds like you would enjoy reading Probability, Econometrics, and Truth by Hugo A. Keuzenkamp and Statistical Models and Causal Inference, an anthology of pieces by the late statistician David A. Freedman that was finished by his colleagues. (And I won't be surprised if you have heard of either of them before, or have read either scholar's work before.)

