That last line isn't entirely fair. There are some good IV studies out there, but IMO they are the ones that use a bazillion different robustness checks to assure you they're not full of shit.
So take that as poetic license and shift attention to the first part of the title.
I'm doing some work right now for a class paper (and hopefully a publication some time in the future) on within-study tests of propensity score matching. Similar papers are out there for other methods but I'm working with PSM. As anyone who's worked with the method before knows, it's decent enough but very dicey. You have to really know what kinds of selection mechanisms are in play to make it useful because it's simply not a solution to the problem we worry about most: selection on unobservables. It ONLY works for selection on observables that we're not good at modeling. In a panel framework you can make the case that selection on unobservables is accounted for by using past data (i.e., I don't see my unobservables, but I match on wages at t-1, and since I'm interested in wages at t+1, I assume all the unobservables that affect wages at t+1 also affect wages at t-1 in the same way, so by matching on t-1 I'm OK).
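To make the observables/unobservables distinction concrete, here's a minimal sketch in Python (standard library only). Everything in it is an assumption I made up for illustration: the data-generating process, the true effect `tau = 2.0`, the `gamma` parameter that controls how much an unobserved `u` drives selection, and the helper names `fit_logit`, `simulate`, and `psm_att`. It fits a one-covariate logit by Newton's method and does one-to-one nearest-neighbor matching with replacement on the fitted score, which is only one of many ways to do PSM.

```python
import bisect
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, d, steps=25):
    """Fit P(d=1 | x) = sigmoid(a + b*x) by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += di - p              # score w.r.t. intercept
            g1 += (di - p) * xi       # score w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate(gamma, n=4000, seed=0, tau=2.0):
    """Hypothetical data: x is observed, u is not.

    gamma = 0 means selection on observables only;
    gamma > 0 means u also drives who gets treated.
    """
    rng = random.Random(seed)
    x, d, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        ui = rng.gauss(0, 1)
        di = 1 if rng.random() < sigmoid(xi + gamma * ui) else 0
        yi = tau * di + xi + ui + rng.gauss(0, 1)
        x.append(xi)
        d.append(di)
        y.append(yi)
    return x, d, y


def psm_att(x, d, y):
    """ATT from nearest-neighbor matching (with replacement) on the fitted score."""
    a, b = fit_logit(x, d)
    score = [sigmoid(a + b * xi) for xi in x]
    controls = sorted((s, yi) for s, di, yi in zip(score, d, y) if di == 0)
    c_scores = [s for s, _ in controls]
    diffs = []
    for s, di, yi in zip(score, d, y):
        if di == 1:
            j = bisect.bisect_left(c_scores, s)
            near = [k for k in (j - 1, j) if 0 <= k < len(controls)]
            k = min(near, key=lambda i: abs(c_scores[i] - s))
            diffs.append(yi - controls[k][1])
    return sum(diffs) / len(diffs)


att_observables = psm_att(*simulate(gamma=0.0))
att_unobservables = psm_att(*simulate(gamma=2.0))
print(att_observables, att_unobservables)
```

With `gamma=0.0` the estimate should land near the assumed true effect; with `gamma=2.0` it should overshoot, because treated units have higher `u` than the controls they're matched to and the score can't see that. It's a simulation, so it only shows the logic, not how PSM behaves on real data.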
We've learned much of this about PSM from random assignment studies that re-estimate the experimental result with PSM on real-world data (lots of this can be shown with simulation, but simulation isn't necessarily relevant to outcomes in the real world).
This is true of most quasi-experimental methods. There's usually a literature on within-study tests of the method that compare it to random assignment. This is a great way to learn about the limits of our quasi-experimental options, and that's always good to know (especially when we only have quasi-experimental options).
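For anyone who hasn't seen one, the skeleton of a within-study test looks roughly like the Python sketch below. It's a toy under loud assumptions: the wage process, the effect `tau = 2.0`, the `ability_shift` parameter, and the function name `within_study_test` are all hypothetical, and I use nearest-neighbor matching on the lagged wage alone as a stand-in for a full propensity score model. The idea is to take the randomized estimate as the benchmark, then swap the experimental controls for an outside comparison group and see how far the matched estimate drifts.

```python
import bisect
import random


def mean(values):
    return sum(values) / len(values)


def within_study_test(ability_shift, n=4000, tau=2.0, seed=1):
    """Return (experimental, nonexperimental) estimates of the effect on wages.

    ability_shift is how much higher the outside comparison group's
    unobserved ability is than the program applicants' (hypothetical).
    """
    rng = random.Random(seed)

    def draw(mean_ability, treated):
        u = rng.gauss(mean_ability, 1)                    # unobserved ability
        w_lag = 10 + u + rng.gauss(0, 1)                  # wage at t-1 (observed)
        w_out = 10 + u + tau * treated + rng.gauss(0, 1)  # wage at t+1
        return w_lag, w_out

    # Randomized experiment among applicants: coin-flip assignment.
    treated, exp_controls = [], []
    for _ in range(n):
        if rng.random() < 0.5:
            treated.append(draw(0.0, 1))
        else:
            exp_controls.append(draw(0.0, 0))

    # Outside comparison group, never treated, sorted by lagged wage.
    comparison = sorted(draw(ability_shift, 0) for _ in range(n))

    # Benchmark: simple difference in means from the experiment.
    experimental = mean([w for _, w in treated]) - mean([w for _, w in exp_controls])

    # Quasi-experimental estimate: match each treated unit to the
    # comparison unit with the closest lagged wage (with replacement).
    lags = [w_lag for w_lag, _ in comparison]
    diffs = []
    for w_lag, w_out in treated:
        j = bisect.bisect_left(lags, w_lag)
        near = [k for k in (j - 1, j) if 0 <= k < len(lags)]
        k = min(near, key=lambda i: abs(lags[i] - w_lag))
        diffs.append(w_out - comparison[k][1])
    nonexperimental = mean(diffs)

    return experimental, nonexperimental


exp_same, non_same = within_study_test(ability_shift=0.0)
exp_diff, non_diff = within_study_test(ability_shift=2.0)
print("same population:     ", exp_same, non_same)
print("different population:", exp_diff, non_diff)
```

The gap between the two numbers is the thing a within-study test reports. In this toy setup the lagged wage is a noisy measure of ability, so when the comparison group differs on ability the matched estimate should come in below the benchmark even though I matched on wages at t-1.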
As far as I know, though, there's no such within-study test of IV estimators. There are lots of within-study comparisons to OLS but that's a different exercise entirely. If they're both biased then who cares whether the IV estimate is close to OLS or not?
I'm not entirely sure why this is, except that IVs aren't generally used for evaluations where you might have random assignment. Instead they're used for more fundamental scientific questions. But at least in the returns to education literature you'd think this could be possible.
One more reason to have a healthy skepticism of IVs.