There's also disagreement on what the skeptics are claiming, and of course you run across stronger and weaker claims. I was responding to an especially strong claim (Perry and Biggs). Some have pointed out to me a weaker (and I think more defensible) skeptical claim from Steve Horwitz. I think both have important problems, but it's worth separating the two to explore those problems. So let's flesh this out a little:
Strong claim: The conditional difference in means is the amount of discrimination between men and women in pay and therefore there is little discrimination because women choose different occupations, majors, etc.
Weak claim: The conditional difference in means is the amount of discrimination between men and women in pay and therefore there is little discrimination but there might be other social problems we don't like driving the differences in occupations, majors, etc.
Both claims have been associated with the idea that the pay gap is a "myth" (Steve uses precisely that language in his video) so in that sense they are making the same overarching claim, just interpreting it a little differently.
The strong claim was addressed in the last post, and not many people seem willing to defend it. Instead they defend the weak claim and insist I've misunderstood something. Well, I didn't misunderstand Steve: I wrote up and sent in the response to Perry and Biggs before Steve's post even went up, so it couldn't have been a reaction to him.
So what are the issues with Steve's post? One is analytic and one is more of a framing concern I have.
1. The analytic problem
I think what I haven't highlighted as much on here (although I have on Facebook) is the issue of the joint determination of wages and every other decision men and women make associated with the labor market. You can't simply control for occupation and major and call it a day, because people select into occupations and majors based on expected wages, and that selection process influences the observed wage distribution. If any of you are familiar with it, this is the basic point of the Roy model, and it has a variety of applications in labor economics. It is also analogous to the Lucas Critique and the need for some understanding and identification of the structural model in macroeconomics. A stripped-down set-up of this joint determination problem is:
w0 = b0 + b1X + e | F = 0
w1 = a0 + a1X + u | F = 1
F = 1 if z0X + z1(w1hat-w0hat) + v > 0
The first two equations are wage equations. w0 is the wage in a male-dominated occupation and w1 is the wage in a female-dominated occupation. F is a dummy variable for employment in a female-dominated occupation, so the third equation is an occupational choice model. X are a set of characteristics of the worker and each equation has an error term. The important thing to notice here is that occupational choice is a function of the relative gains of entering a given field for an individual.
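To make the selection issue concrete, here is the same set-up in LaTeX, followed by the wage we actually get to observe in the male-dominated occupation. The index I, the covariance sigma_{ev}, and the joint normality of (e, v) with Var(v) = 1 are my assumptions for illustration; nothing above requires normality.

```latex
\begin{align}
w_0 &= b_0 + b_1 X + e, && \text{observed if } F = 0 \\
w_1 &= a_0 + a_1 X + u, && \text{observed if } F = 1 \\
F   &= \mathbf{1}\{\, I + v > 0 \,\}, && I \equiv z_0 X + z_1(\hat{w}_1 - \hat{w}_0)
\end{align}

% Wages in occupation 0 are only seen for people who chose it (v <= -I):
\begin{align}
E[w_0 \mid X, F = 0] &= b_0 + b_1 X + E[e \mid v \le -I] \\
% Assumption: (e, v) jointly normal, Var(v) = 1, Cov(e, v) = \sigma_{ev}
&= b_0 + b_1 X - \sigma_{ev}\,\frac{\phi(I)}{1 - \Phi(I)}
\end{align}
```

The last term is the piece a naive regression leaves out. Because I depends on the expected wage difference, anything that shifts wages differently for men and women also shifts who shows up in the sample.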
The short point here that anyone who's gone through an econometrics class should get right away is that occupational choice is endogenous in the wage regression. Controlling for it also controls away part of the wage differential.
Think about how this plays out. Let's say there is a big gender differential in w0, the wage in male-dominated occupations. Men can expect to earn a lot more in these occupations than similarly qualified women. What would you expect to see, given the third equation? First, you'd expect to see only the women best suited to those occupations entering them, because they are the only women for whom w0 > w1. The corollary is that women who are less qualified for that sort of job will not enter it. The opposite is true of men. Men have an advantage in those fields, so less qualified men will enter them because their w0 > w1 calculation is rigged in their favor.
We can control for observed differences, but of course there are a lot of unobserved differences and talents (unobserved to the econometrician, that is; many of these are likely observed by the employer) that push high-ability women into male jobs and pull low-ability men into those jobs. What you'd get out of a regression, though, is women who seem to be doing really well (because they're high ability) and a weakened or even eliminated estimate of the underlying wage discrimination.
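A toy simulation shows how strong this effect can be. This is only a sketch under assumptions I've made up for illustration: a single unobserved ability term, wage parameters chosen by hand, no taste shock in the occupational choice, and a wage penalty for women that exists only in the male-dominated occupation.

```python
import random
import statistics


def simulate(n_per_group=50_000, penalty=2.0, seed=0):
    """Simulate wages by (group, occupation) when workers pick the better-paying job."""
    rng = random.Random(seed)
    wages = {(g, occ): [] for g in ("men", "women") for occ in (0, 1)}
    for group in ("men", "women"):
        for _ in range(n_per_group):
            ability = rng.gauss(0.0, 1.0)  # unobserved by the econometrician
            # Occupation 0 (male-dominated) rewards ability more, but pays women
            # `penalty` less than an equally able man: the true discrimination.
            exp_w0 = 10.0 + 2.0 * ability - (penalty if group == "women" else 0.0)
            # Occupation 1 (female-dominated) has no penalty in this sketch.
            exp_w1 = 10.0 + 0.5 * ability
            # Occupational choice: enter whichever job pays more in expectation.
            occ = 0 if exp_w0 > exp_w1 else 1
            wage = (exp_w0 if occ == 0 else exp_w1) + rng.gauss(0.0, 1.0)
            wages[(group, occ)].append(wage)
    return wages


TRUE_PENALTY = 2.0  # hypothetical value, chosen for illustration
wages = simulate(penalty=TRUE_PENALTY)


def mean_wage(group, occ=None):
    """Mean wage for a group, overall or within one occupation."""
    occs = (0, 1) if occ is None else (occ,)
    return statistics.fmean(w for o in occs for w in wages[(group, o)])


raw_gap = mean_wage("men") - mean_wage("women")
gap_within_occ0 = mean_wage("men", 0) - mean_wage("women", 0)
gap_within_occ1 = mean_wage("men", 1) - mean_wage("women", 1)

print(f"true penalty in occupation 0:   {TRUE_PENALTY:.2f}")
print(f"raw gap (men - women):          {raw_gap:.2f}")
print(f"gap within occupation 0:        {gap_within_occ0:.2f}")
print(f"gap within occupation 1:        {gap_within_occ1:.2f}")
```

With these made-up parameters, only the highest-ability women enter occupation 0, so the gap measured within that occupation should come out far below the penalty that is built into the wage equation. Conditioning on occupation hides the discrimination rather than isolating it.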
This is not some crazy leftist invention to focus on discrimination, by the way. It's just Ricardo.
What should we do? We should:
1. Model selection explicitly, or
2. Take the naïve regressions as a first-cut sense of what forces matter in driving the disparity.
One thing you definitely, definitely don't want to do is decide that discrimination doesn't matter because the unexplained variation in a wage regression is small. Selection models are harder to explain to the public and they're more of an undertaking than OLS, so I don't think one needs to be estimated every single time. We needn't throw out the baby with the bathwater here. But work along these lines should be done, and I imagine it is, to inform us about how, at the very least, occupational choice and wages are jointly determined.
You can expand the above to the joint determination of anything else that people tend to throw into kitchen-sink wage regressions, but occupational choice and occupational segregation are the big ones.
2. The framing problem
So the other problem I have with Steve's post is not so much that anything is wrong per se, but that I don't like how he's framed it. As I understand it, Steve uses "discrimination" to refer to discrimination in salary determination and "sexism" to refer to everything else. With these definitions in hand he starts off his video by telling people (like Perry and Biggs did) that the pay gap is an economic "myth", and that "it's 'mostly' not discrimination". I just think this muddies the waters. It focuses on a very narrow claim about pay determination and then uses it to make what sounds to most people like a very broad claim. It's an improvement on Perry and Biggs for sure, in that it highlights other problems behind what they attribute to choice and preference. But it still follows the formula that unexplained variation equals discrimination and explained variation equals something else (maybe sexism, but not employer discrimination), and I think both halves of that formula are wrong.
Discrimination is both potentially bigger and potentially smaller than this sort of formula suggests. It's potentially bigger for the reasons that I stated above: all of this is jointly determined, so wage discrimination is absorbed into the other variables on the right-hand side. It may also be smaller than the unexplained variation. There is natural variability in talents, after all, and natural variability in preferences. We don't know from the regression how much of that residual is discrimination and how much isn't. The only way to really get at it is what's called an "audit study", where you send out two otherwise identical resumes and see what happens. Now there are criticisms of these sorts of studies too (see several articles in the Spring 1998 JEP), but it's probably the best we've got. This is more of a treatment effect approach than the structural approach I described above.
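The logic of an audit study is simple enough to sketch in a few lines. Everything here is hypothetical: the callback probabilities, the size of the penalty, and the number of employers are invented for illustration and are not taken from any actual study.

```python
import random


def run_audit(n_employers=20_000, callback_penalty=0.05, seed=1):
    """Send each employer two identical resumes that differ only in the name."""
    rng = random.Random(seed)
    callbacks = {"male_name": 0, "female_name": 0}
    for _ in range(n_employers):
        # Each employer has its own baseline interest in this resume. Because
        # both resumes are identical, qualifications are held fixed by design.
        base = rng.uniform(0.1, 0.3)
        if rng.random() < base:
            callbacks["male_name"] += 1
        # Hypothetical discrimination: the same resume with a female name
        # gets a callback with a lower probability.
        if rng.random() < base - callback_penalty:
            callbacks["female_name"] += 1
    return {name: count / n_employers for name, count in callbacks.items()}


rates = run_audit()
estimated_penalty = rates["male_name"] - rates["female_name"]
print(f"callback rate, male name:   {rates['male_name']:.3f}")
print(f"callback rate, female name: {rates['female_name']:.3f}")
print(f"estimated penalty:          {estimated_penalty:.3f}")
```

Since nothing but the name differs between the two resumes, the difference in callback rates estimates the penalty directly, without having to model who selects into what. The cost is that it only tells you about the callback stage, not about wages or anything downstream.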
So how do I like to talk about this if I don't like Steve's approach?
I generally don't talk about "discrimination" much. You'll see in the work I've done on this for the Urban Institute (usually with respect to race rather than gender), I usually use the word "disparity". This is very common, and people do it for the reasons I've raised here. Discrimination is a very strong claim and it's hard to pin down exactly what and where it is. People also tend to think only in terms of overt discrimination, and if you understand that structural discrimination is important you want to shy away from pointing to a coefficient or residual and calling it "discrimination", because many of the natural disparities that emerge are of concern too and you don't want to rule that out.