Tuesday, May 27, 2014

Nature abhors a discontinuity... usually... a little more on Piketty

It's only "a little more" because I am wrapping up an R&R, some contract work, and my last comprehensive exam before I can actually read the damn thing. I'm therefore still withholding any strong conclusions, but I'm happy to express skepticism that the Giles analysis has turned up any major problems.

One point caught my eye in a recent post by Bob Murphy about the Giles analysis. I know I can count on him to tell me if I am getting any of this wrong, and I invite the criticism.

Bob suggests that the ONS data discrepancy is a red herring. I find that a little odd, because the ONS discrepancy is what most of the critics seem to treat as the major point; if it is a red herring, it's one people are concerned about only because the critics themselves made it the focus. Regardless, Bob contends that there are bigger issues with more frequent data, such as the Inland Revenue Service (IRS) data for the top 10%'s share of wealth that is circled in black, below. Piketty's series is in blue.

Giles vs Piketty on UK

Bob writes:
"You can see the huge gulf between that raw data set, and Piketty’s blue line above it. You can see the Inland Revenue Series (IRS) data here; note that the figures for wealth held by the top 10% at the end of the series in the year 2005 is 54%, not the nearly 70% value through which Piketty’s trend line moves in that year[...] So it should be crystal clear to anyone who actually wants to see if Giles has a point–and went through his work carefully–that Giles’ case doesn’t rest or fall on our opinion of the ONS data. Rather, Piketty’s blue line in the shot above is well above the IRS data for the middle-2000s. The only raw data source Piketty can use to get such a high figure for wealth held by the 10% (at 70.5%) in the year 2010 is to rely on the “HMRC Top 10%” data, but the HMRC report itself proclaims that it is not suitable for such purposes (according to Giles in his FT critique, but I could not personally track down this claim and independently verify it)."
I think we need to be very careful about claims like this, and this is precisely why I'd rather wait for more details on what went into the data processing. In this case what leaps out at me is the discontinuity between the "LATEST IRS Top 10%" series and the data that come before and after it, particularly since the early points of this series overlap in the same years with other series that are much higher. Generally things don't move discontinuously like that in nature, so when you see that in the data there is very likely either an abrupt policy change or (more likely in this case) something different about how the data are collected. Maybe some things are left out of the new series that were in the old series. If they are different series, maybe the sampling frame is a little different or the variables are a little different. Maybe definitions differ across two surveys, or within one survey over time.
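
As a toy illustration of that eyeball test (the numbers here are entirely made up, not the actual Inland Revenue series), you can screen a series for year-over-year changes far out of line with its typical movement. A flag is a prompt to go read the survey documentation, not evidence of an error by itself:

```python
from statistics import median

def flag_breaks(years, values, factor=5.0):
    # Flag year-over-year changes much larger than the series'
    # typical change: a crude screen for possible collection-method
    # breaks worth investigating, not proof of one.
    diffs = [b - a for a, b in zip(values, values[1:])]
    typical = median(abs(d) for d in diffs) or 1e-9
    return [(yr, d) for yr, d in zip(years[1:], diffs)
            if abs(d) > factor * typical]

# Hypothetical top-10% wealth-share series with an abrupt 2005 drop
years = list(range(2000, 2011))
shares = [70.0, 69.8, 69.5, 69.3, 69.0, 54.0,
          53.8, 53.5, 53.2, 53.0, 52.8]
print(flag_breaks(years, shares))  # only the 2005 drop is flagged
```

A 15-point drop against a typical year-over-year move of a quarter point is the kind of thing that almost always signals a change in measurement rather than a change in the world.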

A good example in the work I'm doing for the NAE is a very simple series on employment levels for engineers and engineering technicians and technologists. I use the CPS for this because it's best for long-term labor market information. Every once in a while the federal government slightly changes occupational definitions, so tracking this over decades means that I am looking at slightly different populations. Some of this is just accounting for natural changes in jobs over time (as you can imagine, an engineering technician's job today is very different from what it was forty years ago), and certain changes are bigger than others. One big change to occupational categories occurred in 2002, largely to account for the rise of computer and information technology occupations (ironically, it makes studying computer and information technology labor markets during this period a real pain). Some engineers and engineering technicians got pushed into other categories (in a lot of cases IT) at this time, and what you have in the data is a relatively sharp break. That's what you see in my figure below.

Now I could have just continued the data series and made the break look like an actual drop, but that would have been misleading: I knew much of the change was not a drop in employment at all, it was a change in the way the data were collected. Instead I disconnected the pre- and post-2002 series, used a dashed line in the post-2002 section to keep it visually distinct, and discussed the issue in the text.

I could have done what Piketty did: reconstruct my own series from the various data sources I had (by the 2000s I am working with many sources besides the CPS), using what I know about the changes in the occupational categories. Indeed, the BLS provides documentation on what adjustment factors to apply, and in this case that could mean a difference of a couple hundred thousand workers.
Figure 5. Employment of engineers and engineering technicians and technologists, 1971-2013

Source: Author’s calculation from the 1971-2013 March CPS

For this project it was completely unnecessary to do all that. I am just giving a taste of levels and trends in these fields (particularly technicians, which are less commonly studied) to a group of (mostly) non-economists and moving on to other analyses. If I were doing work where the trend itself was very important to my discussion, I would have reconstructed it instead of just separating the series. If I had done that, then my reconstructed series would have been well above the actual data reported in the CPS (or well below, if I traced the new definitions backwards), just like Piketty.
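
To make the reconstruction idea concrete, here is a minimal sketch. The adjustment factor and employment levels are made up for illustration; the real factors would come from the BLS documentation:

```python
def splice(series, break_year, factor):
    # series: dict of year -> employment level, on the old occupational
    # definitions before break_year and the new ones from break_year on.
    # Multiply the old-basis years by the adjustment factor to put the
    # whole series on the new basis.
    return {yr: (val * factor if yr < break_year else val)
            for yr, val in series.items()}

# Entirely hypothetical levels and factor, for illustration only
raw = {2000: 1_500_000, 2001: 1_480_000,   # old definitions
       2002: 1_300_000, 2003: 1_310_000}   # new definitions
adjusted = splice(raw, break_year=2002, factor=0.88)
```

Disconnecting the series, as in my figure, avoids committing to any particular factor; splicing commits to one but yields a single continuous trend you can talk about.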

The thing is, it would have been a perfectly legitimate thing for me to do. And I have a suspicion Piketty's adjustments here were perfectly legitimate. They seem to jibe with the results everyone else is getting, and it strikes me as more probable that Giles, a journalist, is slightly confused about what kind of data processing went into the book than that Piketty, an economist, is making this stuff up.

But I just don't know. I can't repeat enough that I really don't know what was done and why, and I can't form a firm opinion - beyond general suspicions - until I do.

The income tax and minimum wage stuff is shockingly sloppy, but I don't see the nefariousness in it that some people do. People who want a global wealth tax are sufficiently far to the left that they are probably not trying to score points by making George H.W. Bush look bad and Bill Clinton look good. Those two seem pretty close from an American's perspective, much less a French leftist's. I think it's more likely that ideological biases passively contributed to the sloppiness (a "well, that sounds right" sort of thing) than that he was trying to score ideological points by manipulating it.

It's like Tom Woods and the old 1920-21 depression errors I wrote about a while ago with the timing of Wilson and Harding's fiscal policy and the various monetary policy decisions. I don't think Woods made the mistakes because he was trying to boost Harding's reputation. I think he was being sloppy, took a look at it, figured it sounded right - it fit his narrative - and called it a day.


  1. Another good example - I have four data sources (ACS, CPS, OES, and NSCG) for the engineering technician and technologist population in 2010.

    The ACS, CPS, and OES are all quite close, which is incredible to me because it's a vaguely defined workforce and the three surveys are collected in very different ways. The NSCG is WAY off. So far off that I'm talking with the National Science Foundation (which runs the survey) about things I may be doing wrong. I have them all in a table rather than a chart in this case, and the NSCG is just sitting out there. I'm going to feel no compunction about telling them that I think the actual figure is closer to the CPS/ACS/OES figure. Data is messy, and finding one source that tells a different story, without really understanding why it's so different, does not in itself constitute evidence that there's a problem.

  2. Daniel, What do you think of the Magness criticism of Piketty’s U.S. data that I highlighted here:

    1. I have to admit I've been a little confused by some of Magness's concerns, including this one. Let me say though that I'd prefer to use annual estimates unless they're really noisy (I don't know if these are), and if they are, use a moving average. I imagine the reason he didn't (and this is fine if the data are really noisy - perhaps less advisable if they're not) is that you have these gaps, and a moving average doesn't make as much sense with gaps. I suppose you could do a bigger moving average, but that might oversmooth.

      Anyway, what I really don't get about Magness's post here is that taking the ten-year averages actually moderates the dip. The dip would have been deeper if he had used the annual estimates. Instead of dipping down to a little above 19%, the series is much flatter at a little above 21% from the 1970s to the 1990s! So far from overstating the point, as Magness claims, Piketty seems to be presenting the data in a conservative way (at least when it comes to this period in the 70s).

  3. This is fine Daniel but just to clarify the ONS thing:

    (1) Giles says, "I was looking at the ONS and it was way different from Piketty. So I started digging. Now look at what I found! If you don't want to use ONS, I still think it's pretty big discrepancy using Piketty's own sources."

    (2) Critics say, "Worst case scenario, even if we take Giles' own red lines as truth, not much difference."

    (3) I say, "What?! There is a humongous gap between Giles' worst-case ONS scenario and Piketty."

    (4) Piketty and the other guy in the Bloomberg response say, "ONS is a survey, that's a bad source. Can't believe Giles used that as gospel."

    (5) I say, "He didn't use it as gospel, he gave options. Focusing on ONS is a red herring."

    1. Except Giles' first line (or the FT's, if you prefer) was "Piketty has made stuff up and/or made serious errors". He also referred to the ONS data as "the best data", which is obviously debatable. If Giles had just written a blog post about "discrepancies", I think Piketty & his defenders would be a bit less annoyed with him for jumping the gun, but as it stands he essentially implied Piketty had no answers to his (reasonable) questions, then went on to assert his data were superior.

  4. Daniel,

    I admit there are some things that Magness harps on that don't seem like a big deal to me, but his point about the averaging of the 1970s etc. is this: Piketty's chart makes it look as if inequality was falling through the 1970s, then turned around and has been on a steady rise since the 1980s. Thus, it looks as if it's been increasing for several decades--coinciding with the Reagan/Thatcher revolutions in neo-liberal policy reform and globalization--and so we should expect it to keep going up.

    In contrast, one of Piketty's sources shows that it has been flat since the 1980s.

    1. The uptick in the source data is bigger precisely because it doesn't average by decade, so I am just not seeing Magness's point. Note that the source data end in the 90s, so they don't show the further uptick of the 2000s.

      I simply don't see how going from 21.2 to 21.4 (Piketty) is supposed to be a sharper change than 19.1 to 21.1 (the source data).

  5. It's "Inland Revenue" not "Inland Revenue Service", if they're abbreviated it's IR. They're now called "Her Majesty's Revenue & Customs".

  6. Daniel,

    "The income tax and minimum wage stuff is shockingly sloppy"

    The couple of slight errors regarding top income taxes that Piketty makes in the text are not replicated in his charts or tables, which contain the correct data. Which is a bit odd, perhaps.

    See top income tax rates here for example:


    I haven't looked up the minimum wage stuff.


All anonymous comments will be deleted. Consistent pseudonyms are fine.