Friday, February 8, 2013

Epic Megan McArdle fail

In case it goes somewhere, I'm just copying the post in below too. I am getting ready to go to a seminar, so I'll let you figure out what's wrong with it all (hint - compare the increments for what she calls "awful" - the first figure, and what she offers as an alternative).

(HT Don Boudreaux, who seems overly eager to jump on evidence against the idea that inequality might be a problem - see my comments on his post if you give up).

And whoever said income was a bell curve?!?!? It's a freaking Pareto distribution. Didn't we nail that one down like a century ago???

*****

"When I was growing up in Canada," says Jon Evans of Techcrunch, "I was taught that income distribution should and did look like a bell curve, with the middle class being the bulge in the middle. Oh, how naïve my teachers were. This is how income distribution looks in America today"

  [Figure: Distribution of Annual Household Income in the United States]

"That big bulge up above? It’s moving up and to the left. America is well on the way towards having a small, highly skilled and/or highly fortunate elite, with lucrative jobs; a vast underclass with casual, occasional, minimum-wage service work, if they’re lucky; and very little in between."

This seems to be the fear of the week. Something--outsourcing, robots, immigration, or maybe just the greedy rich hoarding all the good jobs in their massive bank vaults--is stealing all the prosperity from the bottom of the income distribution. Pretty soon, those people will be reduced to begging in the streets while the rich stride past them with their robot servants in tow.

And it's certainly a striking graph. So striking that I was initially very surprised. Income is one of those things that follows a pretty predictable distribution: big bulge at the low end, trailing off to a few households at the top. Has American income inequality gotten so bad that it's actually broken a statistical regularity?

Er, no. Look closely at those last two brackets. Now look at the brackets immediately to the left of them. What do you notice?

Probably, you notice the same thing that immediately struck me: the last two brackets cover a much, much wider income band than the rest of the brackets on the graph.

Each bar on that graph represents a $5,000 income band: Under $5,000, $5,000 to $9,999, and so forth. Except for the last two. The penultimate band is $200,000 to $250,000, which is ten times as wide as the previous band. And the last bar represents all incomes over $250,000--a group that runs from some law associate who pulled down $251,000 last year, through A-Rod's $27 million annual salary, all the way to some Silicon Valley superstar who just cashed out the company for a one-time windfall of hundreds of millions of dollars. Unsurprisingly, much wider bands have more people in them than they would if you kept on extrapolating out in $5,000 increments.

In 2011, the $195,000 to $199,999 segment had 312,000 households in it. If you multiply by ten, you get 3 million households. But the $200,000 to $250,000 segment had just 2.2 million households. In other words, as incomes rise, the number of households-per-$5,000-of-income falls--just as Jon Evans learned it should, in those long ago Canadian days.
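
The per-$5,000 comparison in that paragraph can be checked with a few lines of Python. This is only a sketch using the two household counts quoted above; the helper name `households_per_band` is made up for illustration, and the $200,000 to $250,000 bracket is treated as ten $5,000 bands, as the post does.

```python
# Household counts quoted in the post (2011).
BAND = 5_000  # width of a standard bracket, in dollars

def households_per_band(households, low, high, band=BAND):
    """Average number of households per `band` dollars of income in [low, high)."""
    n_bands = (high - low) / band
    return households / n_bands

# One $5,000 band: $195,000 to $199,999.
narrow = households_per_band(312_000, 195_000, 200_000)
# Ten $5,000 bands: $200,000 to $250,000.
wide = households_per_band(2_200_000, 200_000, 250_000)

print(f"$195,000-$199,999: {narrow:,.0f} households per $5,000")
print(f"$200,000-$250,000: {wide:,.0f} households per $5,000")
```

Scaled to the same width, the wide bracket holds fewer households per $5,000 than the narrow one before it, which is the point being made about the right tail.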

To put it another way, the apparent clustering of income along the rich right tail of the distribution is just an artifact of the way that the Census presents the data. If they kept running through $5,000 brackets all the way out to A-Rod, the spreadsheet would be about a mile long, and there would only be a handful of people in each bracket. So at the high end, where there are few households, they summarize.

Meanwhile, there's no evidence that the middle class is getting poorer. Here's the Census's Households by Total Money Income table, summarized for 1967 through 2011, which shows the percent of households in each bracket. (Figures are in inflation-adjusted dollars) As you can see, there's no big movement "up and to the left", other than a slight expected income downshift during the Great Recession.

[Chart: percent of households in each income bracket, 1967 through 2011]
Source: Census Bureau Table H-17

However, Jon Evans is right about one thing in the chart he linked: income does not, quite, follow the skewed bell curve that you'd expect. Look at the bars around nice, round income numbers like $50,000 or $100,000. Suddenly, instead of falling slightly as you'd expect, they climb back above their leftward neighbors. Someone is rounding off to 5's and 10's--either the bosses who set salaries, or the folks who report their incomes. Probably both. So there is an interesting statistical anomaly here. But it's not a result of rising American income inequality. It's a product of an entirely different historical accident: the fact that human hands have five fingers.

17 comments:

  1. Mr. Kuehn, "Department of Awful Statistics" is a long-running feature on my blog, in which I point out statistical mangling.

    In this case, the mangling was the statement by Jon Evans of Techcrunch that the income distribution, normally a normal curve, was bunching up to the left while the rich get richer. As you yourself point out, income is not "supposed" to be a normal distribution.

    The reason that both his and my graph bunch as they do is that this is how the census presents the data. I don't think there is anything wrong with bunching the top of the distribution to the right--in fact, I think it's necessary, since the alternative is to have a bunch of barely visible bars trailing out about a mile to the right. The problem is with drawing conclusions from the bunching. Or thinking that this somehow represents a huge departure from historical precedent.

    I thought I laid all this out pretty clearly in the post. I'm sorry that I failed to do so clearly enough for you to understand what I wrote.

    Replies
    1. Are you Megan or Jane?

      Look, for one thing the author of the piece mentioned nothing about those last two bars. I assume he was well aware of your point because it is labeled pretty clearly on the graphic itself.

      You, on the other hand, seemed completely unaware of the bunching in the graph you provided as a counter-argument.

      You would not have written this: "Meanwhile, there's no evidence that the middle class is getting poorer. Here's the Census's Households by Total Money Income table, summarized for 1967 through 2011, which shows the percent of households in each bracket. (Figures are in inflation-adjusted dollars) As you can see, there's no big movement "up and to the left", other than a slight expected income downshift during the Great Recession.

      However, Jon Evans is right about one thing in the chart he linked: income does not, quite, follow the skewed bell curve that you'd expect. Look at the bars around nice, round income numbers like $50,000 or $100,000. Suddenly, instead of falling slightly as you'd expect, they climb back above their leftward neighbors. Someone is rounding off to 5's and 10's--either the bosses who set salaries, or the folks who report their incomes. Probably both. So there is an interesting statistical anomaly here. But it's not a result of rising American income inequality. It's a product of an entirely different historical accident: the fact that human hands have five fingers."


      You would not have written that if you were aware of the bunching in your own graph. The only reason why you see the "climb back up above their leftward neighbors" is that the income brackets are wacky.

      If you knew this all along, you ought to retract those last two paragraphs.

  2. I am Megan--this is my old blogger account, which I still use to comment.

    Mr. Kuehn, even if the brackets are bunched at the right, that wouldn't cover up the "upward and leftward" movement which is the main subject of the chart I created. Unless we're talking about an "upward and leftward movement" occurring entirely within the top two brackets, in which case who cares?

    The climb back up above their leftward neighbors is not an artifact of the fact that the brackets are wacky (except among the last two). It's presumably an artifact of the fact that people tend to round to numbers like $50,000 or $100,000, a phenomenon that I'm sure you're familiar with. The brackets that I was referring to were the ones on the left-hand side--the ones that are all the same size ($5,000 of income).

    I'm not sure exactly what your complaint is. Is it that the chart I created from census data is "misleading" in the way that I said Jon Evans' graph was? But I didn't say Evans' graph was somehow itself substantively misleading; I said that you couldn't infer the hollowing out of the middle class from it, as Jon Evans did. Other people may have slapped labels like "how to lie with statistics" on it, but I didn't say he was lying; I said that Evans was inferring too much.

    Is it that I was "unaware" of the bunching? But your evidence is that I didn't mention it in an argument where I don't see that it makes any difference.

    Is it that I bunched the income? In that case, your complaint is with the Census Bureau, not me (their data was also the source of the Wikipedia graph that Evans is looking at).

    If you could clarify, perhaps we'd have a more productive conversation. I suspect that you reacted to other people's glosses on my post as "Jon Evans is being misleading" and then were incensed when you thought you saw me being misleading in exactly the same way. But the actual message of my post was "Jon Evans is confused" and I'm afraid that I didn't make the same mistake he did--that of thinking that a static graph of one year's income inequality was evidence that the middle class is being hollowed out as all the jobs disappear and the income flows to capital.

    Replies
    1. I am happy to do a post going into more detail in a little while.

      Could you let me know why you think he is confused and thinks that a static graph is evidence of a middle class being hollowed out? I don't see anywhere in his post where he cites that as evidence.

      My concern, in a nutshell, is that you complained about his presentation of a graph with bunching, asserting he drew misleading conclusions from it (it's not clear to me at all that he did - he seemed aware that those last two bars were bunched), and then you went ahead and presented an even more bunched graphic without letting your readers know that explicitly and drawing conclusions like "instead of falling slightly as you'd expect, they climb back above their leftward neighbors" that depend on that misunderstanding.

      The brackets seem to me to be far too wide to detect a shift in inequality. The fact that most other indicators point in the direction of inequality makes me think his data was much better and much better interpreted (of course we've got growing inequality for lots of reasons - some benign, some not).

    2. re: "The climb back up above their leftward neighbors is not an artifact of the fact that the brackets are wacky (except among the last two). It's presumably an artifact of the fact that people tend to round to numbers like $50,000 or $100,000, a phenomenon that I'm sure you're familiar with."

      No, it's EXACTLY because you're presenting bunched data. Get Jon's data, aggregate them into the bunched intervals you present, and you'll see the same effect.

      If you double the number of people in a bracket relative to another bracket, lo and behold the bar is going to be twice as high and you're going to see a peak closer to the middle.

    3. Jon doesn't have data. Jon took a graph off of wikipedia. That's the graph we're talking about. (I did get the underlying data, and the graph represents it accurately).

      The columns I'm talking about are the ones at $50,000; $60,000; $70,000; $100,000; and $150,000. Knowing what I do about how the data is collected (household survey), I'm betting that even if we graphed a line, we would see discontinuities in those places.

      These are all bands of $5,000 worth of income, with the exception of the last two. I'm not sure why you're arguing that the increase in those lines--and not, say $120,000 or $30,000--is due to the size of the income brackets. They're all $5,000 bands, the same as the rest of the brackets, which mostly step downward in more or less the way we'd expect. It may help if you look at the original graph, which you can get by clicking on the graph in my post. Or by going here: http://en.wikipedia.org/wiki/File:Distribution_of_Annual_Household_Income_in_the_United_States.png

      Surely you must have come across this in your studies; it seems like every time I talk to people who work with these sorts of surveys, they chew my ear off about rounding in self-reported data.

    4. Jane -
      No one has disputed your point that people are more likely to report 50,000 as opposed to 49,500. I'm not sure why you keep coming back to that point.

      You want to know why your graph peaks in the middle - it's because the middle bar is for a bracket $25,000 wide and the bracket to the immediate left of that is $15,000 wide and the bracket to the left of that is $10,000 wide.

      And with brackets many multiples as wide as Jon's brackets, of course you're not going to pick up the shifts in income distribution that have been so carefully documented.

    5. As for your other question, I think that Jon Evans "is confused and thinks that a static graph is evidence of a middle class being hollowed out" and that he is citing this graph as evidence for this proposition because he featured the graph at the beginning of his post on how robots are stealing all the middle class jobs, and then wrote:

      "When I was growing up in Canada, I was taught that income distribution should and did look like a bell curve, with the middle class being the bulge in the middle. Oh, how naïve my teachers were. This is how income distribution looks in America today . . .

      That big bulge up above? It’s moving up and to the left. America is well on the way towards having a small, highly skilled and/or highly fortunate elite, with lucrative jobs; a vast underclass with casual, occasional, minimum-wage service work, if they’re lucky; and very little in between.

      But it won’t be 19th century capitalism redux, there’ll be no place for neo-Marxism. That underclass won’t control the means of production. They’ll simply be irrelevant."

      The graph doesn't show anything of the sort. Now, maybe you think he's not taking the graph as evidence. But he doesn't offer any other evidence for that proposition, and if this graph is not supposed to be evidence, then what is it doing there at all?

      If Evans was using it simply as an illustration, it's incredibly misleading. That graph does not show the distortion of the normal bell curve that income "should" have; it shows a fairly standard income distribution, with some compression of the brackets at one end. As I think you yourself are saying. But at points it seems as if you're attributing Evans' erroneous belief to me, though it's not clear and I may be mistaken about that.

    6. I know what he wrote, Megan. Why do you think it's more likely that the graph was used to support "It’s moving up and to the left" instead of what I would have thought was the more obvious "Oh, how naïve my teachers were"?

      That's my question. I know what he wrote.

      I would have thought it was obvious that "It’s moving up and to the left" is supported by the reams of research suggesting exactly that, which have been big news for years now.

      Certainly we can't just declare that every sentence in an article is derived from the one graph in the article!

      So do you have a reason to think it is? I don't see one. I think you're being a bit of a bully to Jon and making some big mistakes yourself.

    7. Ahh, your 4:57 pm post makes clear the source of our misunderstanding.

      The bit about climbing back up above their leftward neighbors does not refer to the graph *I* made; it refers to the graph that *Jon Evans was looking at*. Here's what I wrote, to review, with the relevant qualifier set off with asterisks:

      "Jon Evans is right about one thing *in the chart he linked*: income does not, quite, follow the skewed bell curve that you'd expect. Look at the bars around nice, round income numbers like $50,000 or $100,000. Suddenly, instead of falling slightly as you'd expect, they climb back above their leftward neighbors. Someone is rounding off to 5's and 10's--either the bosses who set salaries, or the folks who report their incomes. Probably both. So there is an interesting statistical anomaly here. But it's not a result of rising American income inequality. It's a product of an entirely different historical accident: the fact that human hands have five fingers."

      I'm aware that in the graph *I* made, the $50,000 bracket climbs back up above its leftward neighbor because it's bunched (and also because the brackets are generally wider and represent something slightly different from the data in the Wikipedia graph.) But that has nothing to do with the passage above. And the differing width of the income brackets doesn't matter for the purpose that I used those graphs for--which is to point out that *over time*, we do not see the substantial upward and leftward movement that Jon Evans seems to believe is happening, and evidenced in the chart of 2010 income inequality.

      The point about the clusters around $50,000 and so forth in his graph was sort of a throwaway--not a criticism of him, just the sort of oddity that I like to point out to my readers. At any rate, you missed the modifying clause and read me incorrectly, which explains why we have been so confused about each other's responses.

    8. You also may have been confused by my use of the term "skewed bell curve", which is an accurate visual description but of course could be taken to imply that I agree with his belief that income should somehow follow a normal distribution.

    9. The graphs are so misleading as to be useless. I calculated the Pearson coefficient of the size of the bar vs the width of the bracket. A full 50% of the variation in the height of the bar is explained by the bunching of the data. I don't know why the census bureau would publish that kind of stuff. This is really a GIGO sort of situation...

    10. And... I just realized I've been taken in by multi-correlation. These are the brackets: 15, 10, 10, 15, 25, 25, 50, 50. Notice anything? Yeap, they are negatively correlated with the income distribution.

  3. Gee, I never expected a non-skewed income distribution. I don't know why anyone would expect one. The first graph looks familiar.

    And why aren't we looking at the log of income? The second set of graphs is closer to that, with unequal ranges. Something that I don't see mentioned in the article as quoted here.

    Replies
    1. Right, log income is of course going to have a different distribution (and nice statistical properties too). But we live and trade in a world of levels :)

  4. One should realize McArdle's career is based on being a knave and injecting foolishness into public discourse. She is an awful person. -- Robert

  5. "It's a freaking Pareto distribution"

    No, it isn't. It's a lognormal distribution, a gamma distribution, or a Nakagami-m distribution, maybe even a Rayleigh distribution. But it is not a Pareto distribution, though it may follow that model for part of the curve.
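
The argument in the thread about bracket widths (summing narrow bands into wider brackets can produce a peak that is not in the underlying data) can be sketched in Python. The widths below are the ones listed in the comments (15, 10, 10, 15, 25, 25, 50, 50, in thousands of dollars). The band counts are hypothetical: a made-up, steadily declining series chosen only for illustration, not Census figures. The helper name `rebin` is likewise just for this sketch.

```python
# Hypothetical, steadily declining counts for forty $5,000 bands ($0-$200,000).
# These numbers are invented for illustration; they are NOT Census data.
fine_counts = [1000 * 0.93 ** i for i in range(40)]

# Bracket widths in thousands of dollars, as listed in the comments.
widths_k = [15, 10, 10, 15, 25, 25, 50, 50]

def rebin(counts, widths_k, band_k=5):
    """Sum equal-width band counts into brackets of the given widths."""
    bars, start = [], 0
    for w in widths_k:
        n = w // band_k  # number of $5,000 bands in this bracket
        bars.append(sum(counts[start:start + n]))
        start += n
    return bars

# Bar heights as they would be drawn with unequal brackets.
bars = rebin(fine_counts, widths_k)
# The same bars rescaled to "per $5,000 of income".
per_band = [bar / (w // 5) for bar, w in zip(bars, widths_k)]

for w, bar, dens in zip(widths_k, bars, per_band):
    print(f"width ${w}k: bar height {bar:8.0f}, per $5k {dens:7.0f}")
```

With these made-up inputs the raw bar heights fall, rise again at the wider middle brackets, and fall again, while the per-$5,000 figures only decline. That is the sense in which bar height in an unequal-bracket chart partly reflects bracket width rather than the shape of the distribution.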
