So I'm working through that engineering chapter again before submitting it and updating some analyses with new data that came out this year. I'm looking at a section where I'm discussing the graduate degree wage premium and I do something really weird in my code that I just can't figure out:
bysort year dgrdg: egen w=total(salarp*rweight)
bysort year dgrdg: egen totalr=total(rweight)
gen wrate=w/totalr
tab year dgrdg, sum(wrate) mean
tab year dgrdg [w=rweight] if salarp!=.
dgrdg is the degree variable and salarp is salary. For some reason I'm aggregating up before tabulating. When I just do the natural thing, tabulating salaries and weighting the tabulation, I get qualitatively similar but numerically different results. Does anyone have any clue what might have driven me to do this the first time (maybe a year ago)? The fact that the results are somewhat different (although, like I said, qualitatively similar) makes me wonder if the weights are even frequency weights, which makes me want to stick with the more intuitive thing I'm doing today.
This is why you comment code.
Ya, ya... my code tends to have lots of good commenting at the beginning that trickles off at the end. If I were trained as a proper programmer, I'm sure I'd be better about that.
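egen's total() treats missing values as zero, but your second tabulation drops observations with missing salarp. So w leaves out the missing salaries while totalr still counts their weights, which would pull wrate down. Does that solve it?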
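If that is what's going on, something like the following should bring the two approaches into line. This is only a sketch: w2, totalr2 and wrate2 are made-up names so nothing in the original code gets overwritten, and I'm assuming rweight can be used as an analytic weight (aw), which gives the same cell means as frequency weights but doesn't require integer values.

```stata
* Hand-built weighted mean, restricted to rows where both salary and weight
* are observed, so numerator and denominator cover the same observations.
* (w2, totalr2, wrate2 are hypothetical names.)
bysort year dgrdg: egen w2 = total(salarp*rweight) if !missing(salarp, rweight)
bysort year dgrdg: egen totalr2 = total(rweight) if !missing(salarp, rweight)
gen wrate2 = w2/totalr2
tab year dgrdg, sum(wrate2) mean

* Direct weighted mean of salary by cell, for comparison
* (aw is an assumption about what kind of weight rweight is)
tab year dgrdg [aw=rweight], sum(salarp) mean
```

If the cell means from the two tab commands match, the gap in the original numbers was coming from the missing salaries rather than from the type of weight.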
Also, you should stop using bysort: sort once, then use by. That way you're not re-sorting the data with every command, and there's less chance of making an error if you need the _Nth observation after your first sort to stay the _Nth observation.