That was my computer screen yesterday! I maximized the shit out of that likelihood function.
Also, potentially relevant: http://xkcd.com/303/
Oh I am TOTALLY adding that to my slides...
...or would that be unprofessional? I like to keep things somewhat lighter and sometimes I don't know.
Yep, added it to the first slide with the picture above. Went over very well :)
Most of the presentations have been on empirical papers, which are more interesting because they have a story and neat identification tricks. So it was nice to have this at the beginning of mine, since mine was one of only two presentations that was more about textbook-type material.
That's exactly what my computer screen looked like when I tried to write an agent-based simulation of the indulgence market. Also pretty much every single one of my forays into simulations, machine learning, and my day-to-day job: compiling, running tests, running the flaky tests again...
I have a good link for your slides: http://oneweirdkerneltrick.com/
Haha - I like "37-year-old patriot discovers 'weird' trick to end slavery to the Bayesian monopoly."
When I first clicked through I thought you gave me a bad link - then I realized, nope - this is it.
Looks like you have quadratic convergence for the first two iterations and then it falls off pretty quickly. If any part of the code is written in single precision, you are just fighting round-off errors by iteration 10.
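To put a number on the round-off point: single precision can only resolve relative changes of about 1.2e-7, so tolerances much tighter than that are just noise. A quick illustration of the magnitudes (Python here, not Stata, and purely a sketch):

```python
# Single vs. double precision machine epsilon (illustrative only).
import numpy as np

print(np.finfo(np.float32).eps)   # ~1.19e-07: smallest relative step float32 can resolve
print(np.finfo(np.float64).eps)   # ~2.22e-16: double precision has far more headroom

x = np.float32(1.0)
print(x + np.float32(1e-8) == x)  # True: an update of 1e-8 simply vanishes in float32
```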
All of this assumes that the answer is even meaningful. Log likelihood = -2 million does not seem meaningful if you really mean "log likelihood".
Not sure what you mean by quadratic convergence (unless you just mean literally following a quadratic function). The model is a multinomial logit model with a bunch of crap in it. I tried some others just to get the screenshot but they actually optimized too quickly! So I had to pick one that I knew would take a long time.
With multinomial logit you're multiplying a bunch of small probabilities together for the likelihood function, so the likelihood will be very small for any given solution and the log-likelihood will be a large negative number.
Stata has default tolerances for when to stop optimizing. It may be getting into rounding errors at that point, but I would think by the time it ran into rounding errors the tolerances would tell it to stop.
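Just to make that scale concrete, here's a toy sketch (Python, made-up probabilities, nothing to do with the actual model): with a few hundred thousand observations the raw likelihood underflows to zero, while the log-likelihood is just a big negative number, so something like -2 million isn't alarming by itself.

```python
# Toy illustration (made-up numbers): why a discrete-choice log-likelihood
# is a large negative number even when nothing is wrong.
import numpy as np

rng = np.random.default_rng(0)
n_obs = 500_000                                # lots of observations
p_chosen = rng.uniform(0.05, 0.5, size=n_obs)  # probability of each observed choice

likelihood = np.prod(p_chosen)                 # product of many small numbers: underflows to 0.0
log_likelihood = np.sum(np.log(p_chosen))      # sum of logs stays finite

print(likelihood)       # 0.0
print(log_likelihood)   # on the order of -700,000: large and negative, but meaningful
```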
Quadratic convergence is when you have an iterative method and your error in effect squares with each iteration, so you might have a sequence where the first iteration has an error of .1, the second iteration the error is .01, the third iteration it's .0001, and the fourth iteration it's .00000001. If your method and your problem permit quadratic convergence it's awesome. That is probably what is going on with your other cases. Newton's method for finding a root of a function will display quadratic convergence once the estimate is close to a root.
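If it helps to see it, here's a minimal sketch (Newton's method on f(x) = x^2 - 2, a stand-in rather than your Stata model) where the error roughly squares each step until it hits machine precision:

```python
# Newton's method for the root of f(x) = x^2 - 2, i.e. sqrt(2).
# Once the iterate is close, the number of correct digits roughly doubles per step.
import math

def f(x):
    return x * x - 2.0

def fprime(x):
    return 2.0 * x

x = 1.0                        # starting guess
root = math.sqrt(2.0)
for i in range(1, 7):
    x = x - f(x) / fprime(x)   # Newton step
    print(i, abs(x - root))    # errors ~0.086, 0.0025, 2e-6, 1.6e-12, then round-off
```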