Monday, August 29, 2016

Bail?

So, I'm considering bailing on the Q. Work has just been too much lately; I know it won't be my best shot. Today I asked whether postponing was even feasible. Apparently, it is, provided you get adviser approval. I'm pretty sure my adviser would approve. I'll at least find out.

I haven't given up yet, but another 4 months would make a world of difference (assuming I didn't squander them).

Friday, August 26, 2016

Perfect storm

Well, this is what I was afraid of and it's come to pass: work demands increasing at a time when I really need to be focused on school. The unfortunate thing is that this is, literally, the last time this could have happened.

I'm basically done with coursework. Yes, I have classes, but they are directed study, so there's flexibility as to when things get done. Yes, there are the A and B exams, but those don't require the kind of comprehensive prep that the Qualifier demands. But, this is the hand I've been dealt. I'm up working past midnight when I should be either sleeping or studying.

I can't blow off work. That's simply an economic reality. I really don't know what I'll do if I fail the Q. It would be quite a blow, to say the least.

Thursday, August 25, 2016

Back to school (sort of)

Classes started this week. I'm taking one formal lecture class (CS5750 - Cloud Computing) and one directed study (MA5500 - Topics in Financial Data Analysis). The first looks like it will be pretty easy (even the professor says so), but the system I support at work is slated to go that direction so I have to learn the stuff anyway. I might as well pick up 3 credit hours while I'm at it.

The latter is really just a cover for continuing my research. I'd like to publish a more "mathy" paper this year to follow the purely technical one from last spring (though, I still have a bit of work to do on that one, too). My thought is to prove some theorems explaining why the stratification algorithm is needed. Another offshoot might be a better prior distribution or maybe a family of priors based on assumptions about what's driving the heavy tail on the metrics. That last part might be a bit ambitious, but it never hurts to be looking for opportunities.

Speaking of opportunities, we finally got the Scalable Data Delivery project approved at work, so I'll be porting all our data to Hadoop. That gives me a really nice platform for running my algorithms. I spent all yesterday afternoon grinding out the budget for that effort, which meant I missed class. I should probably not make a habit of that going forward.

Tuesday, August 9, 2016

This is why you take the Q

... to keep from making a fool of yourself when you start publishing stuff. Sometimes the significance of lessons learned in the context of a class doesn't really become apparent until you review those lessons with an actual problem in mind.

Readers of this blog know that I've been grappling with why you can't make sense of random samples from financial data. The actuaries have been telling me this for years. My own research has confirmed it. But, why? Why has the Central Limit Theorem abandoned me?

Simply put, because I'm dividing.

Financial data is all about ratios. Return on Investment, Return on Economic Capital, Present Value, Future Value, Annual Percentage Yield, etc. Even values that look like they wouldn't be ratios actually are. Reserve Capital looks like just a regular asset, but it's not. It's based on Amount at Risk times a bunch of factors derived from various risk parameters.

And it's that last little piece that gets ya.

Taking a sample of, say, policy face values, is pretty safe. The distribution is a little skewed, but it's certainly no problem to take a sample average or standard deviation. You don't have to sample too many to get a convergent estimate of the mean. Multiplying that by a fixed interest rate to get a future value is simply applying a linear transformation, which will also yield a nice, convergent mean.
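A quick sketch of that claim, using made-up lognormal "face values" (the distribution and parameters here are hypothetical, purely for illustration): because the data has a finite mean and variance, the sample average settles down, and a fixed rate just scales it.

```python
import random

random.seed(1)

# Hypothetical policy face values: skewed, but with a finite mean and
# variance, so sample averages converge without drama.
face_values = [random.lognormvariate(11, 0.5) for _ in range(10_000)]
mean_fv = sum(face_values) / len(face_values)

# Applying a fixed interest rate is a linear transformation: the mean
# of the transformed sample is just the transformed mean.
rate = 1.05
future_values = [fv * rate for fv in face_values]
mean_fut = sum(future_values) / len(future_values)

print(mean_fut / mean_fv)  # the means scale by exactly the rate
```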

But, what about those pesky risk factors? They could be all sorts of things and the rates applied to cover reserves will vary with each. Let's suppose both the amounts and the rates come from normal distributions (an unrealistically well-behaved assumption, to be sure). What happens when you divide them to get your reserve requirements?

Well, in just about any decent text on distributions, you'll find an offhand mention of what happens when you divide two Normals. It's the sort of thing you just glide over in class figuring it won't likely come up much in real life. The result is a Cauchy distribution - a distribution so heavy-tailed it doesn't even have a mean, much less the higher moments needed to get the Central Limit Theorem to work.
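A small simulation makes the failure visible. The standard-normal "amounts" and "rates" here are stand-ins, not real reserve inputs; the point is only that the running mean of the ratios never settles, because a ratio of centered Normals is Cauchy and has no mean for the Law of Large Numbers to find.

```python
import random
import statistics

random.seed(42)

def reserve_factor():
    # Ratio of two independent Normals. With both centered at zero this
    # is exactly a standard Cauchy; the tails are so heavy that the
    # distribution has no mean at all.
    return random.gauss(0, 1) / random.gauss(0, 1)

sample = [reserve_factor() for _ in range(100_000)]

# Running means of a Cauchy sample wander forever instead of
# converging: there is no population mean to converge to.
for n in (100, 1_000, 10_000, 100_000):
    print(n, statistics.fmean(sample[:n]))
```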

None of this has solved my problem, but at least I can stop stressing over why I even have one.

Sunday, August 7, 2016

Carol's Fat Ass 2016

Run August 6, 2016.

A good number of runners like to "run their age" on their birthday. This is usually done in miles, but kilometers is an equally acceptable unit. I use kilometers because 1) it's August in St. Louis and 2) even if it's not 100 degrees out, it's usually easier to block out a whole morning than a whole day.

In ultrarunning, a "Fat Ass" is an event where the race director basically doesn't do any work. No t-shirts, no prizes, minimal on-course support, and (importantly) no entry fee. It's basically a group run with a name.

Carol was always pretty good natured about being pear shaped, so when I decided to make my birthday run an annual group event in her memory, the name pretty much suggested itself. We've had as few as 5 and as many as 35 show up for the previous five editions. I've finished every one, but most people are just out to enjoy the company and cut the course short to meet their own training priorities. Still, this is the ultra crowd so "cutting it short" usually means at least marathon distance. There are usually enough folks still around after 5-6 hours to have an impromptu picnic (with birthday cake) afterwards.

Because the course necessarily changes every year to add another K, I usually look for what I think is the coolest new bit of trail in St. Louis and build the course around that. This year, it wasn't really trail, but rather the new bike/pedestrian lane on the Boone Bridge over the Missouri River. This is a pretty big deal as the next closest bridge is 10 miles downstream.

This year's route crosses the bridge, does a lap around Howell Island and then comes back to finish with a lap around Lost Valley. When the Missouri rises 10 feet in a single night three days before the event, I have to fall back on my "high water" route, which trades the now inaccessible Howell Island for a lap of the considerably smaller conservation area at Big Muddy. To make up the distance, I tack a lap of the Lewis trail on the very end. The distance still comes to almost exactly 53K.

Map of route

A dozen runners show up this year (plus two more who went out early). As usual, we don't try to keep everyone together but instead break into several groups. The front group is Tommy and Jen Doias, Zdenek Palecek (better known as just "Z"), Greg Murdick, and myself. We stay together until the water drop at mile 17, when Z and Greg decide to head straight back rather than running the loop of Lost Valley.

The course up to this point has been all bike path (both paved and gravel) and fairly flat. As such, we've been running a bit quicker than I'd normally go on a long run in this heat. I suggest to Tommy and Jen that we bump it back a bit now that we're switching to singletrack and get no argument.

While that does make the going easier, it also raises the typical hazard of tripping when you suddenly slow your pace on technical trail. Both Tommy and I stumble several times, but manage to stay upright. Back at the cars, we decide to take a ten minute break to recollect ourselves before tackling the final five miles on the very technical Lewis trail. Feeling better after the short rest and some refreshments, we get through the loop without incident.

We don't hang out for too long afterwards as there isn't any shade at the start/finish area (I probably should have thought that through better). Still, we do stick around long enough to share some birthday brownies and get a shot of our "official" finishers.

Wednesday, August 3, 2016

Missing the point

Blogging about work is always dicey: one wants to be candid, but there are very real and negative consequences to biting the hand that feeds you in full view of the world. Let me also be clear that my current client is one of the best-run companies I've ever worked for (and, as a lifelong consultant, I've worked for a lot of 'em).

That said...

We've got this quarterly meeting scheduled today which is typically used to keep everybody up to speed on what is going on in our section of IT. Tech leads and managers give short pitches on what their groups are doing and what's coming in the relatively near future. It's not the most exciting two hours, but it is useful information.

Meanwhile, the observation has been made that, when attending a conference, it's often the hallway conversations that transfer the most usable information. So, why not make the quarterly meeting a more informal walk-around affair and just let people swap information freely?

That sounds like a great idea, but there's a really important omission among the predicates: this isn't a conference. The people we'll be milling with are the same folks we see in the hallways every day. We don't need a quarterly meeting to have hallway conversations.

Maybe they'll at least have some good snacks.


Monday, August 1, 2016

Counting and the Beta function

Yes, I suppose I should get back to writing about math.

So, I'm looking through some basic probability stuff ahead of the Qualifying Exam. I'm just dusting it off; not expecting any revelations. That's pretty much how it went, but I did come across a curiosity. The text has a table summarizing the counts for the sample space for the four ways items can be sampled. It looks like this (with nicer formatting):

            Without Replacement    With Replacement
Ordered     n!/(n-r)!              n^r
Unordered   (n choose r)           (n+r-1 choose r)

Perhaps it's just because I've been working with the Beta function a lot lately, but I had never noticed that when sampling with replacement without respect to order, the count can be expressed as 1/(r·B(r,n)). It's not immediately obvious to me that there's any significance to that. After all, the Beta function is just a statement about combinatorics, so you could work it into a lot of counting formulas. But, step back from that and look at the formula and what it really means.
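The identity is easy to check numerically. Here math.gamma stands in for the Beta function and itertools brute-forces the multiset count directly; the loop bounds are arbitrary small values, just enough to be convincing.

```python
from itertools import combinations_with_replacement
from math import comb, gamma

def beta(a, b):
    # B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)
    return gamma(a) * gamma(b) / gamma(a + b)

for n in range(1, 8):
    for r in range(1, 8):
        # number of unordered samples of size r, with replacement,
        # from n distinct items
        count = comb(n + r - 1, r)
        # the Beta-function form of the same count
        assert abs(count - 1 / (r * beta(r, n))) < 1e-6 * count
        # brute-force check: enumerate the multisets directly
        assert count == sum(1 for _ in combinations_with_replacement(range(n), r))

print("C(n+r-1, r) == 1/(r*B(r,n)) for all n, r tested")
```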

What we're saying is that if I were to randomly pick r balls out of a bag of n balls, replacing each time, and you were betting on the collection I'd produce, your odds of winning (treating each unordered outcome as equally likely) would be:

    r · B(r, n)

Is there any remotely intuitive reason to think that would be the answer?