never2old4school: October 2016

Monday, October 31, 2016

Multivariate transforms

Recall from the univariate case that if Y = g(X) and g is monotonic, then the pdf of Y is:

f_Y(y) = f_X(g^-1(y))|d/dy g^-1(y)|

Extending this to the multivariate case is pretty straightforward when g is a 1:1 function. The kicker is what to do with the derivative. You now have to adjust for the transforms along each dimension of g. This obviously suggests some sort of partial derivative. Indeed it is, but not exactly an obvious one. The extension of the derivative of a scalar function of one variable to one of many variables is the gradient. Extending that to functions of vectors produces a matrix known as the Jacobian:

$J = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2}&...&\frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2}&...&\frac{\partial y_2}{\partial x_n}\\ ... & ... & ... & ...\\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2}&...&\frac{\partial y_n}{\partial x_n}\\ \end{bmatrix}$

The matrix doesn't have to be square, but if it is, then the determinant is also called the Jacobian and it's what you stick in to do the transform of the density function. You can see how this reduces to the univariate form when n = 1. Of course, what we need for that formula is the Jacobian of the inverse of g because we're starting with the value for Y and mapping it back to the density of X. This is why it needs to be a 1:1 function.

In the cases where it isn't, we use the same trick employed in the univariate case: break the sample space up into areas where the function is 1:1. If A = {A_i} is the partitioned sample space and h_i(y) = g^-1(y) for all y in g(A_i), then:

$J_i = \begin{vmatrix} \frac{\partial h_{1i}(y)}{\partial y_1} & \frac{\partial h_{1i}(y)}{\partial y_2}&...&\frac{\partial h_{1i}(y)}{\partial y_n}\\ \frac{\partial h_{2i}(y)}{\partial y_1} & \frac{\partial h_{2i}(y)}{\partial y_2}&...&\frac{\partial h_{2i}(y)}{\partial y_n}\\ ... & ... & ... & ...\\ \frac{\partial h_{ni}(y)}{\partial y_1} & \frac{\partial h_{ni}(y)}{\partial y_2}&...&\frac{\partial h_{ni}(y)}{\partial y_n}\\ \end{vmatrix}$

and

$f_Y(y) = \left\{\begin{matrix} \sum f_X(h_i(y))|J_i| & y\in g(A) \\ 0 & \textrm{otherwise} \end{matrix}\right.$

Note that |J_i| is the absolute value of the Jacobian determinant, not simply the Jacobian determinant (hey, I'm not the one who decided to call them the same thing and then use ambiguous notation on top of it).

Sunday, October 30, 2016

ALS

I got a really unpleasant phone call from friend this week. As he hasn't made this information public, he'll remain nameless for the time being. He has ALS. Or, something very much like it; there will be some further tests to get a more precise diagnosis, but there's little question that it is a motor neuron sclerosis of some sort, most likely terminal.

Folks who remember my old blog know that I've been down this road before. My sister died of ALS a decade ago. It's a pretty helpless feeling. Of course, I can help with the mundane things that become difficult as the disease progresses. More importantly, I can just be there. But, there isn't anything I can do to fix it.

Or is there?

I'm not looking to play the hero, here, but it might not have been entirely coincidental that the day after getting this call, I attended an interesting colloquium by an UMSL prof who is doing some significant research in identifying the genetics that drives a lot of this. St. Louis is a bit of a hotbed for this sort of thing. Washington University is one of the original players in the Human Genome Project and Monsanto, while controversial, does some pretty amazing things with genetic modifications. The UMSL professor who gave the talk, Dr. Climer, is collaborating with faculty at Wash U.

The problem plays very much to my strengths. It's basically an Operations Research problem that is solved by throwing hardware at it. I've got plenty of experience with both the genre and the approach. I don't want to completely abandon my current research, but there may be an intersection that can be exploited. At any rate, it's worth at least exploring.

Saturday, October 29, 2016

Binomial and Multinomial theorems

Ya wouldn't think counting would be so hard, but it can be. Back in August I posted about how the number of possibilities changes based on order and replacement. When selecting r items out of n and order isn't important, we get n! ways. To order the set you're selecting from, divide that by (n-r)! since you don't care about the ordering of the items you didn't take, and further divide by r! because you don't care about the order of the things you did take, either. The result is what's termed "n things taken r at a time", typically shortened in speech to just "n choose r" and (since mathematicians aren't big on using words at all if they can help it) this gets written as:

$\binom{n}{r}$

Ok, I expect that most of my math-oriented readers already knew that. A fairly intuitive result arises when you look at the expansion of the polynomial (x+y)ⁿ. Each term in the resulting expansion will be of the form xⁱy^n-i where i can range from 0 to n. The coefficient is just the number of ways you can choose i x's from the possible n terms in the original, that is n choose i. So,

$(x+y)^n = \sum_{i=0}^n\binom{n}{i}x^iy^{n-i}$

This result is known as the Binomial theorem, mainly because it matches the pmf of the Binomial distribution. That is, if Y is a random variable representing how many times an event occurred in n independent trials with success rate p:

$P(Y=y|n,p)=\binom{n}{r}p^y(1-p)^{n-y},\qquad y=0,1,...,n$

Generalizing this to the multinomial case gives the Multinomial distribution where Y is now an m-dimensional vector of counts for m possible outcomes in n trials giving:

$P(Y=y|n,p)= \begin{cases} \frac{n!}{y_1!...y_m!}p_1^{y_1}...p_m^{y_m},\qquad &n=\sum y_i\\ 0&\textrm{otherwise} \end{cases}$

and the Multinomial theorem:

$(p_1+...+p_m)^n= \sum_{y\in A}\frac{n!}{y_1!...y_m!}p_1^{y_1}...p_m^{y_m}$

where A is the set of all vectors of non-negative integers where the sum of the components is equal to n.

Friday, October 28, 2016

Miscellaneous multivariate independence results

No named results here; these results could certainly be used without reference in any paper as they are foundational.

Multivariate independence: A set of random variables X = {X_i} with joint pdf or pmf f(X) is independent if and only if there exist functions f_i(x_i) such that f(X) = Π f_i(x_i). In such cases, f_i(x_i) are the marginal pdfs or pmfs for the elements of X.

Functions of independent random variables: If X and Y are independent random variables, then g(X) and h(Y) are independent.

Products of independent random variables: If X and Y are independent random variables, then E(g(X)h(Y)) = E(g(X))E(h(Y)).

Moments of independent random variables: If X and Y are independent random variables having moment generating functions M_X(t) and M_Y(t), then the moment generating function for Z = X+Y is M_Z(t) = M_X(t)M_Y(t).

That last one suggests a very important specific result, that the sum of two independent normal random variables is a normal random variable. The actual result can be stated stronger and extended to any linear combination of independent normals: If X_i are independent normal random variables with means μ_i and variances σ_i², then X=Σ(a_iX_i+b_i) is a normal random variable with mean Σ(a_iμ_i+b_i) and variance Σ(a_iσ_i²).

Thursday, October 27, 2016

More basic multivariate results

This shouldn't come as any surprise:

Cov(X,Y) = E(XY) - μ_Xμ_Y

Why not? Well, consider what happens when we let Y = X.

Cov(X,X) = E((X-μ_X)(X-μ_X)) = E((X-μ_X)2) = Var(X) = E(X²) - μ_X²

That's certainly not a proof but, given that covariance is basically an extension of variance to multivariate random variables, it is to be expected. Factoring out linear combinations also works the way you'd expect:

Var(aX+bY) = a²Var(X) + b²Var(Y) + 2abCov(X,Y)

Again, not a proof, but if you let Y=X, you get:

Var(aX+bX) = Var((a+b)X) = (a+b)²Var(X) = a²Var(X) + b²Var(X) + 2abVar(X)

Remembering what happens in the univariate case is an easy mnemonic.

Wednesday, October 26, 2016

Covariance and correlation

Just a real quick post today as I spent all evening working on my paper.

Given two random variables, X and Y, the covariance of X and Y is

Cov(X, Y) = E((X-u_X)(Y-u_Y)).

The correlation of X and Y (also called the correlation coefficient) is

ρ_XY = Cov(X, Y)/σ_X σ_Y They're just definitions, so there's not really anything to understand. Very important to know them, though.

Tuesday, October 25, 2016

Conditional expectations and variances

Well, that's all well and good to have a theorem relevant to my research, but we need to get back to Q topics. I promised myself I'd get through Probability Theory by the end of the month.

If X and Y are random variables then:

E(X) = E(E(X|Y)), provided the expectations exists.
Var(X) = E(Var(X|Y)) + Var(E(X|Y)), provided the expectations exist.

The proofs for both of these are largely symbol manipulation exercises, but the results are worth knowing. Both results are necessary to prove the blocksum theorem I stated yesterday.

Monday, October 24, 2016

General formula for variance

You may have picked up on the pattern between variance 1.01, 2.0, and 3.0. As it's a general result, I'm going to state it as a theorem:

Let X be a random variable with finite mean and variance having pdf f(x|λ), where λ is a real-valued vector of parameters from a prior distribution with finite mean such that E(X|λ) = g₁(λ) and E(X² |λ)= g₂(λ). Let X' be a random variable such that X'~X with probability θ and X'=0 otherwise where θ~Beta(a,b). Then:

E(X') = μ = aE(g₁(λ))/(a+b)

and

Var(X') = σ² = (bμ² + a[E(g₂(λ)) - 2μE(g₁(λ)) + μ²])/(a+b)

Note that, if using a "standard" distribution for a prior, it is probably easier to compute g2 indirectly, that is:

g₂(λ) = Var(X|λ) + g₁(λ)²

The proof is pretty much the derivations we've gone through before, up to the point the where we actually plug in g(λ). When g(λ) is a linear combination of the elements of λ, (e.g., normal, t, uniform, among others), the substitution becomes trivial. If not, (e.g., exponential, where g(λ) = 1/λ), you either have to hope the transformed distribution g(λ) is itself a known distribution (inverse exponential in this case), or you have to actually crank out the first two moments.

We can demonstrate the usefulness of this result by plugging in the uniform and exponential assumptions to re-derive the variances we've seen so far:

Uniform(0, L_k): (I'm using L_k to indicate the upper limit for a blocksum rather than nb_k as before. I think it makes the notation less cluttered and it removes ambiguity around the variable b.)

Here, λ = L_k is a constant, so the expectations are just the constant values:

$\begin{align*} &\textrm{E}(g_1(\lambda))=L_k/2\qquad \textrm{and}\\ &\textrm{E}(g_2(\lambda)) = \textrm{E}(\textrm{Var}(X|\lambda)+g_1(\lambda)^2)=\textrm{E}(L_k^2/12+L_k^2/4)=L_k^2/3 \end{align*}$

Thus:

$\begin{align*} \mu&=a\textrm{E}(g_1(\lambda))/(a+b)=a L_k/2(a+b)\\ \\ \sigma^2&=(b\mu^2+a[\textrm{E}(g_2(\lambda))+2\mu\textrm{E}(g_1(\lambda))+\mu^2])/(a+b)\\ &=(b\mu^2+a[L_k^2/3+\mu\L_k+\mu^2])/(a+b)\\ \end{align*}$

Granted, that's not much easier than before, but when we move to a distribution with a prior on λ, the theorem does help point the way. Let's move on to the exponential distribution. Recall that here we model λ~Gamma(h,1/s)

$\begin{align*} &g_1(\lambda)=1/\lambda\qquad\textrm{and}\\ &g_2(\lambda)=\textrm{Var}(X|\lambda)+g_1(\lambda)^2=1/\lambda^2+1/\lambda^2=2/\lambda^2 \end{align*}$

A change of variables where β = 1/λ~InvGamma(h,s) gives

$\begin{align*} \textrm{E}(g_1(\lambda))&=\textrm{E}(\beta)=s/(h-1)\qquad\textrm{and}\\ \textrm{E}(g_2(\lambda))&=\textrm{E}(2\beta^2)\\ &=2\textrm{Var}(\beta)+2E(\beta)^2\\ &=\frac{2s^2}{(h-1)(h-2)}+\frac{2s^2}{(h-1)^2}\\ &=\frac{2s^2(2h-3)}{(h-1)^2(h-2)} \end{align*}$

The restriction that h > 2 on the variance of the Inverse Gamma that we saw last week still applies. And now, we just plug it in:

$\begin{align*} \mu&=a\textrm{E}(g_1(\lambda))/(a+b)=\frac{as}{(a+b)(h-1)}\\ \\ \sigma^2&=(b\mu^2+a[\textrm{E}(g_2(\lambda))+2\mu\textrm{E}(g_1(\lambda))+\mu^2])/(a+b)\\ \\ &=\frac{b\mu^2+a\left[\frac{2s^2(2h-3)}{(h-1)^2(h-2)}+\frac{2\mu s}{h-1}+\mu^2\right]}{a+b}\\ \end{align*}$

The hardest part of this is remembering which random variable you're taking the expectation of. The "inner" expectations g₁, g₂, are on X, the block sum. The outer expectations are on the distribution of the parameters. As long as you keep that straight, it's formulaic.

I could also note that, had we been plowing ahead with the integration by parts, the restrictions on h in the above example might not have been obvious until we'd worked on it a while. In contrast, by using this theorem as a guide, we are pointed immediately to the transformed distributions where, assuming we're looking them up rather than computing them ourselves, such constraints are immediately apparent.

Sunday, October 23, 2016

New York Marathon, 2009

It's been a while since I've posted an off-day throwback race report. Here's NYC in 2009.

Run November 1, 2009

Being a big fish is all about finding the right pond. One such pond for me is the Bubba Pre-Memorial Cross Country Series (as I'm already on a tangent, I won't bother explaining the name). It's not that St. Louis doesn't produce good cross country runners; it produces some of the best. Perhaps even the best as Craig Virgin is from just across the river in Illinois. But, for whatever reason, those kids go on to other things (Congress, in Virgin's case), leaving the 40+ field to folks like me. However, to maintain one's big fish status even in a tiny watering hole like this, one must head off to the deep dark ocean from time to time and return with tales of heroism and woe.

So, here I was talking to Andy Koziatek at Bubba #2, a week before running the New York Marathon. Andy would be a mid-sized fish even on the stage of New York; his times would put him in the "sub-elite" class. My 3:09 qualifier puts me in the minnow category, but still a notch above the great masses that get in via the lottery. I'd be labeled a "locally competitive" runner and given a better start position at the front of the "orange" start if I was local, but St. Louis is not exactly a suburb of NYC, so I'm in the big group on the "blue" line behind the elites. (More on New York's convoluted wave and start line procedures in a bit). Andy wishes me well, noting that my terrain running should have me ready for the hills, and then boosts my confidence by crushing me in the cross country race. But, he's still a young pup, so I'm still the big fish in the over-40 pond.

The trip to New York is happily uneventful; never a sure thing when driving a thousand miles with a 6-year-old in the car. We arrive in Norwalk CT at the home of my college roommate, Kevin Robertson. Kevin works at a midtown design agency and is the creator of the Carol's Team logo. He's also been getting into running the past few years and starting to talk seriously about running his first marathon. Talking about his progress is a nice diversion from thinking about my own prep.

On Saturday, Kevin, Yaya, and I take the train into the city to pick up my number (Kate would have come, too, but she's nursing a cold and decides to get some rest). Inside the Javitz center, we get our first look at the machine that is the New York City Marathon. It's the sort of efficiency that New York is famous for (the most time consuming part of the process is weaving through the equipment expo to exit), but it's also administered by some of the most genuinely friendly folks you'll ever meet.

Yaya was given the choice between the Statue of Liberty and the Museum of Natural History and chose the latter to see the dinosaur bones. We walk over to Penn Station and catch a subway uptown. Kate's sister Emily (who lives in Brooklyn) meets us for lunch and accompanies us to the museum. I haven't been to the museum in 30 years, but I spent enough time there as a kid to leave a lifelong impression. It's been updated in many positive ways but, thankfully, the bones are exactly as I remembered them. No animatronics, laser shows, or interactive media. Just bones assembled into some really menacing poses. If a 65-million-year-old skeleton that's bigger than a truck ain't good enough, well, that's your problem. It's certainly good enough for Yaya who sheiks with glee at seeing them.

My feet are starting to send signals that I'm doing way too much walking around for the day before a marathon. We grab a cab back to Grand Central (Yaya very much enjoys her first taxi ride) and head back home so Yaya can do some trick or treating and I can get some sleep. At 5:30 AM, I'm heading back in for the race.

I arrive at Grand Central at 6:45 and hunt around for a restroom. Most are still closed, but one of NY's Finest points me to one that's open. You'd think that there would be no wait at this time on a Sunday, but the odd confluence of runners taking care of pre-race business and Halloween revelers taking care of post-party business has the lone restroom rather full. It's an odd mixing of groups to say the least: one wants to run 26.2 miles, the other just wants their head to stop hurting.

I miss the 4 train downtown because they've got it diverted to the local track for service. ("Should we be getting on that one?" asks a woman from England. "No, we want the express," I say confidently, trying to affect my old NY accent which was never very strong and wore off years ago. "Oh wait. Crap. We wanted that."). We get the next one and arrive at the Battery just as the 7:30 ferry is backing out of the pier. Still 2 hours to the start, so I'm not too worried about it. The ferry terminal is pretty full, but another friendly New York Road Runners Club (NYRRC) volunteer tells me they are running the normal commuting schedule rather than the weekend schedule, so there will be no trouble getting everybody across the river. Sure enough, the 7:45 fills up before I get to the gate, but another one arrives just 5 minutes later and I get on that. During the ride, I somehow get tasked with taking pictures of a dozen international runners who want the Statue of Liberty in the background. I wish Kate was with me; she takes much better pictures than I.

Once on Staten Island, we're shuttled onto busses that take us to Fort Wadsworth for the start. There, we have to go through a security checkpoint, deposit our drop bag at the appropriate UPS truck (each truck has a sign indicating a range of 1000 bib numbers, so there are 67 such trucks), and find my way to the corral for Blue, Wave 1.

New York used to have one mass start with the elite men on one side of the bridge leading the pack of competitive runners and elite women on the other side leading the 3:30+ crowd. A few years back they went to a wave start with three waves and three start lines to try to cut back on congestion. Blue-1, has the elites, followed by the next fastest 5000 or so runners who didn't qualify for a preferred spot on the Orange or Green lines. I'm seeded about halfway deep in this group, which means I can expect to lose some time in the first few miles due to congestion.

All that suddenly changes when I hear an announcement over the PA saying that Wave 1 corrals are now closed. What?!? It's only 9:00 AM; the start isn't for another 40 minutes! I run over to the corrals hoping I can sweet talk my way in. That turns out to be unnecessary as a large group of runners has solved the problem in a manner more typical of New York: they've knocked down part of the fence and are streaming into the corrals. Well, when in Rome... I join the mob and work my way through the start grid until I find people with numbers in the same range as mine.

Then comes the big cattle drive onto the bridge as all the corrals are moved to the staring lines. It's a fair distance covered and there's plenty of jostling as several thousand runners are moving along a 30-foot wide roped corridor. I seem to do pretty well at holding my position and as we approach the line, I'm only about 1000 runners deep. There are double-decker busses lined up along the starting grid that serve both to keep us penned in and also provide VIP vantage points. While I may not be thrilled with my start position, it is fun to be on the line where the elites are introduced. It's a Who's Who of marathoning, made all the more exciting by the fact that there are some Americans in the field with a legitimate shot at winning the race. I've had no warmup at all, so I try to jog in place to get some blood flowing during the intros. The national anthem is sung, Mayor Bloomberg gives a wonderfully concise sendoff and then the howitzer fires, sending a shock wave through my chest and the first wave of 20,000 runners into motion.

Blue-1 Start

It's a tight squeeze through the start line which I hit about 40 seconds after the gun, but the chip timing will adjust for that. After that, the Blue wave gets all three eastbound lane, so there's adequate room to run. It appears my grid position was appropriate. A few people pass me and I pass a few, but most of us settle into the opening grunt up the bridge at about the same speed. We climb steadily to the first mile marker which passes 7:47 after crossing the start line. I expected to lose time in the first couple miles, so I'm fine with that split. The next mile (which is now running down the other side) is 6:31, so I've cleared the first big hurdle, have reasonably open running, and am less than a minute off pace. My pulse has settled in at 156 with no spikes, so it's as good a start as I could have hoped for.

Off the bridge, we now meet the spectators. Two million is one of those numbers that you think you understand, but can never really get your arms around. That is, until you run past that many people in a three-hour period. The side of the road is jammed with people and the noise is something I've only experienced at the finish of a race. And here we are barely 1/10 of the way in. As we turn onto Brooklyn's Fourth Avenue and merge with the Green and Orange starts, I get another sight I'm not used to: the backs of nearly 3,000 runners filling the road for over a mile in front of me. This is clearly not the Bubba Cross Country series.

Fourth Avenue stays pretty close to the river, so the hills are small and gentle. I settle into my 6:50 pace, figuring that if I'm going to break three hours, it will more likely happen with a strong second half than by trying to get the first-mile losses back here. It's pretty much ideal running weather: 50 degrees, overcast, and very light wind. My legs feel fine and the effort feels right. There's just one problem. I have no food.

I haven't brought any food because I just assumed they'd have some at the aid stations. While they do have both Gatorade and water, there's nothing solid being offered. That's a problem. I had a decent breakfast at 5AM, but haven't had a chance to eat anything since. I take Gatorade at each station (they are more or less at each mile), but I was counting on putting down around 600 calories during the race and I don't think the little 4-ounce sips I'm getting are going to add up to that. Further, because it's so nice and cool, I'm not losing much fluid so it won't get absorbed as quickly. One thing you definitely don't want in a marathon is a lot of salt water making it to your large intestines.

The middle of the field in Brooklyn; this is why you don't want to be in wave 2

At 8 miles we turn away from the river and get into some larger hills. I manage to keep my pace close without spiking my heart rate. All the cadence training I've been doing the last few months is paying off. The shorter stride definitely works better on the hills. Runners who have taken it out too fast are already starting to falter and I'm passing several dozen people on each incline. Of course that still leaves several thousand up the road.

Heading into Greenpoint at mile 12, I try to pick out Emily among the thousands along the road. I don't see her, but I do spot a spectator offering bananas to the runners. It's not my normal marathon fuel, but I've eaten them in long races before, so I grab one. It's a little green, but seems OK. By mile 13, the bomb has gone off. The banana shot right through my empty stomach and now I gut is ready to explode. I back the pace off, running mile 13 in 7:03 despite it being slightly downhill. It doesn't help. As I start up the Pulaski Bridge into Queens, it's clear I'm not going to be able to finish unless I take care of this.

My problems are far from unusual and the NYRRC has prepared for such things by positioning hundreds of porta-potties along the route. Being relatively close to the front of the field, they are all still clean and stocked with paper; I shudder to think what shape they will be in when the 6-hour crowd is coming through. It's only a 90-second stop and I'm back on my way feeling much better. Still, the damage is done. I'm now nearly three minutes off pace. Running the tough second half in 87 minutes is simply not plausible. Today will not be a sub-3. So, it's decision time: hammer on and try to get the best possible seed time for Boston (probably around 3:04 the way things are going), or back off and enjoy being part of the greatest spectacle in running. I choose the latter.

Still, it is a race, and I don't want to completely give up at this point. I run mile 15 in 7:28 to let my insides fully settle and then get back on the gas. We pass an aid station where a guy is on the horn telling us all that this is the last water before "The Bridge". I look up, way up, to see "The Bridge" ahead of us. The Queensboro Bridge is the steepest, nastiest climb on the course. There are no spectators allowed on the bridge and for the first time in over an hour, I hear running sounds, breathing, footsteps, spitting, coughing, rather than the constant cheering. While some might feel abandoned in their time of need, I rather like it. There's a seriousness to the silence that makes the hill easier to deal with. Running back down the other side is so steep that it requires substantial effort just to keep from pitching forward. The 16-mile marker is just before exiting the bridge and I'm rather surprised to see that, even with the hill, I've run the mile in 7:16. Holding that pace the rest of the way will have me comfortably under 3:10.

Up until this point I've been marveling at the spectator support. Despite being several miles behind the leaders, the fans have been abundant and enthusiastic. But nothing so far has prepared me for the entrance to New York Island. The base of the Queensboro Bridge is a prime viewing area for a number of reasons. It's far enough in that you see the runners starting to wrestle with the effort. The 180-degree turn at the base of the hill has even slower runners leaning in which makes for good photos. You can walk just four blocks to watch the runners again at mile 25. But, most of all, it's the first step taken in New York County, and if you're from these parts and you say "New York", you mean the Borough of Manhattan. The sound is deafening, especially in contrast to the silence that preceded.

The roar continues the entire length of First Avenue, which is a straight drag of over three miles. There are slight changes in grade, but nothing constituting a hill. At mile 18 I get to an aid station and have to chuckle a bit at the fact that they are handing out PowerGel. Anybody who waits until mile 18 to eat in a marathon is toast. It's the only piece of logistics that the NYRRC got wrong. That doesn't stop me from grabbing a few packets. Even at this speed, I'm going to be pretty low on fuel by the finish.

One of the design constraints of the course was that it should hit all five boroughs. Obviously, Staten Island gets short changed as we're on the bridge leaving that one at the start. Nearly half the race is in Brooklyn, which makes sense as that's where most of the people live. It simply has to finish in Central Park for both aesthetic reasons and others that will become obvious later. Queens gets a few miles, but the one that I'm sure they'd delete if they could would be the Bronx. There's only enough length in the course to get a mile over there and it's necessarily at the very southern end which, frankly, is not the most flattering part of the city. In fact, when I was a kid, my dad brought me here and told me that I'm no better than any of these folks and if I didn't want to live like this getting some decent grades might be a really good idea. It worked.

Anyway, the Bronx it is. There aren't very many fans here (even if you're not afraid of the hood, why would you watch here when you can join the biggest rap party on the planet a mile away in Harlem?) Stranger still, they're blasting the Rocky theme at the aid station. Aren't the Bronx Bombers currently trying to defeat the Phillies in the World Series? Maybe it's been long enough that folks here don't even remember where that movie was set. I guarantee you they do in Philadelphia. Anyway, I'm happy to be through it and as we cross the Madison Avenue Bridge, we are greeted by the aforementioned largest rap party in the world. This isn't amateur hour, either. Some of these guys are really good.

At mile 22, we turn onto Fifth Avenue and the silhouette of the Empire State Building, five miles away, is planted firmly in our sites. Perhaps it's because it wasn't finished until I was 10, but the World Trade Center never really interested me. It was just tall. The ESB, on the other hand, symbolizes everything I love about the city. Seeing it in this context literally sends shivers down my spine. I spend the next 2 miles staring at it.

At mile 24, we turn into Central Park and are reminded that Manhattan is Dutch for "Island of Hills". It's a brutal ending. None of the hills are long, but there is simply no flat ground to be had. Aside from the opening climb and disaster miles of 14 and 15, 24 and 25 are my slowest miles of the race, but it's not for lack of effort. Exiting the park onto Central Park South puts the hills behind us and the cheers are as loud here as they have been anywhere on the course. The distance markers are now counting down, rather than up: 1 mile to go, 800m to go, we turn back into the park, 400m to go, 200m to go, and there's the finish banner.

Of course he's not tired; he only ran for two hours

And it's done. 3:08:40. While I might have wished for better, it is second only to my 3:00:16 at Pensacola (a much faster course), so it can hardly be called a bad personal performance. I'm not quite sure where it will land me on the starting grid at Boston, but my guess is that it won't be too much different from what I had to deal with here. Boston usually gets around 4000 runners with qualifying times under 3:10. On patriotic note, Meb Keflezighi brought home the win for the US - first time that's happened in quite a while. (And, yes, he is American. Born in Africa, but he's lived here since childhood and is a product of the US running program).

The marathon is over, but I'm not done. I'm now alone in the middle of New York with nothing to my name but a singlet, shorts, racing flats, watch, and two packs of PowerGel. The NYRRC quickly augments that by giving me a finisher medal, space blanket and a food bag. As a devout Libertarian, I can't knock 'em for charging what the market will bear, but it does seem that a $160 entry fee should get you a little more than an apple, a bagel, and some almonds at the finish. Oh, well, there are plenty of street vendors willing to pick up the slack, but first I have to find my drop bag.

Something else you get for your money: a commemorative finish time magnet.

Remember those 67 UPS trucks? Well, they park 'em in reverse order. That's probably wise as the back of the field isn't going to want to walk a mile to get their stuff. But, it does mean that I have to walk a mile to get mine. This is why the Central Park finish is really a necessity. There simply isn't any other spot in Manhattan big enough to handle processing 43,475 finishers (not to mention those that dropped out, but still have to get their stuff). I had planned on walking for the better part of an hour, so this isn't really a big deal. After arriving at my truck, I get my bag. Knowing that runners will think nothing of changing in public, NYRRC has set up some tents to provide some modesty for the benefit of the less seasoned runners. I change into my warm-ups and head out to Central Park West at 82nd (right in front of the Museum of Natural History).

In another one of those things that you just never think about when running smaller races, the entire length of Central Park West is closed so the runners can meet their families, friends, etc. Every 50 feet or so is a sign indicating a range of 1000 numbers. Runners walk down the side of the street closest to the park, while those who are meeting them stand by the signs in the middle of the road. Having nobody to meet, I wander all the way down to 70th street before my feet start sending very clear signals that they are done for the day.

I descend to the subways, and get on the C train, which is jammed with runners who didn't think walking an extra mile was such a great idea and boarded at 82nd. I get off at 42nd to take the 7 train over to Grand Central. The 7 isn't very full; there's actually an open seat. I debate whether sitting down is a good idea (it might be hard to get back up), but decide I can probably sit for 10 minutes without stiffening up. What I can't do is stay awake. Almost as soon as I sit down, I pass out and don't wake up until I'm in Queens. I get back on the train heading the other way. It's an interesting change. Up until now, all the transportation (even the 5:30 from Norwalk) has been filled with runners. Now, it's the "usual" Sunday crowd from Queens which is anything but usual. At Grand Central, I have just enough time to buy some food and a beer for the train back to Norwalk.

Between the fans and the stellar race staff you could come away from this race thinking New York is the friendliest town in the world. You'd be wrong about that, but the impression is genuine. Many had told me that I should just run this one for the experience and save the performance goals for another day. I'm glad I ran it hard, but I'm also glad I stepped back just a bit to take it all in. There will be other marathons, but this really was a once-in-a-lifetime experience. And, we did raise a couple grand for ALS, so that's a good thing.

And, the running gods seemed to agree. Safely back in my little pond, Andy has decided to skip Bubba #4. With only six days rest in the legs, it's a bit too early to be going hard, but one does what one must and I manage to eek out an overall win. It's an improvement of 1673 places over the week before. It's all about finding the right pond.

Saturday, October 22, 2016

Variance 3.0 (continued)

I was not completely joking at the end of yesterday's post. We were having company over in the evening and I was supposed to be helping Kate get things ready. I incurred a bit (but not a lot) of wrath for finishing as much as I did under those circumstances.

Anyway, let's take a look at that integral.

$\sigma^2 = \frac{pb}{a+b}+\int\frac{a(2\lambda^{-2}-2\mu/\lambda+\mu^2)\lambda^{h-1}e^{-s\lambda}}{(a+b)s^{-h}\Gamma(h)}\textrm{d}\lambda$

That's a polynomial times an exponential, so we should be able to grind it out by repeated integration by parts. Before we do that, though, let's back it up one step.

$\begin{align*} \sigma^2 &= \int \frac{pb + q(\lambda)a}{a+b}f(\lambda)\textrm{d}\lambda\\ &=\frac{pb+a\int q(\lambda) f(\lambda)\textrm{d}\lambda}{a+b} \end{align*}$

What's the integrand? It's a function of a random variable times the density of the variable. The integral is over the entire domain of the random variable. That makes the integral the expectation of the function. Let's see where that takes us:

$\begin{align*} E(q(\lambda))&=E(2\lambda^{-2}-2\mu/\lambda+\mu^2)\\ &=2E(1/\lambda^2)-2\mu E(1/\lambda)+\mu^2\\ &=2E(\beta^2)-2\mu E(\beta)+\mu^2\qquad\textrm{where}~\beta=1/\lambda \end{align*}$

Now, since λ ~ Gamma(h,1/s), β ~ InvGamma(h,s). So,

$\begin{align*} E(\beta)&=\frac{s}{(h-1)}\qquad h>1\\ \textrm{and}\\ E(\beta^2)&=\textrm{Var}(\beta)+E(\beta)^2\\ &=\frac{s^2}{(h-1)(h-2)}+\frac{s^2}{(h-1)^2}\qquad h>2\\ &=\frac{s^2(2h-3)}{(h-1)^2(h-2)} \end{align*}$

The constraints on h are not to be taken lightly. The variance of the inverse gamma is not defined until you have at least three observations (if you actually crank through the integration by parts from the original equation, you'll find the same limitation comes into play since you have to repeat the parts twice to get rid of the squared term in the polynomial). These will have to be "provided" in the prior. While that means we will not be able to use a non-informative prior, that's actually a good thing. We really want to bias the variance high until we have significant data to the contrary. Anyway, we now have everything we need to compete the variance. It's a bit messy, but manageable.

$\begin{align*} \sigma^2&=\frac{\mu^2b+E(q(\lambda))a}{a+b}\\ \textrm{where}\\ E(q(\lambda))&=\frac{2s^2(2h-3)}{(h-1)^2(h-2)}-\frac{2s\mu}{(h-1)}+\mu^2 \end{align*}$

Friday, October 21, 2016

Variance 3.0

Shifting gears again back to thesis work.

Did I mention that, despite the fact it works OK, I'm not really happy with modeling the blocksums as U(0,ν) as I did last spring? It's not that the formula is ugly, it's that it's bogus. There's always a chance that a block will come back and every single row will be relevant. A small chance, for sure, but not zero. Uniform sets the probability to zero. So, I've been toying with modeling the blocksums as exponential. Here's what that does to the variance:

First, as with the moving upper limit uniform, the portion of the variance attributable to a block hit, q, is now a function of the parameter on the blocksum distribution. As before, that parameter is a single scalar. The exponential is usually parameterized with a "rate" parameter denoted, λ, that indicates how frequently things occur. This is the inverse of the mean. q(λ) turns out to be pretty trivial. Here, X' denotes the blocksum given that we have a hit; that is, X'~exp(λ).

q(λ) = E(|X'-u|²) = E((X')² - 2X'μ + μ²) = 2λ^-2 - 2μ/λ + μ²

We can pause here to note that we didn't even have to derive that much. We know the exponential distribution has a variance of λ^-2. As the mean minimizes the squared distance to the rest of the distribution, shifting that point will cause an increase. As this is a decomposition of squares, the Pythagorean theorem tells us that the new sum of squares must be the original sum of squares plus the squared distance that the center point was moved. Thus:

E(|X'-u|²) = Var(X') + (E(X')-μ)² = λ^-2 + (1/λ-μ)²

Which is just the first result, somewhat rearranged. That tidbit aside, the first form is easier to integrate with respect to λ, so we'll stick with that.

Now, to calculate the actual variance, we need a prior on λ. The gamma distribution is the natural conjugate prior, and it should work fine in this case. We've already used the usual parameter letters, so we'll switch to h and s to indicate the number of hits and the sum of those hits, which is the normal interpretation of the gamma parameters when used as an exponential prior (note that the sum gets inverted to create a rate for the Gamma). Thus:

$f(\lambda|h,1/s)=\frac{\lambda^{h-1}e^{-s\lambda}}{s^{-h}\Gamma(h)}$

So that just leaves us to calculate:

$\begin{align*} \sigma^2 &= \int\!\int \sigma(\lambda,\theta)f(\theta)f(\lambda)\textrm{d}\theta\textrm{d}\lambda\\ &=\int \frac{pb + q(\lambda)a}{a+b}f(\lambda)\textrm{d}\lambda\\ &=\frac{pb}{a+b}+\int\frac{a(2\lambda^{-2}-2\mu/\lambda+\mu^2)\lambda^{h-1}e^{-s\lambda}}{(a+b)s^{-h}\Gamma(h)}\textrm{d}\lambda\end{align*}$

Oh, heck, look at the time...

Thursday, October 20, 2016

Location and scale families

Location and scale families are distributions which are characterized by the location of their "center" (typically their mean, but it could be median or mode) and the scale of their spread. This essentially boils down to knowing their first two moments, though there are location and scale families where neither of those exist.

A location family is a distribution which is characterized by shifting the pdf by a constant. That is, if f(x) is a pdf or pmf of a real-valued random variable, then the set of pdfs {f(x-μ)} where μ is real is the location family for f. While this can be done for any pdf, location families are generally thought of as distributions where the location of the center is a direct function of one of its parameters.

A scale family works pretty much the same way, but here the transformation is {(1/σ)f(x/σ)} where σ is real. Again, you can pull this off with any pdf or pmf, since any function that integrates or sums to 1 will continue to do so under that transformation. Also again, one typically talks of these families when the scale parameter is a direct function of one of the distributions natural parameters. If the scale parameter, σ, is a direct parameter of the distribution, the distribution is typically parameterized such that σ is the standard deviation which, unlike variance, scales linearly. If the distribution is normally parameterized by its variance, it's generally written as σ² to highlight the fact that it is not, technically, a scale parameter, but the square of one.

If you've read this far, you're probably good enough to guess that a location-scale family is the set of pdfs: {(1/σ)f((x-μ)/σ)}.

Wednesday, October 19, 2016

Exponential families

Back to Q stuff.

An exponential family of distributions is a parameterized family with pdf or pmf f(x|theta) such that

$f(x|\theta)=h(x)c(\theta)exp\left(\sum w_i(\theta)t_i(x)\right)$

where h(x) ≥ 0 and t_i(x) are real-valued and do not depend on θ. Also, c(θ) > 0 and w_i(θ) are real-valued and do not depend on x. The parameter θ may be a scalar or vector.

Prominent continuous distributions in the exponential family include normal, gamma (including, obviously, exponential), and beta. Discrete examples are binomial, Poisson, and negative binomial.

Taxonomy isn't a big priority with mathematicians. The only reason to group things that conform to a model is that one can prove things about the model and then apply them to the elements of the model. Two such results:

$\textrm{E}\left(\sum\frac{\partial w_i(\theta))}{\partial \theta_j}t_i(X))\right)=-\frac{\partial}{\partial\theta_j}\log c(\theta)$

$\textrm{Var}\left(\sum\frac{\partial w_i(\theta))}{\partial \theta_j}t_i(X))\right) = -\frac{\partial^2}{\partial\theta_j^2}\log c(\theta) -\textrm{E}\left(\sum\frac{\partial w_i(\theta))}{\partial \theta_j}t_i(X))\right)$

Yeah, right, like I'm ever going to use those. Actually, they look worse than they are. Once you actually fill in the formulas, the partial derivatives are typically pretty easy and they allow you to compute the first two moments without summations or integrals. Higher moments follow a similar pattern.

A subset of the exponential families are the curved exponential distributions. These are exponential families where the parameter space theta is restricted. For example, the set of normal distributions having equal mean and variance, N(μ, μ). These are of interest because inferences about one of the parameters automatically tell you something about another, otherwise independent, parameter.

Tuesday, October 18, 2016

Less common distributions

Sorry I missed the off-day post; I know some of my readers rather like that one. However, there was a good reason for that: I didn't take an off day. Well, I sort of did; Saturday was pretty light. Sunday, I dug into my thesis research with a bunch more on Monday. I'm still finishing up the details on the paper I started last spring. In particular, I've been looking at distributions to model the sampling of block sums when the number of rows is a small fraction of the total rows in a block (but not zero).

In these cases, I need to model the block sum as something other than uniform, since lower values are much more likely than higher values. The obvious choice is exponential. The memoryless property is particularly relevant here: it doesn't matter how many hits I've found so far, the next row is just as likely (or not) to be included in the sum. The exponential distribution is well understood, but I need to model the parameter which determines the mean of the distribution. Since the algorithm is iterative, a conjugate prior is obviously desirable.

The conjugate prior for the parameter of the exponential (λ) is the Gamma distribution. That is, if the the prior on λ is Gamma(α, β), and n observations are made, the posterior is Gamma(α+n, β+ Σx). Basically, what we're saying is that we go into it thinking that α trials would result in a total of β. We simply add the new number of trials and the new total to that.

The next question is when do we abandon the uniform distribution for the exponential? (Recall that the uniform converged quite nicely for queries where large numbers of rows were relevant; we don't want to give that away.) To do that, we need to perform a likelihood test between uniform and exponential means. Of course, both will converge to normal (via the central limit theorem) given enough time, but it's not obvious how much time is required. So, we need precise distributions.

The mean of a series of uniform observations is a Bates distribution. The pdf is a bit unwieldy:

$f(x|n,b)=\sum_{k=0}^n(-1)^k\binom{n}{k}\left(\frac{x}{b}-\frac{k}{n}\right)^{n-1}\textrm{sign}\left(\frac{x}{b}-\frac{k}{n}\right)^{n-1}$

That doesn't make it wrong, but we'll have to look at ways to compute that quickly. Otherwise, we kill the whole point of this reduced sampling thing. At least that one is constant (b is just the upper bound of the stratum we're estimating and n is how many blocks we've sampled; the rest is closed-form).

On the exponential side, the mean is somewhat simpler. The sum of n observations is an Erlang distribution:
$f(x|n,\lambda)=\frac{\lambda^nx^{n-1}e^{-\lambda x}}{(n-1)!}$

The rub is that we now have to integrate that over our prior on λ. In short, lots of math still to come.

Saturday, October 15, 2016

Fat tails

Getting out of Q material and into the stuff I'm supposed to be researching this fall. Given my genetics, this is a rather fitting line of inquiry for me. My family isn't particularly portly overall, but we all have, shall we say, low centers of gravity. Especially when seated. I've passed this on to my daughter and she is not amused.

As mentioned yesterday, when higher moments don't exist, the mean of a series of independent, identically distributed (iid) random variables may not converge to a normal distribution. It will converge to something, though. And that something is generally what's called a stable distribution.

As these distributions generally don't have higher moments, the term stable may seem less than apt. It refers not to the fact that the means are predictable, but that the means (or any linear combination) of a (possibly infinite) series of random variables is a random variable of the same family.

For example, the normal distribution is a stable distribution because any linear combination of normal random variables is a normal random variable. The center and shape parameters may change, but the family remains the same.

The normal distribution is the only stable distribution to have all its moments. All the other stable distributions have fatter tails than normal and have an undefined variance. By the time the tails are as bloated as the Cauchy distribution, you've lost your mean as well. It actually gets worse than that; Cauchy is essentially the midpoint of the fat-tail spectrum. However, in real applications, the distribution is usually somewhere between Cauchy and Normal.

The sampling method I developed last spring (no, the paper is not done yet) adjusted for this by brute force. We simply stratified the data and made sure we completely sampled the highest strata. At that point, the remaining data is a finite-bounded sample and by definition has all its moments. Insofar as any database is really just a sample of a much larger set of observations that may or may not ever be recorded, this is obviously cheating (though it works!)

My next line of inquiry is: how do we really know when what we are looking at is a representative sample of this larger, theoretical population? To answer that question, I'm going to have to get a lot more familiar with the behavior of heavy-tailed distributions in general, and stable distributions in particular. Real math! Yay for that.

Friday, October 14, 2016

Normal

No, I didn't forget about the normal distribution. As I find the density function one of the more aesthetically pleasing mathematical pearls, I'll even typeset it in Latex:

$f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)/2\sigma^2}$

Books have been written about this distribution. But there are really just a few things you have to know:

It's called "normal" because it really does come up a lot. Why? Because of the Central Limit Theorem. That's a sufficiently important result to warrant it's own post, but the gist is that if you average together enough random variables, the result is normally distributed. There are some very important caveats, but it holds in the vast majority of cases.
Even when you're not averaging things, it's often a good fit. Symmetrical, single-mode, smooth, with long, but very light tails. That last bit means you can often model items with a finite range as normal because the part of the tails that is chopped off has almost no probability associated with it.
Even though the cumulative distribution function is intractable, it's relatively easy to work with. And, when you do need an actual number rather than a formula, pretty much every stats book has tables in the appendix you can use to look up a point on the cdf.
Not only does it have all it's moments, the first two are actually parameters. Mean: μ; Variance: σ².
"Normality" is preserved under linear combinations of normal random variables, and the mean and variance are pretty much what you would expect them to be (the means and standard deviations are simply the same linear combinations of the parameters μ and σ).

It all adds up to an incredibly versatile distribution. There's just one catch: sometimes it's completely wrong. And wrong in ways that aren't easy to see. The violation of the normality assumption can have catastrophic results on the validity of analysis. Much of my research is motivated by the fact that there really are data sets out there that do not converge to normal, no matter how much you sample them. While such atrocities will not likely surface on the Q, they show up in my work data all the time. I'll write more about them tomorrow.

Thursday, October 13, 2016

More continuous distributions

Beta: f(x|α,β) = x^α-1(1-x)^β-1 / B(α,β), 0 < x < 1, α > 0, β > 0

Intuitive description: This comes up a lot in Bayesian modeling, since it's a very convenient prior for a parameter that has a finite interval. The parameters give tremendous flexibility over the shape and it's a fairly easy distribution to compute. In many cases, it gives a conjugate prior (that is, the posterior distribution is of the same form).

Mean: α / (α + β)
Variance: αβ /[(α + β)²(α + β + 1)]

Double Exponential: f(x|μ,σ) = (1/2σ)e^-|x-μ|/σ

Intuitive description: Here, the absolute difference between the random variable and it's mean is exponentially distributed. The resulting distribution is symmetric, but with "heavy tails", that is, the distribution has less mass concentrated around the mean than the normal distribution. Unlike some heavy tailed, distributions, the double exponential does have all it's moments.

Mean: μ
Variance: 2σ²

Cauchy: f(x|θ) = 1/[π(1 + (x - theta)²)]

Intuitive description: The king of the heavies. This distribution is spread so wide it has no moments, not even a mean. While largely theoretical (the lack of moments makes it useful as an edge case when testing ideas), it does come up in real life: the quotient of two standard normals is Cauchy.

Mean: none!
Variance: none!

Wednesday, October 12, 2016

Common continuous distributions

Moving on to the common continuous families.

Uniform: f(x|a, b) = 1/(a - b) x in [a, b]

Intuitive description: The analogous to the discrete uniform, the random variable is equally likely to be anywhere in the range.

Mean: (a + b)/2
Variance: (b - a)²/12

Gamma: f(x) = x^α-1e^-x / Γ(α), x>0 or, more commonly f(x|α,β) = x^α-1e^-x/β / Γ(α)β^α

Intuitive description: The second parameterization is a bit easier to grasp (and, thus, more commonly used). The first parameter, α is a shape parameter which indicates the height of the peak. The second, β, determines the spread. Either way, it's a right-tailed distribution with a peak at (α-1)/β. The distribution is frequently used in modeling processes (known, shockingly, as gamma processes) where there is a varying degree of "memory" in the process. In the extreme case, the gamma yields the exponential, which is a memoryless distribution. The gamma also comes up in Bayesian stats a lot because it has nice conjugate prior properties (though that's less important now than it was 30 years ago, since current Bayesian work favors numerical, rather than analytic techniques.

Mean: α/β
Variance: α/β²

Chi-Squared: This is a gamma distribution with α = p/2 and β = 2. Here, p is a positive integer and is referred to as the "degrees of freedom".

Intuitive description: This represents the sum of squares from a sample of p normal random variables. As such, it comes up a lot in regression analysis when assessing goodness of fit.

Mean: p
Variance: 2p

Exponential: This is a gamma distribution with α = 1. The spread is typically inverted to give a rate. Thus, f(x|λ) = λe^-λx.

Intuitive description: This is the memoryless version of the gamma. That is P(x>a|x>b) = P(x>a-b).

Mean: 1/λ
Variance: 1/λ²

Weibull: f(x|γ,β) = (γ/β)x^γ-1e^{-x^γ/β}; x > 0, γ > 0, β > 0

Intuitive description: Can either be viewed as a general case of the exponential or a transformation of the exponential. If X ~ Exponential(λ) then X^1/γ ~ Weibull(γ,λ). Either way, it's a failure rate distribution.

Mean: γ Γ(1+1/β)
Variance: γ²[Γ(1+2/β) - Γ(1+1/β)²] (I have no intention of memorizing that)

Lots more to come, but that will do for today.

Tuesday, October 11, 2016

QA

Yesterday, I mentioned that using sample defect rates to maintain quality was a dubious practice. That assertion is enough at odds with conventional wisdom that it needs some arguments to back it up. First, let me be clear. I'm not saying that one shouldn't look for defects. Nor am I claiming that keeping statistics on defects is a bad thing. I just think the practice of looking at only samples is flawed. Here's why:

Unless your defect rate is absurdly high, it's very difficult to get enough power in your test. Power is the probability that your test finds a problem when there is one. Frequentist tests typically control the converse, they want to avoid finding problems when there isn't one. By capping the probability of that happening, you either need a really big sample or you just have to accept that you're going to miss a lot of problems. An example: suppose you are producing 1000 cell phone batteries per hour. You decide to sample 20 each hour to see if they are prone to catching on fire. So over the course of an 8 hour shift you've checked 160 batteries. You certainly don't want to throw away 8000 batteries for nothing, so you will only reject the batch if the data suggests the failure rate is at least 0.05%. That equates to 4 expected failures in the entire batch. But the, number of expected failures in your sample is only 0.8. Meaning, more than likely, you don't catch any, even if the rate is at the level you are trying to avoid. Ask Samsung how great it is to release 4 exploding batteries to the market every day. That's a pattern the news media will notice. So, you sample a lot more batteries to improve your power, but that drives up your cost. So, you try to find the least you can sample and still not send four bad ones out the door every day. Good luck with that. The defect ratio is simply too low for a meaningful sample test. And, no, Bayesian tests don't work any better. The problem is the expense of the sample, not the methodology of the test.
Sample testing focuses on the product, not the process. But defects are almost always failings of process. The time to test the crap out of things is before you start producing them in huge quantities. You set up your line to produce batteries and you test every single one. Now, you will catch the all bad ones (or, at least almost all, assuming your test is any good) and you can decide if your defect rate is acceptable because you have a real rate, not a shaky estimate of the rate. Once in production, you still do checks, but they are checks against the process. Is the ratio of chemicals in the battery the same as what you had in pre-production? Are the seals as good? Is the heat dissipating at the exact same rate. You're not looking for defects, but differences. If there are differences (a much easier thing to detect), that means that you're doing something differently in production than you did in pre-prod. It also means that your pre-prod defect rate no longer applies. Fix the process and fix the problem.
Any fixed test will miss the stuff you didn't think about. My favorite example of this comes from personal experience. I was working on a material requirements planning system for a pet food company, so I was spending some time at one of their plants. One day, they were making dry adult dog food. As that's their biggest product, they had all eight extruder banks running it; 32 tons per hour. Every hour, they took their samples back to the lab and made sure that the mix was what they were expecting (properly addressing #2, above). It was. At the end of the day, a fork lift driver goofed and pierced a bag of the stuff with the lift. It didn't look right to him. He took a closer look. The bag said dog food, but the chunks were clearly the tiny kibbles for kittens. Somebody had put the wrong die caps on all the extruders. Two Hundred Fifty tons of unsellable dog food. The lab technicians didn't catch it because they didn't even think to check. They just ground up the sample, ran it through their analyzer, and confirmed the composition. Size and shape didn't even come into consideration.

Again, my point isn't that sample testing is is bad; just woefully insufficient.

Monday, October 10, 2016

Common discrete distributions

Back on to stats. Today, we'll look at some essential discrete distributions. As these are really families of distributions, I'll include the parameters of the families in the functions describing them.

Discrete Uniform: P(X = x|N ) = 1/N x = 1, 2, ..., N

Intuitive description: X is equally likely to be any one of N values. Obviously, any finite set where the outcomes are equally likely can be mapped to a discrete uniform distribution. If the mapping is a linear function, then the mean and variance can be easily transformed. If it's non-numeric data, then only the pdf has much meaning.

Mean: (N + 1)/2
Variance: (N + 1)(N - 1)/12

Hypergeometric:
$P(X=x|N,M,K)=\frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}}\qquad x=0,1,...,K;$

also note that implicit in the definition is that M ≥ x and N-M ≥ K-x.

Intuitive description: Sampling without replacement. Using the traditional "Urn" parlance, what is the chance of pulling x Red balls in K tries from an Urn filled with N balls, M of which are Red (without putting any back between tries). This comes up in acceptance testing, where you are trying to determine if a batch of items meets a certain quality level. Let me just state from experience that the defect rate of a small sample is an absolutely bogus way to test quality. Maybe I'll elaborate tomorrow.

Mean: KM / N (While this is intuitively obvious, it is non-trivial to prove; in general, discrete distributions are a PITA).
Variance: an even bigger PITA: (KM/N)((N-M)(N-K)/(N(N-1))

Binomial:
$P(X=x|n,p)=\binom{n}{x}p^x(1-p)^{n-x}\qquad x=0,1,...,n$

Intuitive description: The number of successes in n Bernoulli trials with success probability p. This one comes up a lot.

Mean: np
Variance: np(1 - p)

Side note (Binomial Theorem): For any real numbers x and y and integer n ≥ 0:

$(x+y)^n=\sum_{i=0}^{n}\binom{n}{i}x^iy^{n-i}$

Just one of those things to know.

Poisson:
$P(X=x|\lambda)=\frac{e^{-\lambda}\lambda^x}{x!}\qquad x=0,1,...$

Intuitive description: The number of occurrences of independent events in a unit interval where the rate is λ. For example, the number of people entering a checkout queue in a given period of time. The important thing is that the independence; that is, the waiting time to the next event has nothing to do with when the last event happened. The "sister" distribution is the continuous exponential which gives the waiting time between two such events.

Mean: λ
Variance: λ

Negative Binomial:
$P(X=x|r,p)=\binom{x-1}{r-1}p^r(1-p)^{x-r}\qquad x=r,r+1,...$

Intuitive description: Number of tries needed to get r successful Bernoulli trials with success probability p. This one also comes up a lot and is often used in stopping rules, as in, "if we haven't seen enough success by now, we can give up because the rate is obviously not as good as our threshold." The Negative Binomial is also often stated as the number of failures prior to the rth success. In this parameterization, the pdf becomes:

$P(Y=y|r,p)=\binom{r+y-1}{y}p^r(1-p)^{y}\qquad y=0,1,...$

While these formulations are equivalent, it's important to know which one you're using when recalling the mean. It should be obvious that the total number of trials is just the expected number of failures plus r and that the variance is the same either way.

Mean: r(1-p)/p (failures); r + r(1-p)/p = r/p (total).
Variance: r(1-p)/p²

An important relationship to be aware of is that the limit of the Negative Binomial (parameterized as failures) as r goes to infinity and p goes to 1 such that r(1-p) goes to λ is the Poisson.

Geometric: P(X=x|p) = p(1-p)^x-1 x = 1,2,...

Intuitive description: This is a special case of the negative binomial which is the waiting time for the first success to occur. It is the discrete version of the Exponential, which is the waiting time for a Poisson event. As such, it shares the Exponentials "memoryless" property, that is, how long you should expect to wait has nothing to do with how long you've already waited. Or, more precisely, P(X>s|X>t) = P(X>s-t) for all integers s > t.

Mean: 1/p
Variance: (1-p)/p²

Sunday, October 9, 2016

Stewardship

I got to talk in church today! Here's what I said.

I was a little surprised I was asked to speak about stewardship. This late in life grad school thing has forced me to cut back on a lot of the things I used to do around here. Honestly, I wish I could be more help. Turns out, I wasn’t really asked. Kate volunteered me. As she does a ton of stuff for this church, I’ll let you talk to her about the many ways you might lend a hand. I’ll restrict my comments to the area we hate to talk about: money.

I’m probably risking losing my audience in the first 30 seconds by quoting a judgmental scripture from the Old Testament, but here’s one from Malachi:

Will a man rob God? But you ask “How do we rob you?” In tithes and offerings. Bring the whole tithe into the storehouse, that there may be food in my house. Test me in this and see if I don’t throw open the floodgates of heaven and pour out so much blessing that you will not have room enough for it.

Now, I believe that Christ has redeemed all my sins: past, present, and future. But, I’m still not going to go so far out on that limb to lie to you from the pulpit and say that I have been 100% faithful in this. But I have been faithful enough to Malachi speaks the truth.

You might think that I am preaching a prosperity gospel: Give to God and get back more! I am not. One of the better uses of my younger years was some missionary work. I saw people, who I would describe as destitute, bringing their tithe to the Lord. They were not blessed with prosperity, and they never would be. But every Sunday they would arrive at 8 and not leave until well into the afternoon. They would sing, shout, and even scream witness to the abundance of their spirit (it was quite exciting). Like Jeremiah, they could not choose to be silent. It was a fire in their bones.

Last week, a co-worker who knew that I was an independent contractor asked me about the tax code for S-Corporations. Apparently, there had been a news story about some rich guy who bragged he hadn’t paid taxes because he was smart. (That joke was better a few days ago; now it’s a little sad.) I told him that if my company lost 900 million dollars, paying my taxes would be the least of my problems, but we did talk about some of the legitimate things you can do through an S-corp to lower your exposure. We also talked about some of the more questionable practices. He asked how far into the grey area of the tax code I was willing to go and I said, “not very.”

This is as much pragmatism as it is virtue; I just don’t need the hassle. By sticking to standard accounting practices, I reduce the chance of a long and costly audit.

God’s “tax code” is a lot simpler than the IRS’s. You can find the complete instructions in the 27^th chapter of the third book of the Law. Really. Furthermore, unless you plan on dedicating one of your barnyard animals to the Lord or want to buy back your offering (even Leviticus has loopholes), the code comes down to a single verse; a sub-paragraph in the IRS parlance. One tenth. Period. The ultimate flat tax.

Perhaps it’s the simplicity that leads us to not take it so seriously. If God really meant this, there’d be forms and tables and paid preparers to make sure you got your best possible refund.

But, no. Apparently, this one was sufficiently important that God ditched the usual allegory and vague symbolic references. If you have ten fingers, you can figure it out. 1040 for Dummies.

Unlike so many Old Testament teachings, Christ didn’t seem to think this one needed a lot of elaboration. I leave you with the entirety of his commentary on it: “Give to Caesar what is Caesar’s and give to God what is God’s.”

Saturday, October 8, 2016

Cauchy product

We'll wrap up this week with one more Analysis post before switching back to Stats next week.

A Cauchy Product of two series Σa_n, and Σb_n is the series Σc_n where

c_n = a₀b_n + a₁b_n-1 + ...+ a_nb₀

I'll be pretty surprised if this turns up on the Q, but there is one important result that comes out of this: if the two series are absolutely convergent, then the Cauchy product series is also absolutely convergent.