Tuesday, October 11, 2016

QA

Yesterday, I mentioned that using sample defect rates to maintain quality was a dubious practice. That assertion is sufficiently at odds with conventional wisdom that it needs some arguments to back it up. First, let me be clear. I'm not saying that one shouldn't look for defects. Nor am I claiming that keeping statistics on defects is a bad thing. I just think the practice of looking only at samples is flawed. Here's why:

  1. Unless your defect rate is absurdly high, it's very difficult to get enough power in your test. Power is the probability that your test finds a problem when there is one. Frequentist tests typically control the converse: they want to avoid finding a problem when there isn't one. By capping the probability of that happening, you either need a really big sample or you just have to accept that you're going to miss a lot of problems. An example: suppose you are producing 1000 cell phone batteries per hour. You decide to sample 20 each hour to see if they are prone to catching on fire. So over the course of an 8-hour shift you've checked 160 batteries. You certainly don't want to throw away 8000 batteries for nothing, so you will only reject the batch if the data suggests the failure rate is at least 0.05%. That equates to 4 expected failures in the entire batch. But the number of expected failures in your sample is only 0.08. Meaning, more than likely, you don't catch any, even if the rate is at the level you are trying to avoid. Ask Samsung how great it is to release 4 exploding batteries to the market every day. That's a pattern the news media will notice. So, you sample a lot more batteries to improve your power, but that drives up your cost. Then you try to find the least you can sample and still not send four bad ones out the door every day. Good luck with that (a quick sketch of the arithmetic follows this list). The defect ratio is simply too low for a meaningful sample test. And, no, Bayesian tests don't work any better. The problem is the expense of the sample, not the methodology of the test.
  2. Sample testing focuses on the product, not the process. But defects are almost always failings of process. The time to test the crap out of things is before you start producing them in huge quantities. You set up your line to produce batteries and you test every single one. Now you will catch all the bad ones (or at least almost all, assuming your test is any good) and you can decide if your defect rate is acceptable because you have a real rate, not a shaky estimate of the rate. Once in production, you still do checks, but they are checks against the process. Is the ratio of chemicals in the battery the same as what you had in pre-production? Are the seals as good? Is the heat dissipating at the same rate? You're not looking for defects, but differences. If there are differences (a much easier thing to detect; see the second sketch after this list), that means you're doing something differently in production than you did in pre-prod. It also means that your pre-prod defect rate no longer applies. Fix the process and fix the problem.
  3. Any fixed test will miss the stuff you didn't think about. My favorite example of this comes from personal experience. I was working on a material requirements planning system for a pet food company, so I was spending some time at one of their plants. One day, they were making dry adult dog food. As that's their biggest product, they had all eight extruder banks running it: 32 tons per hour. Every hour, they took their samples back to the lab and made sure that the mix was what they were expecting (properly addressing #2, above). It was. At the end of the day, a forklift driver goofed and pierced a bag of the stuff with the lift. It didn't look right to him, so he took a closer look. The bag said dog food, but the chunks were clearly the tiny kibbles for kittens. Somebody had put the wrong die caps on all the extruders. Two hundred fifty tons of unsellable dog food. The lab technicians didn't catch it because they didn't even think to check. They just ground up the sample, ran it through their analyzer, and confirmed the composition. Size and shape didn't even come into consideration.
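To put numbers on point 1, here's a minimal Python sketch. It assumes the simplest possible decision rule (reject the batch if the sample contains even one defect), which is a generous assumption; any rule that demands more evidence has even less power. Under that rule, the power at a true defect rate p with sample size n is 1 - (1 - p)^n.

```python
# Minimal power arithmetic for the battery example, assuming the
# generous rule "reject the batch if the sample has any defect at all."
import math

def power(n: int, p: float) -> float:
    """Probability of seeing at least one defect in a sample of n
    when the true defect rate is p."""
    return 1.0 - (1.0 - p) ** n

def sample_size_for_power(target: float, p: float) -> int:
    """Smallest sample size whose power reaches the target at rate p."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

p = 0.0005  # the 0.05% failure rate from the example

print(power(160, p))                   # ~0.077: you miss the problem >92% of the time
print(sample_size_for_power(0.80, p))  # 3219: over 40% of the 8000-battery batch
```

Sampling 160 per shift gives you well under 10% power, and getting to a conventional 80% means testing over 40% of the batch. That's exactly the cost problem described above.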
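And here's a sketch of what a process check in the spirit of point 2 might look like: a basic control-chart rule comparing an hourly sample of some continuous measurement (say, a chemical ratio) against its pre-production baseline. The names and numbers are hypothetical stand-ins for your own pre-prod data. The reason differences are easier to detect is that a shift in a continuous measurement shows up with far smaller samples than a rare binary defect does.

```python
# Sketch of a process check: flag production when a measured quantity
# drifts away from its pre-production baseline. Baseline values and
# sample readings below are hypothetical.
from statistics import mean

BASELINE_MEAN = 1.25    # chemical ratio established in pre-production
BASELINE_SIGMA = 0.02   # its standard deviation in pre-production

def process_drifted(measurements: list[float], n_sigma: float = 3.0) -> bool:
    """True if the sample mean falls outside the baseline mean plus or
    minus n_sigma standard errors (a plain Shewhart-style control rule)."""
    std_err = BASELINE_SIGMA / len(measurements) ** 0.5
    return abs(mean(measurements) - BASELINE_MEAN) > n_sigma * std_err

hourly_sample = [1.24, 1.26, 1.25, 1.27, 1.24, 1.26]  # hypothetical readings
if process_drifted(hourly_sample):
    print("Process differs from pre-production: stop and investigate.")
```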
Again, my point isn't that sample testing is bad; just that it's woefully insufficient.
