Pardon the metaphor, but I do think this is the easiest way to think about how my evolutionary algorithm will work. I bring Easter into the mix because the evolution will be a balance of two strategies: Purification and Evangelization. (Most of the class is Hindu, so it remains to be seen how well this analogy works for them).
The basic strategy is to select some progenitor blocks and rank them by fitness (as measured by the last set of query runs). The remaining blocks need to have their rows allocated to progenitor blocks. The BlockAssigner does this. The algorithm for the BlockAssigner is as follows:
For each row, try the blocks in order of fitness. Assign the row to the first block whose criteria accept it.
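Here's a minimal sketch of that assignment loop. The representation is my assumption for illustration: a block's criteria are a dict mapping an attribute name to the set of values it accepts, a row is a plain dict, and the names `Block`, `accepts`, and `assign_rows` are hypothetical. Rows that no block accepts are handed back to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical representation: attribute name -> set of accepted values.
    criteria: dict
    # Fitness as measured by the last set of query runs (higher is better).
    fitness: float
    rows: list = field(default_factory=list)

    def accepts(self, row):
        """A row is accepted when it satisfies every criterion of the block."""
        return all(row.get(attr) in allowed
                   for attr, allowed in self.criteria.items())


def assign_rows(progenitors, rows):
    """Give each row to the fittest block that accepts it.

    Returns the rows that no progenitor block accepted.
    """
    ranked = sorted(progenitors, key=lambda b: b.fitness, reverse=True)
    unmatched = []
    for row in rows:
        for block in ranked:
            if block.accepts(row):
                block.rows.append(row)
                break
        else:
            # No block accepted this row.
            unmatched.append(row)
    return unmatched


general = Block({"business_unit": {17006}}, fitness=0.4)
specific = Block({"business_unit": {17006}, "product_line": {"Whole Life"}},
                 fitness=0.9)
rows = [
    {"business_unit": 17006, "product_line": "Whole Life"},
    {"business_unit": 17006, "product_line": "Term"},
    {"business_unit": 17001, "product_line": "Term"},
]
leftover = assign_rows([general, specific], rows)
```

Because the fitter block is tried first, the Whole Life row lands in `specific` even though `general` would also accept it.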
Easy stuff, but what if the block is getting too big? That's when we have a decision. We can either purify the block or evangelize it. Purification means finding a subset within the block that meets even more stringent criteria for inclusion. For example, suppose a block is entirely composed of records from business unit 17006. Suppose also that 80% of those records are also from the Product Line of Whole Life Insurance. We can purify the block by adding Whole Life to the criteria for the block. The remaining block will now be even more specific and more likely to be excluded from queries where none of the rows are relevant.
What about the rest of the rows? They get thrown back into the mix of rows to be re-assigned to other blocks. Obviously, you only want to take this strategy when the purification is meaningful. That is, the new attribute is high on the list of relevant attributes AND most of the block conforms to a small number of values for that attribute.
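To make the "meaningful purification" test concrete, here's a rough sketch. The thresholds (`top_n`, `max_values`, `min_share`) and the function names are placeholders I made up for illustration; the real cutoffs would need tuning. It only considers the highest-ranked attributes that aren't already in the block's criteria, and it accepts one only if a small number of values covers most of the block.

```python
from collections import Counter


def find_purification(rows, ranked_attributes, existing,
                      top_n=3, max_values=2, min_share=0.8):
    """Return (attribute, values) worth adding to the criteria, or None.

    ranked_attributes: attributes ordered from most to least relevant.
    existing: attributes already used in the block's criteria.
    All thresholds are illustrative assumptions.
    """
    for attr in ranked_attributes[:top_n]:
        if attr in existing:
            continue
        counts = Counter(row[attr] for row in rows)
        covered, chosen = 0, set()
        # Add the most common values until they cover enough of the block.
        for value, n in counts.most_common(max_values):
            chosen.add(value)
            covered += n
            if covered / len(rows) >= min_share:
                return attr, chosen
    return None


def purify(rows, attr, values):
    """Split into rows the purified block keeps and rows to be re-assigned."""
    kept = [r for r in rows if r[attr] in values]
    released = [r for r in rows if r[attr] not in values]
    return kept, released


block_rows = (
    [{"business_unit": 17006, "product_line": "Whole Life"}] * 8
    + [{"business_unit": 17006, "product_line": "Term"}] * 2
)
choice = find_purification(block_rows, ["product_line"], {"business_unit"})
if choice is not None:
    kept, released = purify(block_rows, *choice)
```

With the 80% Whole Life example from above, the sketch picks `product_line` with the single value Whole Life, keeps eight rows, and releases two for re-assignment.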
Suppose not. We've already got a good block on our hands (since it was selected as a progenitor), so the next best thing would be to spread that around. We look for a way to split the block such that we wind up with two roughly equal-sized blocks based on either a new attribute or by sorting out values from an existing one (for example, if the block had rows from business units 17001, 17005, and 17006, we could put 17001 and 17005 in one and 17006 in another).
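One simple way to get a roughly even split on a single attribute is a greedy partition: take the values from most to least common and give each one to whichever side is currently smaller. This is just a sketch of one possible heuristic, not a settled design, and the name `evangelize` is hypothetical.

```python
from collections import Counter


def evangelize(rows, attr):
    """Split rows into two groups of roughly equal size by value of attr.

    Greedy heuristic: each value, from most to least common, goes to the
    side that currently holds fewer rows. If attr has only one distinct
    value, the second group comes back empty.
    """
    counts = Counter(row[attr] for row in rows)
    left_values, right_values = set(), set()
    left_size = right_size = 0
    for value, n in counts.most_common():
        if left_size <= right_size:
            left_values.add(value)
            left_size += n
        else:
            right_values.add(value)
            right_size += n
    left = [r for r in rows if r[attr] in left_values]
    right = [r for r in rows if r[attr] in right_values]
    return left, right


block_rows = (
    [{"business_unit": 17001}] * 3
    + [{"business_unit": 17005}] * 2
    + [{"business_unit": 17006}] * 5
)
first, second = evangelize(block_rows, "business_unit")
```

With these made-up row counts, 17006 ends up alone in one block and 17001 and 17005 share the other, five rows each.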
My gut feel is that it makes more sense to bias the algorithm towards purification. That is, if there's a reasonably good way to increase the criteria, do it. Evangelization will be a natural outgrowth of blocks that are already purified to the point where further refinement doesn't gain much. The idea is actually quite consistent with the Easter message. First, get your own act together, then spread the message around.
Finally, all this assumes the existence of a "catch-all" block for rows that don't meet the criteria for any other block. This block will also be split when it gets too big, with a very heavy bias towards purification (leaving the remaining rows in a new catch-all block).