Wednesday, March 15, 2017

Evolutionary Programming Cheat Sheet (part 2)

Second part of the notes for tomorrow's test.

Quantum GA

The trick here is that we're using "quantum bits" (hereafter referred to as Qbits). A Qbit, like a quantum particle, doesn't actually have a value. You can observe it and thereby force it to a value but, left alone, it merely has a distribution of where it might actually be.

If and when quantum computers come to be, this will be extremely powerful for certain types of programs. It allows a register to contain not just a value, but an entire distribution. Obviously, if you can represent the distribution with a single register, there's no need to have some big population sample that you are pulling from.

Crossover and mutation can be done in the usual way, though crossover is often skipped in favor of just mutating the solution based on its fitness relative to the current best answer.

The catch, of course, is that we don't actually have a quantum computer just yet. So, we have to simulate it. The standard way to do this is to keep a value alpha which represents the square root of the probability that a bit is zero. You can use imaginary coefficients, but it doesn't really buy you anything in this space so, sticking to reals, alpha (and its corresponding beta, where alpha^2 + beta^2 = 1) define points on the unit circle. You then force an observation from the register, making all the Qbits go to 0 or 1 according to their probabilities, assess the fitness of the sampled result, and then rotate the point along the unit circle towards the best observation seen so far. (Implementation detail: I would think just storing the angle of the point on the unit circle would be more efficient, since you're going to have to convert it to that to do the rotation anyway.)
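A minimal sketch of that simulation, storing the angle directly as suggested above. All the names, the rotation step size, and the fixed rotation rule are my own choices, not a canonical implementation:

```python
import math
import random

def observe(thetas):
    """Collapse each simulated Qbit: P(bit = 1) = sin^2(theta)."""
    return [1 if random.random() < math.sin(t) ** 2 else 0 for t in thetas]

def rotate_toward(thetas, best_bits, delta=0.05):
    """Nudge each angle toward the matching bit of the best observation
    so far: toward pi/2 for a 1, toward 0 for a 0."""
    return [t + delta if b == 1 else t - delta
            for t, b in zip(thetas, best_bits)]

def quantum_ga(fitness, n_bits=8, generations=200, seed=0):
    random.seed(seed)
    thetas = [math.pi / 4] * n_bits   # every bit starts at 50/50
    best_bits, best_fit = None, float("-inf")
    for _ in range(generations):
        bits = observe(thetas)
        f = fitness(bits)
        if f > best_fit:
            best_bits, best_fit = bits, f
        thetas = rotate_toward(thetas, best_bits)
    return best_bits, best_fit
```

On a toy OneMax problem (fitness = number of ones, i.e. `sum`), the angles drift toward pi/2 and the register converges on all ones.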

While this works, it's a lot of work simulating the Qbits and I'm not convinced that assessing the fitness of one observation from a distribution is any more efficient than assessing a population pulled from the same distribution. Anyway, it's a concept that really can't be evaluated until there's a legit hardware implementation. Until then, it's just another stochastic algorithm which hides the pseudo-randomness in the simulator.

Differential Evolution

The big idea here is to modify solutions by applying mutations of their neighbors.

For each solution (target vector), three other solutions are selected to form a donor vector. This vector is of the form: vi,G+1 = xr1,G + F(xr2,G - xr3,G). F is a constant, not a function.

The donor vector is then recombined with the target vector to form a trial vector. The formula for this is contrived, to say the least. Suffice it to say that one gene is picked at random to surely come from the donor vector (thus ensuring a mutation) and all the others are picked based on a random value exceeding a preset threshold. Yes, really.

The trial vector is included in the next generation if its fitness exceeds that of the target vector. Otherwise, the target vector survives.

And that's basically it. Aside from a really contrived generation step, it looks an awful lot like Evolutionary Programming (below).
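The whole generation step fits in a few lines. A sketch of classic DE/rand/1/bin (the variant described above); the function names are mine, and fitness is maximized here:

```python
import random

def de_step(pop, fitness, F=0.8, CR=0.9):
    """One generation of differential evolution. pop is a list of
    equal-length lists of floats; fitness is maximized."""
    n, dim = len(pop), len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        # Three distinct solutions other than the target form the donor:
        # v = x_r1 + F * (x_r2 - x_r3)
        r1, r2, r3 = random.sample([j for j in range(n) if j != i], 3)
        donor = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
                 for d in range(dim)]
        # Binomial crossover: one gene is forced to come from the donor,
        # the rest come from it only when a uniform draw beats CR.
        forced = random.randrange(dim)
        trial = [donor[d] if (d == forced or random.random() < CR)
                 else target[d] for d in range(dim)]
        # The trial replaces the target only if it is fitter.
        new_pop.append(trial if fitness(trial) > fitness(target) else target)
    return new_pop
```

Running this on a 3-dimensional sphere function (maximizing the negative sum of squares) drives the population toward the origin within a few dozen generations.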

Genetic Programming

Like Genetic Algorithms except that instead of having generations of solutions, you have generations of programs. That is, the actual algorithm changes from one generation to the next. The most common way to do this is to represent the program as a decision tree and then modify the tree from one generation to the next. So, the basic algorithm doesn't really change, but the actual flow of the algorithm is altered by the tree, which is ancillary to the actual solution.

Crossover is generally applied by crossing sub-trees. While this works on simple categorization problems, it is limited in that the context of the sub-tree might be very important. For example, in my problem, applying this blindly to the block assigner could result in assignment strategies where the sub-tree contained no viable nodes for a block passed to it because that subtree was predicated on the conditions further up the tree being met. I think there are ways around this, but the papers we were given don't really address it. If I'm going to use this, I'll need to figure it out.
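To make the subtree-crossover mechanics concrete, here is a toy sketch over expression trees stored as nested tuples. The representation and helper names are illustrative only, and note that this does the blind swap criticized above; it makes no attempt to respect the context a subtree was evolved in:

```python
import random

# A toy expression tree: tuples are (op, child, child); leaves are
# variable names or constants.
def nodes(tree, path=()):
    """Enumerate (path, subtree) pairs for every node in the tree."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    t = list(tree)
    t[path[0]] = replace(t[path[0]], path[1:], new)
    return tuple(t)

def crossover(a, b):
    """Swap a random subtree of a with a random subtree of b."""
    pa, sa = random.choice(list(nodes(a)))
    pb, sb = random.choice(list(nodes(b)))
    return replace(a, pa, sb), replace(b, pb, sa)
```

For example, crossing `("+", "x", ("*", "y", 2))` with `("-", ("+", "x", 1), "y")` grafts a subtree from one parent onto a random point of the other.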

Areas where GP does well:
  • Interrelationship between input variables is poorly understood.
  • Finding shape and size of the solution is a big part of the problem.
  • Search domain is large and solutions are sparse.
  • It's very hard to directly obtain solutions, but easy to simulate performance on tentative solutions.
Fair enough. Footnote: The AI-ratio (the amount of "intelligence" built into the algorithm by artificial means versus the amount a human has to add just to get the thing to work) is very high. That is, you don't need a whole lot of domain knowledge to generate a viable algorithm.

Evolution Strategies

Evolution Strategies is similar to GA, but relies more heavily (or exclusively) on the mutation step. Mutation in the real-valued space is done by adding a Normal random variable to each current gene. Elitism can be used to ensure that good solutions don't drift too far afield.

The interesting part of ES (at least for me) is how the variance of the perturbing variable is handled. There are several options:
  • Use the same variance throughout.
  • Use a variance that decreases over time.
  • Use a variance that is increased if few improvements are noted, decreased if lots of improvements are noted.
  • Build the variance into the mutation (this is the one that intrigues me). That is, randomly generate the variance first, then generate the new solution. If the new solution is accepted, you take the variance with it. Thus, when the solutions require large variances to get off local maxima, you get those. Once all the improvements are coming from the local space, the variance gets clamped down.
It is also possible to correlate the mutations. That is, it may be that a change in one attribute only helps if another changes with it. In that case, one should generate a correlated mutation vector.
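The self-adaptive option from the list above can be sketched as a (1+1)-style loop: the variance is mutated first (log-normally, so it stays positive), the solution is perturbed with the new variance, and a winning child carries its variance forward. The names and the choice of tau are my own:

```python
import math
import random

def self_adaptive_es(fitness, dim=2, generations=300, seed=0):
    """(1+1)-style self-adaptive step sizes: sigma is mutated first,
    the solution is perturbed with the new sigma, and if the child
    wins, its sigma survives along with it. Fitness is maximized."""
    random.seed(seed)
    tau = 1.0 / math.sqrt(dim)            # log-normal learning rate
    x = [random.uniform(-5, 5) for _ in range(dim)]
    sigma, fx = 1.0, None
    fx = fitness(x)
    for _ in range(generations):
        new_sigma = sigma * math.exp(tau * random.gauss(0, 1))
        child = [v + random.gauss(0, new_sigma) for v in x]
        fc = fitness(child)
        if fc > fx:
            x, fx, sigma = child, fc, new_sigma   # variance rides along
    return x, fx
```

On a 2-dimensional sphere function, sigma stays large while big jumps pay off and shrinks once improvements only come from the local neighborhood, which is exactly the behavior described above.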

The most successful application of this is the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm.

The selection step may or may not include the parent generation, μ.
  • (μ,λ)-selection: use the children only and select the best (cut back to just μ).
  • (μ+λ)-selection: keep the parents in the mix and then select.
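The two schemes differ by one line. A minimal sketch (fitness maximized; the function names are mine):

```python
def select_comma(parents, children, mu, fitness):
    """(mu, lambda)-selection: the next generation is drawn from the
    children only; parents are discarded regardless of fitness."""
    return sorted(children, key=fitness, reverse=True)[:mu]

def select_plus(parents, children, mu, fitness):
    """(mu + lambda)-selection: parents compete alongside their
    children, so a good parent can survive indefinitely."""
    return sorted(parents + children, key=fitness, reverse=True)[:mu]
```

With fitness `-abs(x)`, parents `[0, 5]` and children `[1, 2, 3]`: comma-selection keeps `[1, 2]`, while plus-selection keeps `[0, 1]` because the parent 0 outranks every child.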

Evolutionary Programming

Similar to GA, but rather than trying to code the genetics that produce a given behavior, it attempts to code the behavior directly (phenotype versus genotype). As such, there is no use for a crossover operator - EP uses only mutation.

The basic algorithm applies a mutation to each member of the current generation and then selects the best from the current and offspring to create the next generation.

Another novelty (though it's really just a generalization of tournament selection) is that individuals are not selected based on their absolute fitness, but by how well they stack up against a randomly selected "peer group". I'm failing to see the upside in that, though I'll concede that, in some domains, it might increase diversity. One might note that this implements a de facto elitist strategy since the best known solution will always beat any group of peers.
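A sketch of that peer-group scoring, assuming the common variant where each individual earns one "win" per peer it meets or beats; the names and the choice to sample peers with replacement from the whole population are my own:

```python
import random

def peer_group_scores(population, fitness, q=5, seed=None):
    """Score each individual by wins against a random q-member peer
    group; selection would then keep the highest scorers. Note the
    de facto elitism: the best individual always scores q."""
    rng = random.Random(seed)
    scores = []
    for ind in population:
        peers = rng.sample(population, q)
        scores.append(sum(1 for p in peers if fitness(ind) >= fitness(p)))
    return scores
```

The elitism claim above is easy to see here: the globally best individual's fitness meets or beats any peer drawn, so its score is always the maximum possible.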

Mutation is driven by random perturbations based on a distribution and step size which are considered inputs. The step size (variance) may be variable over time. As with ES, the step size may be fixed (non-adaptive), vary based on some function of time and convergence (adaptive), or be part of the mutation itself (self-adaptive).

Correlations between components are allowed, necessitating a correlated mutation vector.

Among the various mutation distributions cited is an odd one: the Mean Mutation Operator. It's the sum of a standard Normal and standard Cauchy random variable. The claim is that this produces more large and small perturbations and fewer in the middle. I don't dispute that, but I'm not sure what the mathematical basis for using such an operator would be. I think it falls under the heading of "making shit up and finding out that it works reasonably well." Hey, that's how Tungsten was invented; light bulbs are good.
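Sampling it is trivial, which may be half the appeal. A sketch (the Cauchy draw uses the standard inverse-CDF trick, since Python's `random` has no Cauchy generator):

```python
import math
import random

def mean_mutation(rng=random):
    """One Mean Mutation Operator perturbation: a standard Normal plus
    a standard Cauchy variate. The heavy Cauchy tail supplies the
    occasional huge jump; the Normal supplies the routine small ones."""
    normal = rng.gauss(0, 1)
    cauchy = math.tan(math.pi * (rng.random() - 0.5))  # inverse CDF
    return normal + cauchy
```

Drawing a large sample shows the claimed shape: plenty of mass near zero, plus tail values far beyond anything a Normal alone would produce.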

