Sunday, January 17, 2016

Algorithms Summary

CS 5130 - Advanced Data Structures and Algorithms

Course Outline (Chapters 1-4, 6-8, 15-16, 22-25, 29, 34, and 35 in the text, Cormen et al., Introduction to Algorithms, 3rd ed.):
  1. Growth functions, Divide and conquer, Recurrences, and Asymptotic bounds.
  2. Sorting algorithms
  3. Greedy Algorithms
  4. Dynamic Programming
  5. Graph Representations
  6. Depth and Breadth first searches
  7. Minimum Spanning Trees
  8. Shortest Path
  9. NP-Completeness
  10. Approximation algorithms
  11. Linear Programming
Key Takeaways:

Recurrence trees are stupid. Unless the recurrence is so trivial that you don't need any help solving it, the tree is difficult to construct, even more difficult to verify, and ends up giving you a sloppy, loose bound. Prof. Pan would probably object strongly to that assessment, but on the homework I was able to produce and prove tighter bounds using other methods than I (or he) could using trees. Then I had to spend an inordinate amount of time trying to construct a tree to support the bound I had already proven correct. CPU time just isn't that precious anymore: write the recurrence in some language that supports arbitrary-precision (bignum) arithmetic (so your computation doesn't overflow; you often have to go well beyond reasonable input sizes to see the true asymptotic behavior) and crank the actual numbers. Then plot the results against various curves to find a tight bound. It takes less time and gives better results.
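A quick sketch of what I mean, in Python (whose built-in integers are arbitrary precision, so no overflow). The recurrence here is just an illustrative T(n) = 2T(⌊n/2⌋) + n, not one from the homework:

```python
# Crank a recurrence numerically instead of drawing a tree.
# Python ints are arbitrary precision, so large n won't overflow.
# Illustrative recurrence: T(n) = 2*T(n//2) + n, T(1) = 1.
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# If T(n) / (candidate bound) settles toward a constant, the
# candidate is tight. Here the ratio drifts toward 1 as n grows.
for k in (10, 15, 20, 25):
    n = 2 ** k
    print(n, T(n) / (n * log2(n)))
```

Swap in the real recurrence and a few candidate bounds; a ratio that converges to a nonzero constant means you've found the tight one.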

That said, I really need to shore up my Calculus. It's very rusty and that hurt me not just with recurrences, but with bounds in general. Think Calc might come up on a qualifying exam for math? Project for the summer, I guess.

Kinda odd, given the course title, that the two sections of the book that we skipped completely were "III - Data Structures" and "V - Advanced Data Structures". I skimmed through them and there's a ton of good stuff in there. This text is a keeper.

I think that graph algorithms in general are a very fertile field for probabilistic algorithms. Might talk to Prof. Pan about that since that's pretty much his thing.

The NP-Completeness stuff seemed very natural to me. A lot of the students seemed to struggle with the concept of reductions. It's a technique I've always liked. I remember being given an open problem in Simulation at Cornell where the Prof (Lee Schruben, yes, really) asked if a specification method was sufficient for any kind of simulation. As we generally did in such situations, most of the students gave it a go and when nothing obvious presented itself, we gave up. It was, after all, a problem that Schruben himself conceded was "probably true, but might be really hard to prove." Dave Tate was a bit more tenacious and showed that you could use the method to specify the simulation of a Turing Machine and since anything that can be simulated can be simulated on a Turing Machine, the method must be complete. It was brilliant (Schruben agreed) and I resolved to become a master of restating problems. I don't know that I've fully achieved that, but I love the technique and use it any time I can.


Stuff from class that's probably worth remembering:
  • Bounds
    • Loose upper: o(f)
    • Upper: O(f)
    • Tight: Θ(f)
    • Lower: Ω(f)
    • Loose lower: ω(f)
  • Bounds can be demonstrated by definition or by taking the limit of the ratio of the bounded function to the bound as the argument goes to infinity. If it converges to a nonzero constant, the bound is tight (Θ). If it goes to zero, the bound is a loose upper bound (o); if it goes to infinity, a loose lower bound (ω). If the limit does not exist, the test is inconclusive (the bound may still hold) and you have to fall back on the definition.
  • Time and space bounds of exponential order or greater are considered intractable, even though small and/or specialized cases may be solvable. Conversely, polynomially bounded problems are considered solvable, even if the order of the polynomial is too high for practical computation.
  • If an upper bound for a recurrence is leaving an annoying low order term during an induction proof, try SUBTRACTING that term from the bound. Counterintuitive, since you're actually tightening things, but by putting the tighter bound back into the recurrence, you get less slop after sending it through the inductive step.
  • "Master method" for recurrences (not really broad enough to be considered a master method in my mind, but that's what it's called and it is often useful). Works for T(n) = aT(n/b) + f(n), a ≥ 1, b > 1, f(n) asymptotically positive. Compare f(n) to n^(log_b a): if f is polynomially smaller, T(n) = Θ(n^(log_b a)); if it's the same order, T(n) = Θ(n^(log_b a) log n); if it's polynomially larger (and f satisfies a regularity condition), T(n) = Θ(f(n)).
  • All compare-and-exchange sorts (which swap only adjacent elements) are Ω(n²). All comparison sorts, which is all you can do when key cardinality is unbounded, are Ω(n log n). Linear-time sorting is possible with restrictions on the keys.
  • Optimal Substructure (OSS) for dynamic and greedy algorithms: an optimal solution to the problem contains optimal solutions to its subproblems.
  • Dynamic Programming works when the subproblems are Independent, Overlapping (used more than once by the global solution), and exhibit OSS.
  • Greedy Choice Property - there exists a globally optimal solution that incorporates the locally optimal choice. This and OSS enable greedy algorithms to always yield optimal results.
  • Greedy algorithms often return "good enough" results even when there are exceptions to the Greedy Choice Property.
  • Memoization (which is very hard to say without sounding like you have a speech impediment): caching each subproblem's answer the first time you compute it so it's never computed twice. Keep breadcrumbs (which choice won at each step) along the way so you can reconstruct the solution when you're done.
  • Graph representations: Adjacency Lists and Adjacency Matrix. Lists are smaller and faster for sparse graphs. External note from the creators of HP Vertica, who have been spending some time with us at work: in the Big Data space, Adjacency Matrices are the way to go, since trading space for time is almost always a good thing.
  • All the basic graph algorithms (depth first search, breadth first search, minimum spanning tree, shortest path, etc.) come down to how fast you can recognize a connected set of nodes. Several interesting data structures help with such operations: notably, disjoint-set forests for tracking connected components, and Fibonacci Heaps for the priority queues in Prim's and Dijkstra's algorithms.
  • Many graph problems can be restated as linear systems and solved with Linear Programming solutions. This was very much news to me, and surprising news at that. I'm sure it got mentioned in one of my LP classes at Cornell, but I missed it.
  • Standard form for LP problems: maximize cᵀx subject to Ax ≤ b, x ≥ 0.
  • Slack form for LP problems: s = b - Ax, s ≥ 0. This form leads to the basic (on the left) and non-basic (on the right) variables which get pivoted in the Simplex method.
  • NP does NOT mean Non Polynomial. It means Nondeterministic Polynomial. I already knew that, but it's the sort of thing you want at the front of your brain because otherwise you might blurt out the wrong answer while making a bad joke about a hard problem after a couple of beers at happy hour and immediately be classed as The Guy Who Thinks He's Smart But Doesn't Know Shit.
  • Any NP-Complete problem can be restated as (reduced in polynomial time to) any other NP-Complete problem. Obviously, some of the reductions are more straightforward than others.
  • When showing NP-Completeness, it is OK to restate an optimization problem as a decision problem. That is, if the decision problem (is there a solution bigger than x?) is NPC, the optimization problem (what's the best possible solution?) is also intractable (assuming P ≠ NP; if it turns out that P = NP, then the whole conversation is moot).
  • Approximation schemes come in several grades:
    • Approximable (APX): some fixed ratio is achievable in time polynomial in n.
    • Polynomial-Time Approximation Scheme (PTAS): for any ε > 0, a (1+ε)-approximation in time polynomial in n.
    • Fully Polynomial-Time Approximation Scheme (FPTAS): polynomial in both n and 1/ε.
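To illustrate the subtract-a-term bullet above, here's the textbook example (not from the homework): for T(n) = 2T(⌊n/2⌋) + 1 the natural guess T(n) ≤ cn fails by induction, but subtracting a constant fixes it.

```latex
% Guess T(n) \le cn: the inductive step leaves a stray +1.
T(n) \le 2\bigl(c\lfloor n/2 \rfloor\bigr) + 1 \le cn + 1 \quad\text{(not } \le cn\text{)}
% Tighter guess T(n) \le cn - d for a constant d \ge 1:
T(n) \le 2\bigl(c\lfloor n/2 \rfloor - d\bigr) + 1 \le cn - 2d + 1 \le cn - d
```

The tighter hypothesis, fed back through the recurrence, absorbs the slop that the looser one leaked.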
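A toy checker for the master-method bullet, covering only the easy case where f(n) = Θ(n^d) is a plain polynomial (the function and its output format are mine, not from class; the regularity condition holds automatically for polynomials):

```python
# Master method sanity-checker for T(n) = a*T(n/b) + Theta(n^d):
# compare d with the critical exponent log_b(a) to pick the case.
from math import log, isclose

def master(a, b, d):
    c = log(a, b)               # critical exponent log_b(a)
    if isclose(d, c):
        return f"Theta(n^{c:g} * log n)"
    if d < c:                   # f(n) polynomially smaller
        return f"Theta(n^{c:g})"
    return f"Theta(n^{d:g})"    # f(n) polynomially larger

print(master(2, 2, 1))   # merge sort:        Theta(n^1 * log n)
print(master(8, 2, 2))   # T = 8T(n/2) + n^2: Theta(n^3)
print(master(2, 2, 2))   # T = 2T(n/2) + n^2: Theta(n^2)
```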
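As an example of the restricted-keys bullet: counting sort, sketched here assuming integer keys in a known range [0, k).

```python
# Counting sort: O(n + k) when keys are integers in [0, k).
# Stable, so it also works as the inner pass of radix sort.
def counting_sort(a, k):
    count = [0] * k
    for x in a:                 # histogram of key values
        count[x] += 1
    for i in range(1, k):       # prefix sums -> final positions
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):       # walk backwards to keep stability
        count[x] -= 1
        out[count[x]] = x
    return out

print(counting_sort([4, 1, 3, 4, 0, 2], 5))  # [0, 1, 2, 3, 4, 4]
```

No comparisons anywhere, which is how it ducks the Ω(n log n) lower bound.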
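Memoization plus breadcrumbs in one place, using CLRS's rod-cutting problem (my own sketch of it):

```python
# Memoized rod cutting: cache each subproblem's best revenue so it's
# computed once, and keep a breadcrumb (the winning first cut) so the
# actual solution can be reconstructed afterwards.
def cut_rod(prices, n):         # prices[i] = price of a piece of length i
    memo, cut = {}, {}          # memo: best revenue; cut: breadcrumbs
    def best(m):
        if m == 0:
            return 0
        if m in memo:
            return memo[m]
        r = -1
        for i in range(1, min(m, len(prices) - 1) + 1):
            v = prices[i] + best(m - i)
            if v > r:
                r, cut[m] = v, i
        memo[m] = r
        return r
    revenue = best(n)
    pieces = []                 # follow breadcrumbs to recover the cuts
    while n > 0:
        pieces.append(cut[n])
        n -= cut[n]
    return revenue, pieces

print(cut_rod([0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30], 4))  # (10, [2, 2])
```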
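The two graph representations side by side on a toy undirected graph (illustrative only):

```python
# Same 4-node graph both ways. For a sparse graph the list form is
# O(V + E) space; the matrix is O(V^2) but gives O(1) edge tests --
# the space-for-time trade the Vertica folks like.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

adj_list = [[] for _ in range(n)]
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:              # undirected: record both directions
    adj_list[u].append(v)
    adj_list[v].append(u)
    adj_matrix[u][v] = adj_matrix[v][u] = 1

print(adj_list[2])        # neighbors of node 2: [0, 1, 3]
print(adj_matrix[1][3])   # O(1) edge test: 0 (no edge 1-3)
```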
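The decision-vs-optimization bullet also runs the other way in practice: given a decision oracle, binary search recovers the optimal value. A toy sketch (the brute-force subset-sum oracle here is mine, purely illustrative):

```python
# Optimization via a decision oracle: binary search over x asking
# "is there a solution with value >= x?" takes O(log range) oracle calls.
# Toy oracle: does some subset of `weights` sum into [x, cap]?
from itertools import combinations

def decide(weights, cap, x):
    return any(x <= sum(c) <= cap
               for r in range(len(weights) + 1)
               for c in combinations(weights, r))

def optimize(weights, cap):
    lo, hi = 0, cap             # optimal value lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if decide(weights, cap, mid):
            lo = mid            # a solution >= mid exists; push up
        else:
            hi = mid - 1
    return lo

print(optimize([3, 5, 7], 10))  # prints 10 (3 + 7)
```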
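And for the approximation bullets, the classic factor-2 vertex cover algorithm from CLRS (my sketch):

```python
# 2-approximation for vertex cover: grab both endpoints of an arbitrary
# uncovered edge, repeat. Any optimal cover must take at least one
# endpoint of each edge we chose, hence the factor of 2.
def approx_vertex_cover(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle; optimum is 2 nodes
c = approx_vertex_cover(edges)
print(all(u in c or v in c for u, v in edges))  # True: every edge covered
```

On the 4-cycle it returns all four vertices, exactly twice the optimum, so the bound is tight.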
