Saturday, March 4, 2017

Data done

Finally finished the data layer for Blocker. It shouldn't have taken so long and I'm not sure what I was doing wrong. Anyway, it's done now; all unit tests pass. Granted, it's not thread-safe, so I can't run it in parallel on Hadoop. Also, the block caching is, shall we say, naive. It works, but it's certainly not a good caching strategy. Fortunately, neither of those two things has any impact on whether this is a good term project, so I'll defer them until later.

Assuming I can get the data actually loaded tomorrow (reasonable, since the ingestion routine is very similar to what I used for CISS and it's the same data set) and that I can get the query engine working this week (also reasonable, though it's probably 15-20 hours of work), I'll have four weeks to actually develop the blocking algorithm and then another two weeks to produce results and write it all up.

Doable, but tight for sure.
