I'm not sure why I didn't originally write my CISS algorithm in Java. I guess it was just expediency. I needed to do it quickly and I can program faster in C#. More properly, I can program faster in Visual Studio. As I mentioned last year when I was taking my languages class, the programming environment has a lot more to do with productivity than the language. It's not that IntelliJ is bad; I just don't know it really well (OK, I hardly know it at all). So, while I'm happy that Java now has a development environment to rival Visual Studio, until I actually use it enough that I know all the shortcut keys and navigation, it's still going to slow me down.
Anyway, that's my problem and one I need to fix since we're going to be doing a lot more development in Java at work. The stuff I'm writing for school is sufficiently trivial that I can poke my way through it. The code base at work is substantially more complex.
Incidentally, if you're a Java zealot who is reading this and celebrating that another Fortune 500 company has left the evil empire and embraced Java, you couldn't be more off-base. My particular group needs to start using Java because that's the native language of Hadoop. I expect this is a temporary situation; there's no reason the .NET framework can't run efficiently on Hadoop. However, since the vast majority of Hadoop developers know Java, it makes sense to stick with that for now if only to make staffing easier. Meanwhile, the organization as a whole is still predominantly a .NET shop.
Which, of course, brings us to the crux of the matter: it doesn't matter. In the world where people evaluate your performance by how well your code works and how much they had to pay you to write it, there is no place for religious crusades for language purity. Almost all large organizations support a variety of development tools. You use what works. If that means you have to learn something new that isn't implemented quite the way you would like or (even less relevant) isn't implemented by a company you like, tough.
So, this weekend, I'm rewriting CISS in Java so I can run it on Hadoop (and, yes, to make the source code more palatable to academics who would regard the previous paragraph as heresy). That's not really a big deal; the only remotely complicated part of the program is the data layer and even that is pretty straightforward. I haven't yet decided if I want to write a parallel version of it. Obviously, that would be the way to go if this algorithm were ever going to be implemented in production, but it might be time to move on to the next thing rather than optimizing an algorithm that was really only created to demonstrate the larger point that random sampling of financial data is problematic.