Tuesday, November 7, 2017

In other news...

I do have a day job. And, while it's consuming way more of my time than I'd like, at least I can claim some success. Here are some stats from our new reporting platform compared to last year:

Data size:

  • 2016: 3 Billion rows, split among three cubes.
  • 2017: 3.8 Billion rows, all consolidated in one cube.

Median batch load time (for the week before recalc, which is the busiest week):
  • 2016: 47 minutes (review only)
  • 2017: 22 minutes (all)

Last year, the only way we could keep the "official" cube stable was to throttle the builds to once every three hours, so you might have to wait a long time to see results there. We also had a "review" cube that refreshed more often and let you see what your particular batch was going to look like; the 2016 load time above is for that cube only.

Benchmark query time:
  • 2016: 14 seconds with high variability (it was common for the query to take over a minute)
  • 2017: 5 seconds, reasonably consistent

So, we've hit our marks. But it's come at a cost, both for RGA and for me. The project wasn't cheap, and all the overtime I've put in could have gone to working on my research.

Still, it's an easier pill to swallow when the results are so obvious.
