09 Jul: Back in time: unreliable clocks and distributed computing

Many scalable NoSQL databases, such as Cassandra, HBase, and MongoDB, provide tunable consistency so that you can choose a specific level of guarantees for each operation. But what makes them scalable also makes them vulnerable: in all cases, the whole cluster must run on synchronized clocks. Given how important this is, it’s quite surprising how little detail the product documentation gives. One chapter in the HBase documentation, a paragraph in the MongoDB production readiness, a few lines in…


02 Jul: Why (and how) you should stop writing shell scripts

If you have worked on a Big Data project, you have probably seen, and maybe used, some shell scripts. Honestly, I love hearing “The future is now” when the conversation is about a bunch of scripts scheduled by Oozie, but it seems we can’t create a data project in 2018 without some lets-run-it.sh file. For the last 7 years, I have seen many people write x-SH scripts for various reasons, but the main reason today (at least on Big…


24 Jun: Protobuf and lib conflicts: how to use gRPC with HBase

I’m a huge fan of gRPC. Really. I talked about it some months (years now…) ago, and so far it has met all my needs: high-performance, light, well-structured, simple… and an active community behind it. Even Netflix, one of the major advocates of the REST approach in the open source community, began switching to gRPC last year and placed Ribbon, their huge client-side IPC library, in maintenance mode. And the ecosystem is still growing: Nginx recently announced a native…