29 Mar: Playing with Stack Overflow data

I used Stack Overflow data to gauge interest in some of the products I follow (yes, HBase, Spark and others). Interest is computed for each month over the last 5 years, based on the number of posts and replies associated with a tag (e.g. hdfs, elasticsearch and so on). Keep in mind that Stack Overflow is a (huge) developer community focused on programming questions, so the results are inherently biased. Indeed,…
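The per-month, per-tag count described above can be sketched roughly as follows. This is only an illustration, not the code behind the post: the function name `monthly_interest`, the record layout (creation date plus a set of tags) and the sample records are all assumptions, and it presumes each reply has already been associated with the tags of its question.

```python
from collections import Counter
from datetime import date


def monthly_interest(posts, tag):
    """Count posts (questions or replies) carrying `tag`, grouped by (year, month)."""
    counts = Counter()
    for created, tags in posts:
        if tag in tags:
            counts[(created.year, created.month)] += 1
    # Return a plain dict ordered chronologically.
    return dict(sorted(counts.items()))


# Hypothetical sample records, made up for illustration: (creation date, tags).
sample_posts = [
    (date(2015, 1, 12), {"hbase", "hadoop"}),
    (date(2015, 1, 30), {"hbase"}),
    (date(2015, 2, 3), {"apache-spark"}),
    (date(2015, 2, 17), {"hbase", "hdfs"}),
]

print(monthly_interest(sample_posts, "hbase"))
# {(2015, 1): 2, (2015, 2): 1}
```

Raw counts like these say nothing about the size of the community as a whole, so comparing tags over a long period would probably need some normalization (for instance, against the total number of posts per month).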


06 Oct: Find and kill slow running queries in MongoDB

In Mongo, or more generally in any data storage engine, queries or updates that take longer than expected to run can have many causes:
– Slow network
– Wrong schema design (we have all seen the famous all-in-one table…)
– Wrong database design (“let’s store 100 TB of data in a standalone mongod!”)
– Bad partitioning (HBase table with 200 regions with 2MB of data)
– Lack of useful indexes
– No statistics
– Incorrect hardware…
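Whatever the cause, the "find and kill" part of the title can be sketched as below. This is a minimal sketch under assumptions, not necessarily the approach the post takes: it assumes a PyMongo `MongoClient` is passed in, relies on MongoDB's `currentOp` and `killOp` admin commands, and the helper names `find_slow_ops` and `kill_slow_ops` as well as the 60-second threshold are hypothetical.

```python
def find_slow_ops(inprog, threshold_secs):
    """Return active operations that have been running for at least threshold_secs.

    `inprog` is the list found under the "inprog" key of a currentOp reply.
    """
    return [
        op for op in inprog
        if op.get("active") and op.get("secs_running", 0) >= threshold_secs
    ]


def kill_slow_ops(client, threshold_secs=60):
    """Kill slow operations and return their opids.

    `client` is assumed to be a pymongo.MongoClient with enough privileges.
    In real use, filter further (for example on op["op"] or op["ns"]) so that
    internal or replication operations are never killed by accident.
    """
    inprog = client.admin.command("currentOp")["inprog"]
    killed = []
    for op in find_slow_ops(inprog, threshold_secs):
        client.admin.command("killOp", 1, op=op["opid"])
        killed.append(op["opid"])
    return killed
```

Keeping the filter separate from the kill step makes it possible to print and review the candidate operations first, which is usually the safer habit on a production cluster.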