29 Mar: Playing with Stack Overflow data

I used the data from Stack Overflow in order to see the interest on some of the products I follow (yes, HBase, Spark and others). The interest is calculated for each month on the last 5 years and is based on the number of posts and replies associated for a tag (ex: hdfs, elasticsearch and so on). Remember that Stack Overflow is a (huge) developper community with questions about programming, so the results are automatically biased. Indeed,…


18 Jan: Solr: manage time-based collections

If you use Solr as your fulltext search engine, you may be frustated to miss the excellent tool Curator from Elastic, which allow you to manage your indices. Cloudera offers an admin tool for Solr, named solrctl, a light utility to supervise a SolrCloud deployment. Although solrctl has some useful commands, you don’t have the possibility to delete old time-based collections. Time-based collections, and globally shard/partition per time frame, is a common pattern for agregation but also…