29 Mar: Playing with Stack Overflow data

I used Stack Overflow data to gauge the interest in some of the products I follow (yes, HBase, Spark and others). Interest is calculated for each month over the last 5 years, based on the number of posts and replies associated with a tag (e.g. hdfs, elasticsearch). Keep in mind that Stack Overflow is a (huge) developer community focused on programming questions, so the results are inherently biased. Indeed,…
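As a rough illustration of the monthly metric described above, here is a minimal Python sketch. It is not the method used for this post. It assumes the data has already been reduced to hypothetical `(creation_date, tags)` records, one per post or reply, with each reply carrying its parent question's tags. It counts matching records per `(year, month)` from a start date onward.

```python
from collections import Counter
from datetime import date


def monthly_interest(posts, tag, since):
    """Count posts and replies tagged `tag`, grouped by (year, month).

    `posts` is an iterable of (creation_date, tags) pairs. This record
    shape is a hypothetical assumption, not a Stack Overflow API format.
    Records created before `since` are ignored (e.g. to keep 5 years).
    """
    counts = Counter()
    for created, tags in posts:
        if created >= since and tag in tags:
            counts[(created.year, created.month)] += 1
    # Return months in chronological order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        (date(2015, 1, 3), {"hbase", "hdfs"}),
        (date(2015, 1, 20), {"hdfs"}),
        (date(2015, 2, 7), {"elasticsearch"}),
        (date(2015, 2, 9), {"hdfs", "apache-spark"}),
    ]
    print(monthly_interest(sample, "hdfs", since=date(2015, 1, 1)))
    # {(2015, 1): 2, (2015, 2): 1}
```

Raw counts like these are only a proxy for interest. Comparing tags over time may call for normalizing by the total number of posts per month.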

18 Jan: Solr: manage time-based collections

If you use Solr as your full-text search engine, you may be frustrated by the lack of an equivalent to Elastic's excellent Curator tool, which lets you manage your indices. Cloudera provides solrctl, a lightweight utility for administering a SolrCloud deployment. Although solrctl has some useful commands, it cannot delete old time-based collections. Time-based collections, and more generally sharding/partitioning by time frame, are a common pattern for aggregation but also…
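The core of a retention job for time-based collections is deciding which collections have expired. The following minimal Python sketch is not the tool from this post. It assumes a hypothetical naming convention of `<prefix>_YYYYMMDD` and a list of collection names you have already fetched, for instance with the Collections API `LIST` action. It selects the names older than a retention window and builds the matching Collections API `DELETE` request URL. It sends nothing.

```python
from datetime import date, datetime, timedelta
from urllib.parse import urlencode


def expired_collections(names, prefix, retention_days, today):
    """Return collections named '<prefix>_YYYYMMDD' older than the retention window.

    The '<prefix>_YYYYMMDD' naming scheme is a hypothetical convention.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        if not name.startswith(prefix + "_"):
            continue  # not part of this time-based series
        suffix = name[len(prefix) + 1:]
        try:
            day = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a date: leave the collection alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


def delete_url(solr_base, name):
    """Build a Solr Collections API DELETE URL for one collection."""
    query = urlencode({"action": "DELETE", "name": name})
    return f"{solr_base}/admin/collections?{query}"


if __name__ == "__main__":
    names = ["logs_20150101", "logs_20150115", "logs_20150118", "logs_latest"]
    for name in expired_collections(names, "logs", 7, date(2015, 1, 18)):
        print(delete_url("http://localhost:8983/solr", name))
```

Any collection whose suffix does not parse as a date is skipped instead of deleted. This errs on the side of keeping data when the naming scheme is not respected.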

15 Apr: How to deploy an Elasticsearch cluster easily

Here is a simple sh script that deploys Elasticsearch on multiple servers with dedicated roles: master, slave or monitor.
- Master: can be elected Elasticsearch master, acts as a load balancer for the cluster, doesn't store data and can use the HTTP transport.
- Slave: a data node; cannot be elected Elasticsearch master and cannot use the HTTP transport.
- Monitor: doesn't store data, cannot be elected Elasticsearch master, holds plugins and can use the HTTP transport…
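The three roles above map naturally onto per-node settings. This is a Python sketch, not the original sh script. It assumes the older Elasticsearch settings `node.master`, `node.data` and `http.enabled` (recent versions replace the first two with `node.roles`). It renders the `elasticsearch.yml` lines for each role. The host names in the usage example are hypothetical, and plugin installation for monitor nodes is not covered.

```python
# Role -> node settings, following the role descriptions above.
# Setting names assume an older Elasticsearch release.
ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False, "http.enabled": True},
    "slave": {"node.master": False, "node.data": True, "http.enabled": False},
    "monitor": {"node.master": False, "node.data": False, "http.enabled": True},
}


def role_config(role):
    """Render the elasticsearch.yml lines for one role."""
    if role not in ROLE_SETTINGS:
        raise ValueError(f"unknown role: {role!r}")
    settings = ROLE_SETTINGS[role]
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


def plan(hosts):
    """Map each host to its rendered config, given {host: role}."""
    return {host: role_config(role) for host, role in hosts.items()}


if __name__ == "__main__":
    # Hypothetical host names.
    for host, config in plan({"es-master-1": "master", "es-data-1": "slave"}).items():
        print(f"# {host}\n{config}\n")
```

Keeping the role table in one place means each server's config can be generated from a single host-to-role mapping.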