02 Apr: LLAP & CGroups: a marriage made in heaven

Hive LLAP (Live Long and Process), also called Interactive Query on HDInsight, is a service that promises sub-second response times for queries on very large tables. To reach interactive performance levels, LLAP builds on Hadoop: it uses the Tez execution engine and adds LLAP daemons that cache data, manage JIT optimization, and eliminate most of the startup costs. Caching, pre-fetching, some query processing and access control are moved into the daemons….


29 Mar: Playing with Stack Overflow data

I used Stack Overflow data to gauge interest in some of the products I follow (yes, HBase, Spark and others). Interest is calculated for each month over the last 5 years and is based on the number of posts and replies associated with a tag (e.g. hdfs, elasticsearch and so on). Remember that Stack Overflow is a (huge) developer community with questions about programming, so the results are inherently biased. Indeed,…


09 Nov: Submitting Spark Job via Knox on Yarn

Apache Knox is a REST API gateway for interacting with Apache Hadoop clusters. It offers an extensible reverse proxy that securely exposes REST APIs and HTTP-based services on any Hadoop platform. Although Knox is not designed to be a channel for high-volume data ingest or export, it is well suited to exposing a single entry point to your cluster and can be seen as a bastion for all your applications. One of the possible use cases of Knox…