19 Nov: Working with Parquet files

Apache Parquet is a columnar storage format supported by most data processing frameworks in the Hadoop ecosystem: Hive, Pig, Spark, Drill, Arrow, Apache Impala, Cascading, Crunch, Tajo… and many more! In Parquet, data is compressed column by column. This means that commands like hdfs dfs -cat hdfs://nn1.example.com/file1 or hdfs dfs -text /…/file2 no longer work on Parquet files: all you see on your terminal is binary chunks. Thankfully, Parquet provides an…

16 Nov: Using HBase REST API with the Knox Java client

I already introduced Knox in a previous post about submitting a Spark job through Knox with the Java client. This post is still about the Knox Java client, but here we’ll look at another use: HBase. HBase provides a rich, well-documented REST API with many endpoints exposing the data in various formats (JSON, XML and Protobuf!). First, we need to import the dependencies for the Knox Java client: <dependency> <groupId>org.apache.knox</groupId> <artifactId>gateway-shell</artifactId> <version>0.10.0</version> </dependency>…

09 Nov: Submitting Spark Job via Knox on Yarn

Apache Knox is a REST API gateway for interacting with Apache Hadoop clusters. It offers an extensible reverse proxy that securely exposes REST APIs and HTTP-based services in any Hadoop platform. Although Knox is not designed to be a channel for high-volume data ingest or export, it is perfectly suited to exposing a single entry point to your cluster and can be seen as a bastion for all your applications. One of the possible use cases of Knox…

02 Nov: Microservices and gRPC: Use Atomix as service discovery

gRPC is a modern, open source, high-performance RPC framework initiated by Google and supported by many languages and platforms (C++, Java, Go, Node, Ruby, Python and C# across Linux, Windows, and Mac). It is used by many projects (etcd/CoreOS, containerd/Docker, cockroachdb/Cockroach Labs…) and has reached a significant milestone with its 1.0 release. In distributed environments where a large number of microservices are running, gRPC supports rich cloud-oriented features like: – load balancing/discovery –…

25 Oct: Efficient logging with Spring Boot, Logback and Logstash

Logging is an important part of any enterprise application and Logback makes an excellent choice: it’s simple, fast, light and very powerful. Spring Boot has great support for Logback and provides a lot of features to configure it. In this article I will present an integration for an enterprise logging stack using Logback, Spring Boot and Logstash. WARNING: Spring Boot recommends using the -spring variants for your logging configuration (for example logback-spring.xml rather than…

17 Oct: Advanced tools: playing with Java Native Access

This post results from a recent deep dive into the source code of Elasticsearch, which uses JNA mainly for memory management when configuring mlockall. I will show how to use JNA with a very simple example: checking which user launched the JVM. Note: Remember that when you consider a solution using OS native calls, you have to deal with, and depend on, platform libraries. So use it carefully and only for specific…

10 Oct: OS monitoring with… Java

Sometimes it is useful to get system information such as disk usage or the available network interfaces. For instance, Elasticsearch uses this kind of tool to display, at startup, some information about open file descriptors or the size of the direct memory available to the JVM. The aim is not to replace a real system monitoring agent, but to help the user get the most out of the product by configuring it…
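As a rough illustration of the kind of information mentioned above, here is a minimal sketch that relies only on the JDK (java.nio.file.FileStore and java.net.NetworkInterface). It is not necessarily the approach taken in the full post or in Elasticsearch; the class name OsInfo and its two helper methods are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class OsInfo {

    // One line per file store: description, total and usable space in bytes.
    static List<String> diskUsage() {
        List<String> lines = new ArrayList<>();
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            try {
                lines.add(String.format("%s total=%d usable=%d",
                        store, store.getTotalSpace(), store.getUsableSpace()));
            } catch (IOException e) {
                // Some stores cannot be queried (e.g. restricted mounts); skip them.
            }
        }
        return lines;
    }

    // One line per network interface: name and whether it is up.
    static List<String> networkInterfaces() {
        List<String> lines = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return lines; // Older JDKs may return null when no interface is found.
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                lines.add(nic.getName() + " up=" + nic.isUp());
            }
        } catch (SocketException e) {
            // Listing failed part-way; return whatever was collected so far.
        }
        return lines;
    }

    public static void main(String[] args) {
        diskUsage().forEach(System.out::println);
        networkInterfaces().forEach(System.out::println);
    }
}
```

Running the main method prints one line per file store and per interface; the actual values depend entirely on the machine it runs on.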

06 Oct: Find and kill slow running queries in MongoDB

In Mongo, or more generally in any data storage engine, queries or updates that take longer than expected to run can have many causes: – Slow network – Wrong schema design (we have all seen the famous all-in-one table…) – Wrong database design (“let’s store 100 TB of data in a standalone mongod!”) – Bad partitioning (HBase table with 200 regions with 2 MB of data) – Lack of useful indexes – No statistics – Incorrect hardware…

12 May: JSON pretty print with… Spring Boot

I wrote a post a few months ago about a pretty-print hack for JAX-RS (https://layer4.fr/2015/02/json-pretty-print-with-jax-rsjackson/). Here is a version using Spring Boot. Everything lives in a @Configuration class extending WebMvcConfigurerAdapter: import java.io.IOException; import java.util.List; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.context.annotation.Configuration; import org.springframework.http.converter.HttpMessageConverter; import org.springframework.http.converter.json.MappingJackson2HttpMessageConverter; import org.springframework.web.context.request.RequestAttributes; import org.springframework.web.context.request.RequestContextHolder; import org.springframework.web.context.request.ServletRequestAttributes; import org.springframework.web.servlet.config.annotation.EnableWebMvc; import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter; import com.fasterxml.jackson.core.JsonGenerator; import com.fasterxml.jackson.core.util.DefaultPrettyPrinter; import com.fasterxml.jackson.databind.ObjectMapper; @Configuration @EnableWebMvc public class WebConfiguration extends WebMvcConfigurerAdapter { @Autowired private ObjectMapper mapper; @Override public void extendMessageConverters(List<HttpMessageConverter<?>>…

23 Feb: JSON pretty print with JAX-RS/Jackson

If you have already worked with Elasticsearch, you may have used a very handy companion to your curl commands: the “?pretty” parameter, which lets you see the JSON response nicely formatted. If not, have a look here, you’ll thank me later: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/common-options.html It can also be useful to have this kind of feature in your JAX-RS project. And this is “pretty” simple (…). All you have to do is create a class extending JacksonJsonProvider: import java.io.IOException;…