2013

25 Nov: How to kill Hadoop jobs matching a pattern?

Today, I had to kill a list of jobs (45) running on my Hadoop cluster. Ok, let’s have a look to the docs http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#job But wait a minute… No, Hadoop knows the “kill” command, but not the “pkill”… One solution is: import java.io.IOException; import org.apache.commons.cli.CommandLine; import org.apache.commons.cli.CommandLineParser; import org.apache.commons.cli.HelpFormatter; import org.apache.commons.cli.Options; import org.apache.commons.cli.ParseException; import org.apache.commons.cli.PosixParser; import org.apache.commons.lang.ArrayUtils; import org.apache.commons.lang.StringUtils; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobStatus; import org.apache.hadoop.mapred.RunningJob; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class PKill { private final…

04 Nov: Pooling a Thrift client

Thrift is an interface definition language that is used to define and create services for numerous languages. The Thrift stack relies on protocols (TBinaryProtocol, TCompactProtocol…) and transports (TSocket, TFileTransport…). But, since the transport layer is essentially a wrapper on a socket or a file, Thrift is NOT thread-safe. Like other resources not thread-safe, you have the choice: work with a costly locking algo, create each time a new connection to the resource or think about pool your…

21 Oct: A best Spring AsyncRestTemplate!

I wrote an article on the famous Spring’s RestTemplate some months ago (https://layer4.fr/2013/02/24/a-best-spring-resttemplate/). Since the last version of Spring 4, currently 4.0.0.M3, or the BUILD-SNAPSHOT for the most courageous, you can use a custom RestTemplate in an async way: the AsyncRestTemplate API. Why should I use that? The purpose of this API is to allow Java applications to easily execute HTTP requests and asynchronously process the HTTP responses. In non async mode, the code will block until…

17 Oct: Myrrix, the REST-ified Mahout for real-time recommandations

Myrrix is a complete, real-time, scalable clustering and recommender system, evolved from Apache Mahout. The full Myrrix system uses two components: a Computation Layer and one or many Serving Layers. While the Computation Layer computes the large machine learning models needed by the Serving Layer, the Serving Layer is a Java HTTP server application. This server serves user requests in real-time, making recommendations and receiving new input via a REST API. Many instances of the Serving Layer…

17 Oct: How to generate a changelog from Jira for your deb/rpm/…

A changelog is a log or record of changes made to a project, such as a website or software project, usually including such records as bug fixes, new features, etc. Some open source projects include a changelog as one of the top level files in their distribution. If you are running a RHEL distribution (Centos, Fedora, Red Hat…), you can read it via the rpm command: rpm -q –changelog vim-enhanced.x86_64 | less For Debian based distributions, you…

03 Sep: Installing Tomcat 7 on Debian/Ubuntu

First, a simple apt-get: apt-get install tomcat7 libtcnative-1 tomcat7-user tomcat7-docs tomcat7-admin Wait, “libtcnative-1”? Tomcat can use the Apache Portable Runtime to provide superior scalability, performance, and better integration with native server technologies. The Apache Portable Runtime is a highly portable library that is at the heart of Apache HTTP Server 2.x. APR has many uses, including access to advanced IO functionality (such as sendfile, epoll and OpenSSL), OS level functionality (random number generation, system status, etc), and…

22 Aug: Implementing a role hierarchy in Spring Security (Java config style)

From http://static.springsource.org/spring-security/site/docs/3.1.x/reference/authz-arch.html: It is a common requirement that a particular role in an application should automatically “include” other roles. For example, in an application which has the concept of an “admin” and a “user” role, you may want an admin to be able to do everything a normal user can. To achieve this, you can either make sure that all admin users are also assigned the “user” role. Alternatively, you can modify every access constraint which requires…

19 Aug: How to map unknown JSON properties with Jackson?

For a current project, I need to map a JSON object that have some known fields, and some “dynamics” fields. I don’t want to ignore these fields, just to get them in a map inside my bean. After digging for 20 minutes, I finally found the right annotations to use: @JsonAnyGetter and @JsonAnySetter. @JsonAnySetter is a simple marker that can be used to define a two-argument method (first argument name of property, second value to set), to…

19 Aug: A simple Tcp Client/Server using Reactor

Reactor is a foundation for asynchronous applications on the JVM. It provides abstractions for Java, Groovy and other JVM languages to make building event and data-driven applications easier. It’s also really fast. On modest hardware, it’s possible to process around 15,000,000 events per second with the fastest non-blockingDispatcher. Other dispatchers are available to provide the developer with a range of choices from thread-pool style, long-running task execution to non-blocking, high-volume task dispatching. Since the first milestone of…

15 Apr: How to deploy an Elasticsearch cluster easily

Here is a simple sh allowing you to deploy ElasticSearch on multiple servers with dedicated roles: master, slave or monitor. -Master: can be an Elasticsearch master, acts as load balancer on the cluster, doesn’t store data and can use the http transport. -Slave: a data node, can not be an Elasticsearch master and can not use the http transport. -Monitor: doesn’t store data, can not be an Elasticsearch master, hold plugins and can use the http transport….