2013 November

25 Nov: How to kill Hadoop jobs matching a pattern?

Today I had to kill a list of 45 jobs running on my Hadoop cluster. OK, let’s have a look at the docs: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#job But wait a minute… Hadoop has a “kill” command, but no “pkill”. One solution is:

import java.io.IOException;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;
import org.apache.commons.cli.PosixParser;
import org.apache.commons.lang.ArrayUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PKill { private final…
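The excerpt above is cut off, so the following is only a minimal sketch of the "pkill" idea, not the post's actual PKill class. It shows the core step: select jobs whose name matches a regular expression and kill each one. To keep it runnable without a cluster, the job is hidden behind a hypothetical `JobHandle` interface (an assumption, not a Hadoop type). With the old `org.apache.hadoop.mapred` API, an adapter could presumably build such handles from `JobClient.getAllJobs()` and `JobClient.getJob(...)`, delegating to `RunningJob.getJobName()` and `RunningJob.killJob()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical abstraction over a running job; NOT part of Hadoop.
// A real adapter would wrap a RunningJob and rethrow IOException
// as an unchecked exception.
interface JobHandle {
    String name();
    void kill();
}

final class PatternKiller {
    private PatternKiller() {}

    /**
     * Kills every job whose name contains a match for the pattern
     * and returns the names of the jobs that were killed, in order.
     */
    static List<String> killMatching(Iterable<? extends JobHandle> jobs, Pattern pattern) {
        List<String> killed = new ArrayList<>();
        for (JobHandle job : jobs) {
            // find() matches anywhere in the name; anchor the regex
            // with ^ and $ if a full-name match is wanted.
            if (pattern.matcher(job.name()).find()) {
                job.kill();
                killed.add(job.name());
            }
        }
        return killed;
    }
}
```

Returning the list of killed names makes it easy to log what happened, or to add a dry-run mode that only prints the matches before anything is actually killed.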

04 Nov: Pooling a Thrift client

Thrift is an interface definition language that is used to define and create services for numerous languages. The Thrift stack relies on protocols (TBinaryProtocol, TCompactProtocol…) and transports (TSocket, TFileTransport…). But since the transport layer is essentially a wrapper around a socket or a file, a Thrift client is NOT thread-safe. As with other non-thread-safe resources, you have a choice: use a costly locking scheme, create a new connection to the resource each time, or think about pooling your…
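The excerpt stops before the post's own solution, so what follows is just a minimal sketch of the pooling option, not the author's implementation. It uses a fixed-size pool backed by a `BlockingQueue`: each caller borrows one instance exclusively and hands it back when done, so no two threads ever share a client. `SimplePool`, `withResource` and `idleCount` are hypothetical names. For Thrift, `T` would be a generated client whose transport was opened by the factory; a production pool (Apache Commons Pool, for example) would also validate and replace broken connections, which this sketch does not do.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool for resources that are not thread-safe.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Eagerly creates `size` resources with the given factory.
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows a resource, runs the action, and always returns the
    // resource to the pool. Fails if none is free within the timeout.
    <R> R withResource(long timeoutMillis, Function<T, R> action) {
        T resource;
        try {
            resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
        if (resource == null) {
            throw new IllegalStateException("pool exhausted");
        }
        try {
            return action.apply(resource);
        } finally {
            // Limitation: the resource is returned even if the action
            // failed; a real pool would discard a broken connection.
            idle.add(resource);
        }
    }

    // Number of resources currently available to borrow.
    int idleCount() {
        return idle.size();
    }
}
```

Compared with a global lock, callers only wait when every pooled instance is busy. Compared with opening a connection per call, the connection setup cost is paid once per pooled instance.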