Data


09 Nov: Submitting a Spark Job via Knox on YARN

Apache Knox is a REST API gateway for interacting with Apache Hadoop clusters. It provides an extensible reverse proxy that securely exposes REST APIs and HTTP-based services on any Hadoop platform. Although Knox is not designed to be a channel for high-volume data ingest or export, it is well suited to exposing a single entry point to your cluster and can be seen as a bastion for all your applications. One of the possible use cases of Knox…
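To make the "single entry point" idea concrete, here is a minimal sketch that only builds (does not send) a YARN ResourceManager REST request routed through Knox. It assumes Knox's usual URL layout `https://{host}:{port}/gateway/{topology}/resourcemanager/...`, which Knox maps onto the ResourceManager's `/ws/...` REST root, and a topology that accepts HTTP Basic credentials. The host `knox.example.com`, port 8443, topology `sandbox` and the `guest` credentials are hypothetical placeholders. The request targets YARN's `new-application` endpoint, which hands out an application ID for a later REST submission; it is not presented as the method used in the post.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: builds YARN REST requests routed through a Knox gateway. */
public class KnoxYarnRequests {

    // Assumption: default gateway path "gateway"; Knox maps .../{topology}/resourcemanager/... to the RM's /ws/...
    static URI resourceManagerUri(String gatewayHost, int gatewayPort, String topology, String path) {
        return URI.create(String.format("https://%s:%d/gateway/%s/resourcemanager%s",
                gatewayHost, gatewayPort, topology, path));
    }

    // Assumption: the Knox topology authenticates clients with HTTP Basic credentials.
    static String basicAuth(String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    /** Asks the ResourceManager (through Knox) for a new application ID. */
    static HttpRequest newApplicationRequest(String host, int port, String topology,
                                             String user, String password) {
        return HttpRequest.newBuilder(
                        resourceManagerUri(host, port, topology, "/v1/cluster/apps/new-application"))
                .header("Authorization", basicAuth(user, password))
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = newApplicationRequest("knox.example.com", 8443, "sandbox", "guest", "guest-password");
        System.out.println(req.method() + " " + req.uri());
        // Sending it would use java.net.http.HttpClient; omitted because it needs a live gateway.
    }
}
```

The client only ever needs the gateway address and its credentials; the ResourceManager's own host and port stay hidden behind Knox, which is what makes it usable as a bastion.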

07 Feb: High performance RSS/Atom parsing

Parsing RSS feeds is very easy in Java. Several libraries can get the job done: feed4j, rssowl, Apache Abdera and many others, but the most commonly used is ROME. ROME is a set of RSS and Atom utilities for Java that makes it easy to work with most syndication formats: RSS 0.9x, 1.0, 2.0 and Atom 0.3, 1.0. Reading RSS from a source is dead simple; you need these dependencies: <!-- Rome Atom+RSS --> <dependency>…
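For comparison with a full feed object model such as ROME's, the sketch below streams an RSS 2.0 document with the JDK's built-in StAX API (`javax.xml.stream`) and collects each `<item>`'s `<title>` without building a tree. This is a swapped-in, JDK-only technique shown for illustration, not the approach described in the post; the class and method names are hypothetical, and it only understands RSS 2.0 `<item>` elements (Atom uses `<entry>`).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Streams an RSS 2.0 document and collects the title of every item (JDK StAX only). */
public class RssTitles {

    static List<String> itemTitles(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Feeds are untrusted input: refuse DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        boolean inItem = false; // true while the cursor is inside an <item> element
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("item")) {
                        inItem = true;
                    } else if (inItem && name.equals("title")) {
                        // Reads the element's text and leaves the cursor on </title>.
                        titles.add(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("item")) {
                    inItem = false;
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String feed = "<rss version=\"2.0\"><channel><title>Blog</title>"
                + "<item><title>First post</title></item>"
                + "<item><title>Second post</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(feed)); // prints [First post, Second post]
    }
}
```

The channel-level `<title>` is skipped because it sits outside any `<item>`. A pull parser like this keeps memory flat regardless of feed size, at the cost of handling each syndication format by hand, which is exactly what a library like ROME does for you.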

17 Oct: Myrrix, the REST-ified Mahout for real-time recommendations

Myrrix is a complete, real-time, scalable clustering and recommender system that evolved from Apache Mahout. The full Myrrix system uses two components: a Computation Layer and one or more Serving Layers. The Computation Layer computes the large machine learning models needed by the Serving Layer, while the Serving Layer is a Java HTTP server application that serves user requests in real time, making recommendations and receiving new input via a REST API. Many instances of the Serving Layer…
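As a rough picture of the two kinds of Serving Layer calls mentioned above (asking for recommendations and pushing new input), here is a minimal sketch that only builds the HTTP requests. The endpoint paths `GET /recommend/{userID}?howMany=N` and `POST /pref/{userID}/{itemID}` (with the preference strength as the request body) are assumptions to check against the Myrrix documentation, and the base URL `http://localhost:8080` and class name are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical request builders for a Myrrix Serving Layer; endpoint paths are assumptions. */
public class ServingLayerRequests {

    /** Assumed endpoint: GET /recommend/{userID}?howMany=N returns the top-N items for a user. */
    static HttpRequest recommend(String baseUrl, long userId, int howMany) {
        return HttpRequest.newBuilder(
                        URI.create(baseUrl + "/recommend/" + userId + "?howMany=" + howMany))
                .GET()
                .build();
    }

    /** Assumed endpoint: POST /pref/{userID}/{itemID} records new input, strength in the body. */
    static HttpRequest preference(String baseUrl, long userId, long itemId, float strength) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/pref/" + userId + "/" + itemId))
                .POST(HttpRequest.BodyPublishers.ofString(Float.toString(strength)))
                .build();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080"; // hypothetical Serving Layer address
        System.out.println(recommend(base, 42L, 5).uri());
        System.out.println(preference(base, 42L, 7L, 1.0f).uri());
        // Sending these would use java.net.http.HttpClient against a running Serving Layer.
    }
}
```

Because each Serving Layer answers plain HTTP, any client that can issue these requests can consume recommendations; no Mahout or Hadoop libraries are needed on the client side.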