hdfs


29 Mar: Playing with Stack Overflow data

I used Stack Overflow data to gauge the interest in some of the products I follow (yes, HBase, Spark and others). The interest is calculated for each month over the last 5 years and is based on the number of posts and replies associated with a tag (e.g. hdfs, elasticsearch and so on). Remember that Stack Overflow is a (huge) developer community with questions about programming, so the results are automatically biased. Indeed,…

09 Apr: Transfer files from Hadoop to a remote server via ssh

When working with Hadoop, you produce files in HDFS. To copy them to one of your remote servers, you first have to use the get or the copyToLocal command to copy the files to your local filesystem, and then use an scp command. But this two-step process is not really efficient, since you are copying the files twice. sshj is a pure Java implementation of SSHv2 allowing you to connect to an sshd server…
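To make the double-copy problem concrete, here is a minimal sketch of a single-pass transfer. It is not the sshj approach the post goes on to describe: it swaps in a plain pipe between the `hdfs dfs -cat` and `ssh` command-line tools, driven from Python. The function names, the paths, and the `user@example.com` host are hypothetical, and the sketch assumes both `hdfs` and `ssh` are on the PATH with key-based authentication already set up.

```python
import shlex
import subprocess


def build_commands(hdfs_path, remote, remote_path):
    """Return the argv lists for the producer (HDFS) and consumer (ssh) sides."""
    # Producer: write the HDFS file to stdout instead of the local disk.
    producer = ["hdfs", "dfs", "-cat", hdfs_path]
    # Consumer: the remote shell redirects stdin into the target file.
    # shlex.quote protects the remote path from the remote shell.
    consumer = ["ssh", remote, "cat > " + shlex.quote(remote_path)]
    return producer, consumer


def stream_to_remote(hdfs_path, remote, remote_path):
    """Stream one HDFS file to a remote server without a local temporary copy."""
    producer_cmd, consumer_cmd = build_commands(hdfs_path, remote, remote_path)
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the ssh side exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            "transfer failed (hdfs exit %d, ssh exit %d)" % (producer_rc, consumer_rc)
        )
```

A call such as `stream_to_remote("/data/out/part-00000", "user@example.com", "/tmp/part-00000")` would move the bytes straight from HDFS to the remote file. The trade-off is that this handles one file at a time and depends on external binaries, whereas an in-process SSH library keeps everything inside the JVM.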