Using Eclipse to connect to Hadoop server and write MapReduce application

Preparation:

Download eclipse-SDK-3.6.1-linux-gtk.tar.gz to /home/grid/, then extract it to /home/grid/eclipse with the following command:

$ tar -zxvf eclipse-SDK-3.6.1-linux-gtk.tar.gz

Note that you can also place the Eclipse package in the folder shared with Windows, as you did when installing Hadoop in the previous article.

Create a file named eclipse.sh and edit its content as below:

export JAVA_HOME=/usr/java/jdk1.6.0_23/

export CLASSPATH=/usr/java/jdk1.6.0_23/lib

/home/grid/eclipse/eclipse -vm /usr/java/jdk1.6.0_23/bin/java -data ~/workspace &

Copy eclipse.sh to /usr/local/bin and make it executable:

$ cp ./eclipse.sh /usr/local/bin

$ chmod 755 /usr/local/bin/eclipse.sh

This wraps up the installation of Eclipse. To open the program, you only need to type the command eclipse.sh. Before carrying out the following steps, make sure that hadoop-0.19.2-eclipse-plugin.jar (found in hadoop-0.19.2/contrib/) has been copied to eclipse/plugins.

Connect to hadoop server:

+ Open Window->Open Perspective->Other, choose Map/Reduce, then click OK.

+ Open Window->Show View->Other and click Map/Reduce Locations under Map/Reduce Tools. Enter the same information as in your conf/hadoop-site


After clicking the OK button, you can see DFS Locations in the Project Explorer.

Now you can manipulate HDFS from within Eclipse.
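The location dialog asks for host and port values that come from conf/hadoop-site. As a rough illustration, the sketch below reads a Hadoop-style configuration and prints the two addresses the dialog needs. It assumes the usual Hadoop 0.19 property names fs.default.name (DFS Master) and mapred.job.tracker (Map/Reduce Master); the class name HadoopSiteReader and the localhost:9000 / localhost:9001 values are made up for the example, so substitute whatever your own file contains.

```java
import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop or the Eclipse plugin.
class HadoopSiteReader {

    // Collects every <property><name>..</name><value>..</value></property> entry.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            Map<String, String> props = new LinkedHashMap<String, String>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    // Splits "hdfs://host:port" or "host:port" into {host, port}.
    static String[] hostAndPort(String address) {
        String s = address;
        int scheme = s.indexOf("://");
        if (scheme >= 0) {
            s = s.substring(scheme + 3);
        }
        int slash = s.indexOf('/');
        if (slash >= 0) {
            s = s.substring(0, slash);
        }
        int colon = s.lastIndexOf(':');
        if (colon < 0) {
            return new String[] { s, "" };
        }
        return new String[] { s.substring(0, colon), s.substring(colon + 1) };
    }

    public static void main(String[] args) {
        // Example values only (assumption); use the ones from your own configuration.
        String xml = "<configuration>"
                + "<property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>"
                + "<property><name>mapred.job.tracker</name><value>localhost:9001</value></property>"
                + "</configuration>";
        Map<String, String> props = readProperties(xml);
        String[] dfs = hostAndPort(props.get("fs.default.name"));
        String[] mr = hostAndPort(props.get("mapred.job.tracker"));
        System.out.println("DFS Master: host=" + dfs[0] + " port=" + dfs[1]);
        System.out.println("Map/Reduce Master: host=" + mr[0] + " port=" + mr[1]);
    }
}
```

If the connection fails, comparing these values with what you typed into the dialog is a quick first check.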

Writing a MapReduce program:

Create a new Map/Reduce project via File->New->Project->Map/Reduce Project.

Click Configure Hadoop install directory and enter the location of your Hadoop installation (here, /home/grid/hadoop-0.19.2).

Add a class named word_count to the project, with the following content:

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class word_count {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts collected for each word.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(word_count.class);
        conf.setJobName("wordcount");

        // Types of the (word, count) pairs written by the job.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] is the input path, args[1] the output path.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
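To see what the job computes without a cluster, the following plain-Java sketch mimics the same two steps in memory: a map step that tokenizes each line with StringTokenizer, and a reduce step that sums the occurrences of each word. It is only an illustration and uses no Hadoop classes; the class name LocalWordCount and the sample lines are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the Hadoop job; illustration only.
class LocalWordCount {

    // Map step: one entry per whitespace-separated token,
    // each standing for the pair (word, 1).
    static List<String> map(String line) {
        List<String> words = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    // Grouping and reduce step: add up the 1s emitted for each word.
    // A TreeMap keeps the words sorted by key.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String word : map(line)) {
                Integer sum = counts.get(word);
                counts.put(word, sum == null ? 1 : sum + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        lines.add("hello hadoop");
        lines.add("hello eclipse");
        // prints {eclipse=1, hadoop=1, hello=2}
        System.out.println(countWords(lines));
    }
}
```

On a real cluster the grouping happens in the shuffle between the map and reduce phases, and the combiner pre-sums counts on each map node; here both are collapsed into a single loop.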

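Right-click the WordCount folder, choose Export->Java->JAR file, and specify where to save the resulting jar.

You can run your MapReduce application with the following command (in the terminal), where input and output are the input and output paths passed to main as args[0] and args[1]: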

$ bin/hadoop jar word_count.jar word_count input output
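The main method of word_count reads args[0] and args[1] directly, so running the jar without both paths ends in an ArrayIndexOutOfBoundsException. A small check along the following lines could be called at the top of main to print a usage message instead. This is only a sketch: the class name JobArgs and the wording of the message are not part of the original program.

```java
// Hypothetical argument holder for the word_count job; illustration only.
class JobArgs {
    final String input;
    final String output;

    private JobArgs(String input, String output) {
        this.input = input;
        this.output = output;
    }

    // Accepts exactly two arguments: the input path and the output path.
    static JobArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: word_count <input path> <output path>");
        }
        return new JobArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        try {
            JobArgs parsed = parse(args);
            System.out.println("input=" + parsed.input + " output=" + parsed.output);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Note also that Hadoop refuses to start the job if the output directory already exists, so remove it or pick a new name before running the job again.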

Note: when you right-click word_count.java and choose Run As, Eclipse does offer a Run on Hadoop option, but it doesn’t work at the moment. IBM also provides MapReduce Tool for Eclipse, which likewise fails to connect to the Hadoop server. Hadoop-0.20.2 and Hadoop-0.21.0 both have a similar problem with their eclipse-plugin. Fortunately, I’ve found a plugin (provided by Google) that works well with Hadoop-0.20.2 (and probably also with Hadoop-0.19.2, though I haven’t tried it). Download it here (or, if you get it from here, rename it to hadoop-0.20.2-eclipse-plugin.jar), put it in eclipse/plugins, and enjoy.
