MapReduce Average Computation Example

This example demonstrates how to compute the average value of a numeric field grouped by a key using the MapReduce framework. The process follows a standard pattern: the mapper extracts key-value pairs, the shuffle phase groups values by key, and the reducer computes the sum and count to produce the average.

Setup and Environment

Ensure Hadoop is running. Start the cluster with:

start-all.sh

Create a working directory on the local filesystem:

mkdir -p /data/mapreduce4

Download and extract the Hadoop 2.x library package (e.g., hadoop2lib.zip) into /data/mapreduce4.

Create an input directory on HDFS and upload the data file:

hadoop fs -mkdir -p /mymapreduce4/in
hadoop fs -put /data/mapreduce4/goods_click /mymapreduce4/in

Data format note: Each line in goods_click should contain two comma-separated fields: a key (e.g., product ID) and an integer value (e.g., clicks). Ensure no trailing spaces after the last field, as they can cause parsing errors.
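
For reference, a file in this format might look like the following (the product IDs and click counts here are made up for illustration):

```
1010037,100
1010037,50
1010102,37
```

With this input, the job would emit 75 for key 1010037 ((100 + 50) / 2) and 37 for key 1010102.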

Implementation

Write the following Java code in an IDE (e.g., IntelliJ IDEA):

package mapreduce;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ComputeAverage {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private Text outKey = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] parts = line.split(",");
            // Skip blank or malformed lines rather than failing the whole task
            if (parts.length < 2) {
                return;
            }
            outKey.set(parts[0]);
            int clickCount = Integer.parseInt(parts[1].trim());
            context.write(outKey, new IntWritable(clickCount));
        }
    }

    public static class AverageReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            for (IntWritable val : values) {
                sum += val.get();
                count++;
            }
            // Integer division truncates any fractional part; switch the value
            // type to DoubleWritable if exact averages are required
            int average = sum / count;
            context.write(key, new IntWritable(average));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        System.out.println("Starting job...");
        Job job = Job.getInstance(conf, "ComputeAverage");
        job.setJarByClass(ComputeAverage.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(AverageReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.149.10:9000/mymapreduce4/in/goods_click"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.149.10:9000/mymapreduce4/out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
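
The reducer's sum-and-count logic can be sanity-checked outside Hadoop with a plain Java sketch. The class and method names below are illustrative only, not part of the job above:

```java
import java.util.Arrays;
import java.util.List;

public class AverageCheck {
    // Mirrors AverageReducer: sum the values, then divide by their count.
    // Integer division truncates, exactly as in the reducer.
    static int average(List<Integer> values) {
        int sum = 0;
        int count = 0;
        for (int v : values) {
            sum += v;
            count++;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(100, 50))); // prints 75
        System.out.println(average(Arrays.asList(3, 4)));    // prints 3 (7/2 truncated)
    }
}
```

Running this confirms the truncation behavior before submitting the full job to the cluster.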

Configuration File

Create a log4j.properties file under the resources directory with content:

hadoop.root.logger=DEBUG, console
log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

Build and Run

Include the Hadoop 2.x libraries (e.g., hadoop2lib) in the project build path, then run the job. If an HDFS permission error occurs, run the job as a user with write access to /mymapreduce4, or relax the directory's permissions with hadoop fs -chmod.

The job will produce output in /mymapreduce4/out on HDFS, where each line contains a key and its average value.

Tags: mapreduce Hadoop average bigdata computation

Posted on Mon, 11 May 2026 07:19:01 +0000 by DarkPrince2005