Secondary sort means sorting first by one field and then, for rows whose first field is equal, sorting by a second field, without disturbing the order established by the first sort.
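For instance, assuming comma-separated input rows of the form id,category,quantity,amount (the layout the mapper below expects; the concrete rows here are only illustrative), input like

0,Office Supplies,2,408.3
1,Furniture,1,97.6
2,Office Supplies,5,896.0

should come out as

Furniture          97.6
Office Supplies    896.0
Office Supplies    408.3

that is, ascending by category, and descending by amount within each category.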
1). OrderBean, which holds the second and fourth fields of each input line:
package com.GroupOrder;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class OrderBean implements WritableComparable<OrderBean> {

    private String Order_id;     // product category
    private Double Order_price;  // sale amount

    public OrderBean() {
        super();
    }

    public OrderBean(String order_id, Double order_price) {
        Order_id = order_id;
        Order_price = order_price;
    }

    @Override
    public int compareTo(OrderBean orderBean) {
        // First sort: ascending by product category (this bean is the key);
        // compareTo on String is a lexicographic (ASCII) comparison
        int result = Order_id.compareTo(orderBean.getOrder_id());
        if (result > 0) {
            result = 1;
        } else if (result < 0) {
            result = -1;
        } else {
            // Second sort: for equal categories, descending by sale amount
            // (this does not disturb the category order)
            if (Order_price > orderBean.getOrder_price()) {
                result = -1;
            } else if (Order_price < orderBean.getOrder_price()) {
                result = 1;
            } else {
                result = 0;
            }
        }
        return result;
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(Order_id);
        dataOutput.writeDouble(Order_price);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        Order_id = dataInput.readUTF();
        Order_price = dataInput.readDouble();
    }

    public Double getOrder_price() {
        return Order_price;
    }

    public void setOrder_price(Double order_price) {
        Order_price = order_price;
    }

    public String getOrder_id() {
        return Order_id;
    }

    public void setOrder_id(String order_id) {
        Order_id = order_id;
    }

    @Override
    public String toString() {
        return Order_id + '\t' + Order_price;
    }
}
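A quick local sanity check of this ordering (a throwaway sketch, not part of the job; the class name and sample values are only illustrative): sorting a few beans with Arrays.sort, which uses the compareTo above, should give categories ascending and amounts descending within each category.

package com.GroupOrder;

import java.util.Arrays;

public class OrderBeanSortDemo {
    public static void main(String[] args) {
        OrderBean[] beans = {
                new OrderBean("Office Supplies", 408.3),
                new OrderBean("Furniture", 97.6),
                new OrderBean("Office Supplies", 896.0)
        };
        // OrderBean.compareTo drives the ordering: category ascending, amount descending
        Arrays.sort(beans);
        for (OrderBean bean : beans) {
            System.out.println(bean);
        }
        // Expected:
        // Furniture        97.6
        // Office Supplies  896.0
        // Office Supplies  408.3
    }
}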
The serialization and deserialization order (and methods) must match: whichever method you use to write a field out (serialize), you must use the corresponding method to read it back. Today I wrote a String field with writeChars and afterwards, not knowing which method to read it back with, reached for readUTF. As soon as the driver ran,

Order_id = dataInput.readUTF();
Order_price = dataInput.readDouble();

left both fields null. A disaster… So the lesson is: deserialize each field with the counterpart of whatever method serialized it. At the time I still could not figure out what the deserialization counterpart of writeChars is, since the IDE's completion offers no matching read method.
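For what it's worth, writeChars has no one-call counterpart: it writes each character via writeChar and records no length, so the reader has to know how many characters to pull back with readChar, for example by writing the length explicitly first. A minimal sketch outside Hadoop (class name and values are illustrative); in practice the writeUTF/readUTF pair used in OrderBean is simpler because it stores the length for you.

package com.GroupOrder;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteCharsDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);

        String id = "Office Supplies";
        out.writeInt(id.length());   // writeChars stores no length, so record it ourselves
        out.writeChars(id);          // each char is written as two bytes via writeChar

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        int length = in.readInt();
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(in.readChar()); // read the chars back one at a time
        }
        System.out.println(sb);       // Office Supplies
    }
}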
2). The Mapper:
package com.GroupOrder;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class OrderMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {

    OrderBean k = new OrderBean();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Sample input line (comma-separated): 0,Office Supplies,2,408.3
        // 1. Read one line
        String line = value.toString();
        // 2. Split it
        String[] fields = line.split(",");
        String Order_id = fields[1];
        Double Order_price = Double.parseDouble(fields[3]);
        // 3. Pack the fields into the bean
        k.setOrder_id(Order_id);
        k.setOrder_price(Order_price);
        // 4. Emit
        context.write(k, NullWritable.get());
    }
}
The mapper's output types are OrderBean and NullWritable. NullWritable is a special Writable whose methods are empty implementations: it neither reads from nor writes to the data stream and serves purely as a placeholder, and NullWritable.get() returns its singleton instance. The idea is to pack all the fields into OrderBean and use the whole bean as the key so it can drive the sort; since there is nothing meaningful to put in the value, emitting this empty placeholder is a clean choice.
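A tiny standalone check of that claim (the class name is illustrative, not part of the job): serializing NullWritable adds nothing to the stream.

package com.GroupOrder;

import org.apache.hadoop.io.NullWritable;

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NullWritableDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // write() is an empty implementation, so nothing is added to the stream
        NullWritable.get().write(new DataOutputStream(bytes));
        System.out.println(bytes.size()); // 0
    }
}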
3). The Reducer:
package com.GroupOrder;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {

    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        // Emit the key once per reduce() call
        context.write(key, NullWritable.get());
    }
}
4). The Driver:
package com.GroupOrder;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class OrderDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        args = new String[]{"F:\\scala\\Workerhdfs\\input5", "F:\\scala\\Workerhdfs\\output5"};
        // 1. Get the job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 2. Set the jar path
        job.setJarByClass(OrderDriver.class);
        // 3. Attach the mapper and reducer
        job.setMapperClass(OrderMapper.class);
        job.setReducerClass(OrderReducer.class);
        // 4. Set the mapper's output key and value types
        job.setMapOutputKeyClass(OrderBean.class);
        job.setMapOutputValueClass(NullWritable.class);
        // 5. Set the final output key and value types
        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);
        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7. Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
The output looks like this:
5). To keep, for each identical key (i.e. each product category), only the record with the maximum value and write it to the output, add the following grouping comparator. It compares only the category, so all records of one category land in a single reduce() call; and because the keys within that group are already sorted by amount descending, the key the reducer sees (and writes once) is that category's maximum:
package com.GroupOrder;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class OrderGroupComparator extends WritableComparator {

    protected OrderGroupComparator() {
        super(OrderBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        // Two keys belong to the same group as long as the product category matches
        OrderBean aBean = (OrderBean) a;
        OrderBean bBean = (OrderBean) b;
        int result = aBean.getOrder_id().compareTo(bBean.getOrder_id());
        if (result > 0) {
            result = 1;
        } else if (result < 0) {
            result = -1;
        } else {
            result = 0;
        }
        return result;
    }
}
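To see the difference between the sort comparator and this grouping comparator, here is a minimal sketch in the same package (the class name is only for illustration): two beans with the same category but different amounts are distinct for sorting, yet count as one key for grouping.

package com.GroupOrder;

public class GroupCompareDemo {
    public static void main(String[] args) {
        OrderBean a = new OrderBean("Office Supplies", 896.0);
        OrderBean b = new OrderBean("Office Supplies", 408.3);
        // Sort order: same category, so the amounts decide (descending) -> non-zero
        System.out.println(a.compareTo(b));                            // -1
        // Grouping: only the category is compared -> 0, i.e. the same group
        System.out.println(new OrderGroupComparator().compare(a, b));  // 0
    }
}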
Then add one line to the driver from step 4 to register the reduce-side grouping comparator:
package com.GroupOrder;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class OrderDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        args = new String[]{"F:\\scala\\Workerhdfs\\input5", "F:\\scala\\Workerhdfs\\output6"};
        // 1. Get the job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 2. Set the jar path
        job.setJarByClass(OrderDriver.class);
        // 3. Attach the mapper and reducer
        job.setMapperClass(OrderMapper.class);
        job.setReducerClass(OrderReducer.class);
        // 4. Set the mapper's output key and value types
        job.setMapOutputKeyClass(OrderBean.class);
        job.setMapOutputValueClass(NullWritable.class);
        // 5. Set the final output key and value types
        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);
        // Register the reduce-side grouping comparator
        job.setGroupingComparatorClass(OrderGroupComparator.class);
        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7. Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
The output looks like this:
Running it as a jar on a Hadoop cluster:
Here the jar is uploaded to the Hadoop file system…
Create a directory on HDFS, upload the test file into that directory, and submit the job to YARN.
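Sketched as shell commands (the directory names, data file name, and jar name are all illustrative, and this assumes the hard-coded args line in OrderDriver is removed so the input and output paths can be passed on the command line):

# create an HDFS directory and upload the test data
hdfs dfs -mkdir -p /input5
hdfs dfs -put order.csv /input5
# submit the job to YARN
hadoop jar GroupOrder.jar com.GroupOrder.OrderDriver /input5 /output5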
Viewing the contents of the output partition files: