Big Data Recommender Systems (9): Hands-on
Development environment:
Linux + IntelliJ IDEA (IDE) + SBT (Simple Build Tool) and Maven (project management tools) + continuous integration with Jenkins (a Java-based CI tool for monitoring continuously repeated jobs)
Spark: in-memory computation, DAG (graph) scheduling, simple operators; written in Scala
H2O: a platform for predictive analytics
Flink: a stream-processing platform (it can also do batch processing)
Mahout architecture: high-level
Mahout architecture: low-level
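Mahout recommender systems
(1)Mahout implements a collaborative filtering framework
It uses historical data (ratings, clicks, purchases, etc.) as the basis for recommendations
User-based: recommends items by finding similar users. Because users' tastes change frequently, this approach is hard to scale;
Item-based: recommends items by computing the similarity between items. Items change less often, so the similarity matrix can be computed offline. (Originated at Amazon)
MF-based: factorizes the original user-item matrix into smaller matrices to uncover the latent factors that explain user behavior. (Originated with the Netflix Prize)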
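To make the item-based idea concrete, here is a minimal, JDK-only sketch (not Mahout code; the class and method names are hypothetical). It precomputes an item-item similarity matrix once, which is the part that can run offline, and then estimates a user's rating for an unseen item as a similarity-weighted average of that user's own ratings. Cosine similarity is an assumption chosen for brevity; Mahout offers several other measures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, JDK-only sketch of item-based CF (not Mahout code).
// ratings.get(userID).get(itemID) is that user's rating for that item.
final class ItemBasedSketch {

    // Cosine similarity between two item columns; missing ratings count as 0.
    static double cosine(Map<Long, Map<Long, Double>> ratings, long itemA, long itemB) {
        double dot = 0, normA = 0, normB = 0;
        for (Map<Long, Double> userRatings : ratings.values()) {
            Double a = userRatings.get(itemA);
            Double b = userRatings.get(itemB);
            if (a != null) normA += a * a;
            if (b != null) normB += b * b;
            if (a != null && b != null) dot += a * b;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // "Offline" step: compute the full item-item similarity matrix once.
    static Map<Long, Map<Long, Double>> similarityMatrix(Map<Long, Map<Long, Double>> ratings) {
        Set<Long> items = new TreeSet<>();
        for (Map<Long, Double> userRatings : ratings.values()) items.addAll(userRatings.keySet());
        Map<Long, Map<Long, Double>> sims = new HashMap<>();
        for (long a : items) {
            Map<Long, Double> row = new HashMap<>();
            for (long b : items) {
                if (a != b) row.put(b, cosine(ratings, a, b));
            }
            sims.put(a, row);
        }
        return sims;
    }

    // "Online" step: similarity-weighted average of the user's own ratings.
    static double estimate(Map<Long, Map<Long, Double>> ratings,
                           Map<Long, Map<Long, Double>> sims, long userID, long itemID) {
        double num = 0, den = 0;
        for (Map.Entry<Long, Double> e : ratings.get(userID).entrySet()) {
            Double sim = sims.get(itemID).get(e.getKey());
            if (sim == null) continue; // the target item itself has no entry in its own row
            num += sim * e.getValue();
            den += Math.abs(sim);
        }
        return den == 0 ? Double.NaN : num / den;
    }
}
```

Because only the cheap `estimate` step depends on the current user, the matrix can be refreshed on a schedule while recommendations are served from it.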
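(2)Matrix factorization in Mahout's collaborative filtering framework
Collaborative filtering via SVD (Singular Value Decomposition) factorization
Collaborative filtering via ALS (alternating least squares) matrix factorization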
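For reference, the regularized objective that ALS minimizes can be written as below. This is the standard textbook form; Mahout's ALS-WR variant additionally weights λ by the number of ratings of each user or item, so treat the exact regularization term as an assumption. Here $K$ is the set of observed (user, item) pairs, $r_{ui}$ the observed rating, $p_u, q_i \in \mathbb{R}^k$ the latent factor vectors, $Q_u$ the matrix whose rows are the $q_i$ of the items rated by $u$, and $r_u$ the vector of $u$'s ratings.

```latex
\min_{P,Q} \sum_{(u,i)\in K} \left( r_{ui} - p_u^{\top} q_i \right)^2
  + \lambda \left( \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \right)

% With Q fixed, setting the gradient w.r.t. p_u to zero gives a closed form:
%   -2 Q_u^T (r_u - Q_u p_u) + 2 \lambda p_u = 0
p_u = \left( Q_u^{\top} Q_u + \lambda I_k \right)^{-1} Q_u^{\top} r_u

% Symmetrically, with P fixed (P_i stacks the p_u of users who rated i):
q_i = \left( P_i^{\top} P_i + \lambda I_k \right)^{-1} P_i^{\top} r_i

% Prediction:
\hat{r}_{ui} = p_u^{\top} q_i
```

ALS alternates the two closed-form updates; each step is an ordinary least-squares solve, which is why it parallelizes well over users and items.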
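Mahout recommender architecture
Input and output
Input: raw data (user preferences)
Output: estimated user preferences
Steps
Step 1: Map the raw data into a Mahout DataModel (user, item, preference)
Step 2: Tune the recommender components
such as the similarity component and the neighborhood component
Step 3: Compute the estimated preference values and rank them
Step 4: Evaluate the recommendations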
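Mahout recommender components
Mahout's key abstractions are defined as Java interfaces
DataModel interface
Maps the raw data into a Mahout-compatible format
UserSimilarity interface
Computes the similarity between two users
ItemSimilarity interface
Computes the similarity between two items
UserNeighborhood interface
Defines which users are "near" a given user
Recommender interface
Implements a concrete recommendation algorithm and performs the recommendation (including training, prediction, etc.)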
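(1)DataModel
Whatever the data source, all DataModels share the same underlying representation
Basic object: Preference
a triple (user, item, score)
stored per user in a PreferenceArray (e.g., GenericUserPreferenceArray)
(2)UserSimilarity
UserSimilarity defines the similarity between two users
Similarly, ItemSimilarity defines the similarity between two items
Similarity implementations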
Pearson Correlation
Spearman Correlation
Euclidean Distance
Tanimoto Coefficient
LogLikelihood Similarity
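Three of the measures above are easy to state in code. The JDK-only sketch below is illustrative, not Mahout's implementation (class and method names are hypothetical): Pearson and Euclidean are computed over the items both users rated, and Tanimoto ignores rating values and compares only the sets of items. Spearman is Pearson applied to rank values and is omitted. The Euclidean-to-similarity mapping 1 / (1 + d) is a common convention; Mahout's EuclideanDistanceSimilarity normalizes differently, so its values will not match this sketch exactly.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical JDK-only sketches of three similarity measures (not Mahout code).
// Maps are itemID -> rating for one user.
final class SimilaritySketch {

    // Pearson correlation over co-rated items; NaN when it is undefined.
    static double pearson(Map<Long, Double> x, Map<Long, Double> y) {
        int n = 0;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (Map.Entry<Long, Double> e : x.entrySet()) {
            Double b = y.get(e.getKey());
            if (b == null) continue; // only items rated by both users count
            double a = e.getValue();
            n++; sx += a; sy += b; sxx += a * a; syy += b * b; sxy += a * b;
        }
        if (n == 0) return Double.NaN;
        double cov = sxy - sx * sy / n;
        double vx = sxx - sx * sx / n;
        double vy = syy - sy * sy / n;
        return (vx == 0 || vy == 0) ? Double.NaN : cov / Math.sqrt(vx * vy);
    }

    // Euclidean distance over co-rated items, mapped into (0, 1] via 1 / (1 + d).
    static double euclidean(Map<Long, Double> x, Map<Long, Double> y) {
        double sum = 0;
        int n = 0;
        for (Map.Entry<Long, Double> e : x.entrySet()) {
            Double b = y.get(e.getKey());
            if (b == null) continue;
            double d = e.getValue() - b;
            sum += d * d;
            n++;
        }
        return n == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sum));
    }

    // Tanimoto (Jaccard) coefficient |A ∩ B| / |A ∪ B| on item sets.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? Double.NaN : (double) inter.size() / union;
    }
}
```

Pearson ignores a user's overall rating level (a user who always rates one point higher is still perfectly correlated), which Euclidean distance does not; Tanimoto is the natural choice when only boolean data are available.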
(3)UserNeighborhood
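The examples below use two neighborhood strategies, NearestNUserNeighborhood (a fixed number of most similar users) and ThresholdUserNeighborhood (every user above a similarity threshold). The hypothetical JDK-only sketch below shows both ideas plus the usual user-based estimate, a similarity-weighted average of the neighbors' ratings for one item. It is an illustration under simplifying assumptions (similarities to the target user are already computed), not Mahout's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of user neighborhoods (not Mahout code).
// simToTarget maps each other userID to its similarity with the target user.
final class NeighborhoodSketch {

    // Nearest-N: the n most similar users, ignoring undefined (NaN) similarities.
    static List<Long> nearestN(Map<Long, Double> simToTarget, int n) {
        List<Long> users = new ArrayList<>(simToTarget.keySet());
        users.removeIf(u -> Double.isNaN(simToTarget.get(u)));
        users.sort(Comparator.comparingDouble((Long u) -> simToTarget.get(u)).reversed());
        return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
    }

    // Threshold: every user whose similarity is at least the threshold (NaN never qualifies).
    static List<Long> threshold(Map<Long, Double> simToTarget, double threshold) {
        List<Long> users = new ArrayList<>();
        for (Map.Entry<Long, Double> e : simToTarget.entrySet()) {
            if (e.getValue() >= threshold) users.add(e.getKey());
        }
        return users;
    }

    // User-based estimate for one item: similarity-weighted average of neighbors' ratings.
    static double estimate(List<Long> neighbors, Map<Long, Double> simToTarget,
                           Map<Long, Double> neighborRatingsForItem) {
        double num = 0, den = 0;
        for (long v : neighbors) {
            Double r = neighborRatingsForItem.get(v);
            if (r == null) continue; // this neighbor did not rate the item
            double s = simToTarget.get(v);
            num += s * r;
            den += Math.abs(s);
        }
        return den == 0 ? Double.NaN : num / den;
    }
}
```

A fixed N keeps the cost per user predictable, while a threshold adapts to how many genuinely similar users exist; with sparse data a threshold can leave some users with no neighbors at all.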
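Evaluating the recommender
Type 1: prediction-based measures, which compare estimated preference values with the actual ones (e.g., MAE, RMSE)
Type 2: IR-based measures, which check how many of the top-N recommended items are relevant (precision, recall, F1)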
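Both families of measures are simple formulas. The hypothetical JDK-only sketch below computes MAE and RMSE from paired predicted/actual ratings, and precision, recall and F1 from a top-N list and a set of relevant items. It illustrates the definitions only; Mahout's evaluators additionally handle the train/test split and the choice of relevant items.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two families of evaluation measures (not Mahout code).
final class MetricsSketch {

    // Prediction-based: mean absolute error.
    static double mae(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) sum += Math.abs(predicted[i] - actual[i]);
        return sum / actual.length;
    }

    // Prediction-based: root mean squared error (penalizes large errors more than MAE).
    static double rmse(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    // IR-based: share of the recommended items that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    // IR-based: share of the relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }

    // Harmonic mean of precision and recall.
    static double f1(double p, double r) {
        return (p + r == 0) ? 0.0 : 2 * p * r / (p + r);
    }

    private static int hits(List<Long> recommended, Set<Long> relevant) {
        int h = 0;
        for (long item : new HashSet<>(recommended)) {
            if (relevant.contains(item)) h++;
        }
        return h;
    }
}
```

For example, recommending items 1 to 5 when items 2, 5 and 9 are relevant gives precision 0.4, recall 2/3 and F1 0.5.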
Example 1: preferences
Requirement
Create user-item preference data and print it
Implementation
Create the data with GenericUserPreferenceArray
Store the data in a PreferenceArray
package com.dylan.example;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;
public class CreatePreferenceArray {
    private CreatePreferenceArray() {
    }
    public static void main(String[] args) {
        // A GenericUserPreferenceArray holds all preferences of ONE user (here: 2 of them)
        PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
        user1Prefs.setUserID(0, 1L);    // the user ID is shared by every entry of the array
        user1Prefs.setItemID(0, 101L);
        user1Prefs.setValue(0, 3.0f);
        user1Prefs.setItemID(1, 102L);
        user1Prefs.setValue(1, 4.0f);
        // Read back a single (user, item, value) triple
        Preference pref = user1Prefs.get(1);
        System.out.println(pref.getUserID() + "," + pref.getItemID() + "," + pref.getValue());
        System.out.println(user1Prefs);
    }
}
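Example 2: data model
A PreferenceArray stores the preferences of a single user
How should the preferences of all users be stored?
A HashMap? No! A HashMap<Long, ...> boxes every key into a Long object and allocates an entry object per mapping, which wastes a lot of memory when there are millions of preferences.
Mahout introduces a data structure optimized for recommendation tasks:
FastByIDMap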
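The memory argument can be sketched in plain Java. The hypothetical class below stores primitive long keys in an open-addressing table with linear probing, so there is no Long boxing and no per-entry object. It mirrors the idea behind FastByIDMap, but Mahout's actual implementation differs in hashing, resizing and removal details, and this sketch supports only put and get.

```java
import java.util.Arrays;

// Hypothetical sketch of the idea behind FastByIDMap (not Mahout's code):
// primitive long keys + open addressing with linear probing.
final class LongKeyMapSketch<V> {
    private static final long EMPTY = Long.MIN_VALUE; // sentinel: this key cannot be stored
    private long[] keys;
    private Object[] values;
    private int size;

    LongKeyMapSketch(int initialCapacity) {
        int cap = 8;
        while (cap < initialCapacity * 2) cap <<= 1; // power of two, load factor <= 0.5
        keys = new long[cap];
        Arrays.fill(keys, EMPTY);
        values = new Object[cap];
    }

    // Finds the slot holding key, or the empty slot where it would go.
    private int slot(long key) {
        int mask = keys.length - 1;
        int i = Long.hashCode(key * 0x9E3779B97F4A7C15L) & mask; // scramble, then mask
        while (keys[i] != EMPTY && keys[i] != key) i = (i + 1) & mask; // linear probing
        return i;
    }

    V put(long key, V value) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        if ((size + 1) * 2 > keys.length) grow();
        int i = slot(key);
        @SuppressWarnings("unchecked") V old = (V) values[i];
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
        return old; // null if the key was new
    }

    @SuppressWarnings("unchecked")
    V get(long key) {
        int i = slot(key);
        return keys[i] == key ? (V) values[i] : null;
    }

    int size() {
        return size;
    }

    // Doubles the table and re-inserts every existing key.
    private void grow() {
        long[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new long[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        values = new Object[oldKeys.length * 2];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                int i = slot(oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

Each mapping costs one slot in a `long[]` and one in an `Object[]`, instead of a boxed `Long` plus a `HashMap.Node`.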
Requirement
Use GenericDataModel to load the preference data held in a FastByIDMap
package com.dylan.example;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
public class CreateGenericDataModel {
    private CreateGenericDataModel() {
    }
    public static void main(String[] args) {
        // userID -> that user's PreferenceArray
        FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();
        PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
        user1Prefs.setUserID(0, 1L);
        user1Prefs.setItemID(0, 101L);
        user1Prefs.setValue(0, 3.0f);
        user1Prefs.setItemID(1, 102L);
        user1Prefs.setValue(1, 4.0f);
        PreferenceArray user2Prefs = new GenericUserPreferenceArray(2);
        user2Prefs.setUserID(0, 2L);
        user2Prefs.setItemID(0, 101L);
        user2Prefs.setValue(0, 3.0f);
        user2Prefs.setItemID(1, 102L);
        user2Prefs.setValue(1, 4.0f);
        preferences.put(1L, user1Prefs);
        preferences.put(2L, user2Prefs);
        // An in-memory DataModel built from all users' preferences
        DataModel model = new GenericDataModel(preferences);
        System.out.println(model);
    }
}
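Example 3: Recommender
Requirement
Use user-based collaborative filtering to recommend 2 items to user 1
Implementation
Read the file with FileDataModel
Compute similarities with PearsonCorrelationSimilarity
Build the recommendation engine with GenericUserBasedRecommender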
package com.dylan.example;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import java.io.File;
import java.util.List;
public class RecommenderIntro {
    private RecommenderIntro() {
    }
    public static void main(String[] args) throws Exception {
        // Each line: userID, itemID, rating (optionally followed by a timestamp)
        DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // The 100 users most similar to the target user
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Recommend 2 items to user 1, as the requirement states
        List<RecommendedItem> recommendedItems = recommender.recommend(1, 2);
        for (RecommendedItem recommendedItem : recommendedItems) {
            System.out.println(recommendedItem);
        }
    }
}
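Example 4: evaluating the recommender (1)
Requirement
Evaluate how well the recommender from Example 3 performs
Implementation
Evaluate the model with AverageAbsoluteDifferenceRecommenderEvaluator (MAE) and RMSRecommenderEvaluator (RMSE)
Use a RecommenderBuilder to build the model under evaluation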
package com.dylan.example;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;
import java.io.File;
public class EvaluatorIntro {
    private EvaluatorIntro() {
    }
    public static void main(String[] args) throws Exception {
        // Fixed random seed, so repeated runs use the same train/test split
        RandomUtils.useTestSeed();
        final DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderEvaluator recommenderEvaluator = new RMSRecommenderEvaluator();
        // The evaluator calls this builder to train a recommender on the training split
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // 0.7: train on 70% of each user's preferences; 1.0: evaluate all users
        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        double rmse = recommenderEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println(score);
        System.out.println(rmse);
    }
}
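Example 5: evaluating the recommender (2)
Requirement
Evaluate the recommender from Example 3 with IR metrics
Implementation
Use RecommenderIRStatsEvaluator for the evaluation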
package com.dylan.example;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.*;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;
import java.io.File;
public class IREvaluatorIntro {
    private IREvaluatorIntro() {
    }
    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed();
        final DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Precision/recall at 5; CHOOSE_THRESHOLD lets Mahout choose each user's relevance threshold;
        // no DataModelBuilder or rescorer (null); evaluate all users (1.0)
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
        System.out.println(stats.getF1Measure());
    }
}
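Example 6: a MovieLens recommender
Requirement
Build a movie recommender with the MovieLens 1M dataset
Steps
Implement a DataModel for the MovieLens dataset
Implement item-based and user-based collaborative filtering and save the results
1. Build a new DataModel for the data
2. Compute the item similarity matrix with multiple threads, producing the file similarities.csv
3. Generate recommendations for each user, producing userRcomed.csv
Recommender cachingRecommender = new CachingRecommender(recommender); caches the recommendation results (useful when the data volume is large)
(1)Convert the original '::'-delimited data into ','-delimited data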
package com.dylan.MovieLens;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.common.iterator.FileLineIterable;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
public class MovieLensDataModel extends FileDataModel {
    private static final String COLON_DELIMITER = "::";
    private static final Pattern COLON_DELIMITER_PATTERN = Pattern.compile(COLON_DELIMITER);
    public MovieLensDataModel(File ratingsFile) throws IOException {
        super(convertFile(ratingsFile));
    }
    // Converts "UserID::MovieID::Rating::Timestamp" lines into "UserID,MovieID,Rating"
    private static File convertFile(File originalFile) throws IOException {
        File resultFile = new File(System.getProperty("java.io.tmpdir"), "ratings.csv");
        if (resultFile.exists()) {
            resultFile.delete();
        }
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(resultFile), StandardCharsets.UTF_8)) {
            for (String line : new FileLineIterable(originalFile, false)) {
                int lastIndex = line.lastIndexOf(COLON_DELIMITER);
                if (lastIndex < 0) {
                    throw new IOException("Invalid data: " + line);
                }
                // Drop the trailing timestamp, then replace the remaining "::" with ","
                String subLine = line.substring(0, lastIndex);
                String convertedSubLine = COLON_DELIMITER_PATTERN.matcher(subLine).replaceAll(",");
                writer.write(convertedSubLine);
                writer.write('\n');
            }
        } catch (IOException ioe) {
            resultFile.delete();
            throw ioe;
        }
        return resultFile;
    }
}
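(2)Generate the results in batch with multiple threads (offline batch processing) to obtain the similarity file. Item-based CF (ICF) can be precomputed offline this way with multiple threads; this batch tool does not apply to user-based CF (UCF)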
package com.dylan.MovieLens;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;
import org.apache.mahout.cf.taste.similarity.precompute.SimilarItemsWriter;
import java.io.File;
public class BatchItemSimilaritiesMovieLens {
    private BatchItemSimilaritiesMovieLens() {
    }
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Needs MovieLens 1M dataset as argument!");
            System.exit(-1);
        }
        File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
        DataModel dataModel = new MovieLensDataModel(new File(args[0]));
        ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
        ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);
        // Keep the 5 most similar items for every item
        BatchItemSimilarities batchItemSimilarities = new MultithreadedBatchItemSimilarities(recommender, 5);
        SimilarItemsWriter writer = new FileSimilarItemsWriter(resultFile);
        // One worker thread per available core, at most 1 hour of computation
        int numSimilarities = batchItemSimilarities.computeItemSimilarities(
                Runtime.getRuntime().availableProcessors(), 1, writer);
        System.out.println("Computed " + numSimilarities + " similarities for " + dataModel.getNumItems()
                + " items and saved them to " + resultFile.getAbsolutePath());
    }
}
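Item similarities parallelize well because each item's row of the similarity matrix is an independent task. The hypothetical JDK-only sketch below illustrates that idea with an ExecutorService, one task per item, keeping the top-k most similar items. It is not how MultithreadedBatchItemSimilarities is implemented internally, and it uses Tanimoto similarity on item-to-users sets instead of LogLikelihoodSimilarity to stay short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: precompute the top-k similar items per item in parallel.
// usersByItem maps itemID -> IDs of the users who interacted with it.
final class ParallelItemSimilaritiesSketch {

    static Map<Long, List<Long>> topSimilarItems(Map<Long, Set<Long>> usersByItem, int k, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<Long, Future<List<Long>>> futures = new HashMap<>();
            for (long item : usersByItem.keySet()) {
                // One independent task per item: no shared mutable state between tasks
                futures.put(item, pool.submit(() -> topK(usersByItem, item, k)));
            }
            Map<Long, List<Long>> result = new HashMap<>();
            for (Map.Entry<Long, Future<List<Long>>> e : futures.entrySet()) {
                result.put(e.getKey(), e.getValue().get()); // waits for that task
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    // Tanimoto similarity to every other item; keep the k best (ties broken by item ID).
    private static List<Long> topK(Map<Long, Set<Long>> usersByItem, long item, int k) {
        Set<Long> a = usersByItem.get(item);
        Map<Long, Double> sim = new HashMap<>();
        List<Long> others = new ArrayList<>();
        for (Map.Entry<Long, Set<Long>> e : usersByItem.entrySet()) {
            if (e.getKey() == item) continue;
            Set<Long> inter = new HashSet<>(a);
            inter.retainAll(e.getValue());
            int union = a.size() + e.getValue().size() - inter.size();
            double s = union == 0 ? 0.0 : (double) inter.size() / union;
            if (s > 0) {
                sim.put(e.getKey(), s);
                others.add(e.getKey());
            }
        }
        others.sort(Comparator.comparingDouble((Long o) -> sim.get(o)).reversed()
                .thenComparing(Comparator.<Long>naturalOrder()));
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }
}
```

The same independence does not hold as conveniently for user neighborhoods in an online recommender, which is one reason the offline batch step above targets item similarities.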
(3)User-based recommendation, producing the file userRcomed.csv
package com.dylan.MovieLens;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
public class UserRecommenderMovieLens {
    private UserRecommenderMovieLens() {
    }
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Needs MovieLens 1M dataset as argument!");
            System.exit(-1);
        }
        File resultFile = new File(System.getProperty("java.io.tmpdir"), "userRcomed.csv");
        DataModel dataModel = new MovieLensDataModel(new File(args[0]));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, dataModel);
        Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
        // Caches the results of the wrapped recommender
        Recommender cachingRecommender = new CachingRecommender(recommender);
        // Evaluate: train on 90% of each user's ratings, evaluate 50% of the users
        RMSRecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel dataModel) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, dataModel);
                return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
            }
        };
        double score = evaluator.evaluate(recommenderBuilder, null, dataModel, 0.9, 0.5);
        System.out.println("RMSE score is " + score);
        try (PrintWriter writer = new PrintWriter(resultFile)) {
            // Iterate over the actual user IDs instead of assuming they run from 1 to getNumUsers()
            LongPrimitiveIterator userIDs = dataModel.getUserIDs();
            while (userIDs.hasNext()) {
                long userID = userIDs.nextLong();
                List<RecommendedItem> recommendedItems = cachingRecommender.recommend(userID, 2);
                String line = userID + " : ";
                for (RecommendedItem recommendedItem : recommendedItems) {
                    line += recommendedItem.getItemID() + ":" + recommendedItem.getValue() + ",";
                }
                if (line.endsWith(",")) {
                    line = line.substring(0, line.length() - 1);
                }
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException ioe) {
            resultFile.delete();
            throw ioe;
        }
        System.out.println("Recommended for " + dataModel.getNumUsers() + " users and saved them to "
                + resultFile.getAbsolutePath());
    }
}
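Example 7: a commonly used open dataset, Book-Crossing
1. Content
Book ratings given by readers of the Book-Crossing community
2. Size (number of records)
Ratings of 271,379 books by 278,858 users, including both explicit and implicit ratings
3. Dataset download
http://grouplens.org/datasets/book-crossing/
Explicit ratings: 1-10
Implicit ratings: clicks, purchases, etc. (in this dataset an implicit rating is recorded as 0)
Requirement
Use the BookCrossing dataset to build two kinds of book recommender
Recommendation based on ratings
Recommendation without ratings (boolean preferences: 1 if clicked, 0 if not)
Steps
Implement a DataModel for the BookCrossing dataset
Implement the two recommenders
Use GenericBooleanPrefUserBasedRecommender (for the recommender without ratings)
Implement a DataModelBuilder
(1)Recommendation based on ratings
Data processing: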
package com.dylan.BookCrossing;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.common.iterator.FileLineIterable;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
public class BXDataModel extends FileDataModel {
    // Matches every character that is neither a digit nor the ';' field separator
    private static final Pattern NON_DIGIT_SEMICOLON_DELIMITER = Pattern.compile("[^0-9;]");
    public BXDataModel(File ratingsFile, boolean ignoreRatings) throws IOException {
        super(convertFile(ratingsFile, ignoreRatings));
    }
    // Converts lines like "User-ID";"ISBN";"Book-Rating" into user,isbn[,rating]
    private static File convertFile(File originalFile, boolean ignoreRatings) throws IOException {
        File resultFile = new File(System.getProperty("java.io.tmpdir"), "bookcrossing.csv");
        if (resultFile.exists()) {
            resultFile.delete();
        }
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(resultFile), StandardCharsets.UTF_8)) {
            for (String line : new FileLineIterable(originalFile, true)) { // true: skip the header line
                // A rating of "0" is an implicit rating: skip it
                if (line.endsWith("\"0\"")) {
                    continue;
                }
                // Strip quotes and non-digits (e.g. an 'X' check digit in an ISBN); use ',' as delimiter
                String convertedLine = NON_DIGIT_SEMICOLON_DELIMITER.matcher(line).replaceAll("").replace(';', ',');
                // An ID that consisted only of non-digits is now empty: skip the line
                if (convertedLine.contains(",,")) {
                    continue;
                }
                if (ignoreRatings) {
                    // Keep only "user,item", so FileDataModel treats the data as boolean preferences
                    convertedLine = convertedLine.substring(0, convertedLine.lastIndexOf(','));
                }
                writer.write(convertedLine);
                writer.write('\n');
            }
        } catch (IOException ioe) {
            resultFile.delete();
            throw ioe;
        }
        return resultFile;
    }
}
Prepare a Recommender
package com.dylan.BookCrossing;
import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import java.util.Collection;
import java.util.List;
public class BXRecommender implements Recommender {
    private final Recommender recommender;
    public BXRecommender(DataModel dataModel) throws TasteException {
        // Euclidean-distance similarity between users' explicit ratings
        UserSimilarity similarity = new EuclideanDistanceSimilarity(dataModel);
        // At most 100 neighbors with similarity >= 0.2, sampling 20% of the users
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, 0.2, similarity, dataModel, 0.2);
        recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
    }
    // The methods below delegate to the wrapped recommender
    public List<RecommendedItem> recommend(long userID, int howMany) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, false);
    }
    public List<RecommendedItem> recommend(long userID, int howMany, boolean includeKnownItems) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, includeKnownItems);
    }
    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
        return recommender.recommend(userID, howMany, rescorer, false);
    }
    @Override
    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer idRescorer, boolean includeKnownItems) throws TasteException {
        // Pass the caller's rescorer through instead of discarding it
        return recommender.recommend(userID, howMany, idRescorer, includeKnownItems);
    }
    @Override
    public float estimatePreference(long userID, long itemID) throws TasteException {
        return recommender.estimatePreference(userID, itemID);
    }
    public void setPreference(long userID, long itemID, float value) throws TasteException {
        recommender.setPreference(userID, itemID, value);
    }
    public void removePreference(long userID, long itemID) throws TasteException {
        recommender.removePreference(userID, itemID);
    }
    public DataModel getDataModel() {
        return recommender.getDataModel();
    }
    @Override
    public void refresh(Collection<Refreshable> collection) {
        recommender.refresh(collection);
    }
}
Implement the RecommenderBuilder (this one builds the boolean recommender, BXBooleanRecommender, from part (2))
package com.dylan.BookCrossing;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
public class BXBooleanRecommenderBuilder implements RecommenderBuilder {
    @Override
    public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        return new BXBooleanRecommender(dataModel);
    }
}
Implement the evaluator (for the boolean recommender from part (2))
package com.dylan.BookCrossing;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;
import java.io.File;
import java.io.IOException;
public class BXBooleanRecommenderEvaluator {
    private BXBooleanRecommenderEvaluator() {
    }
    public static void main(String[] args) throws IOException, TasteException {
        // true: drop the rating column, so the data are boolean preferences
        DataModel dataModel = new BXDataModel(new File(args[0]), true);
        // Boolean data carry no rating values, so MAE is not meaningful here; use precision/recall
        // at 3 instead. BXDataModelBuilder keeps the training data boolean as well.
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = evaluator.evaluate(new BXBooleanRecommenderBuilder(), new BXDataModelBuilder(),
                dataModel, null, 3, Double.NEGATIVE_INFINITY, 1.0);
        System.out.println("Precision is " + stats.getPrecision() + "; Recall is " + stats.getRecall()
                + "; F1 is " + stats.getF1Measure());
    }
}
(2)Data without ratings
Prepare a Recommender
package com.dylan.BookCrossing;
import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.CachingUserSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import java.util.Collection;
import java.util.List;
public class BXBooleanRecommender implements Recommender {
    private final Recommender recommender;
    public BXBooleanRecommender(DataModel dataModel) throws TasteException {
        // Log-likelihood needs no rating values; CachingUserSimilarity memoizes the pairwise results
        UserSimilarity similarity = new CachingUserSimilarity(new LogLikelihoodSimilarity(dataModel), dataModel);
        //UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, Double.NEGATIVE_INFINITY, similarity, dataModel, 1.0);
        // All users with similarity >= 0.5, considering every user (sampling rate 1.0)
        UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, dataModel, 1.0);
        recommender = new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, similarity);
    }
    // The methods below delegate to the wrapped recommender
    public List<RecommendedItem> recommend(long userID, int howMany) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, false);
    }
    public List<RecommendedItem> recommend(long userID, int howMany, boolean includeKnownItems) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, includeKnownItems);
    }
    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
        return recommender.recommend(userID, howMany, rescorer, false);
    }
    @Override
    public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer idRescorer, boolean includeKnownItems) throws TasteException {
        // Pass the caller's rescorer through instead of discarding it
        return recommender.recommend(userID, howMany, idRescorer, includeKnownItems);
    }
    @Override
    public float estimatePreference(long userID, long itemID) throws TasteException {
        return recommender.estimatePreference(userID, itemID);
    }
    public void setPreference(long userID, long itemID, float value) throws TasteException {
        recommender.setPreference(userID, itemID, value);
    }
    public void removePreference(long userID, long itemID) throws TasteException {
        recommender.removePreference(userID, itemID);
    }
    public DataModel getDataModel() {
        return recommender.getDataModel();
    }
    @Override
    public void refresh(Collection<Refreshable> collection) {
        recommender.refresh(collection);
    }
}
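The boolean recommender above relies on LogLikelihoodSimilarity, which needs only co-occurrence counts, not rating values. The hypothetical JDK-only sketch below computes Dunning's log-likelihood ratio (LLR) from a 2x2 table of counts for two users (or items) A and B. The entropy formulation is the standard one; mapping the ratio into [0, 1) with 1 − 1 / (1 + LLR) is an assumption modeled on Mahout's similarity, not a guarantee of its exact output.

```java
// Hypothetical sketch of the log-likelihood ratio behind LogLikelihoodSimilarity (not Mahout code).
// k11: both A and B, k12: A but not B, k21: B but not A, k22: neither.
final class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of counts.
    private static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0;
        for (long c : counts) {
            sum += c;
            sumXLogX += xLogX(c);
        }
        return xLogX(sum) - sumXLogX;
    }

    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);    // A present / absent
        double columnEntropy = entropy(k11 + k21, k12 + k22); // B present / absent
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Map the unbounded ratio into [0, 1): larger ratio, stronger association.
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22));
    }
}
```

Note that LLR measures deviation from independence in either direction, so a pair that co-occurs much less often than chance also scores high; with sparse click data this is rarely an issue, but it is worth keeping in mind when reading the scores.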
Implement the RecommenderBuilder (this one builds the rating-based recommender, BXRecommender, from part (1))
package com.dylan.BookCrossing;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
public class BXRecommenderBuilder implements RecommenderBuilder {
    @Override
    public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        return new BXRecommender(dataModel);
    }
}
Implement the evaluator (for the rating-based recommender from part (1))
package com.dylan.BookCrossing;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;
import java.io.File;
import java.io.IOException;
public class BXRecommenderEvaluator {
    private BXRecommenderEvaluator() {
    }
    public static void main(String[] args) throws IOException, TasteException {
        // false: keep the rating column (explicit ratings 1-10)
        DataModel dataModel = new BXDataModel(new File(args[0]), false);
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        // Train on 90% of each user's ratings, evaluate 30% of the users
        double score = evaluator.evaluate(new BXRecommenderBuilder(), null, dataModel, 0.9, 0.3);
        System.out.println("MAE score is " + score);
    }
}
Implement the DataModelBuilder interface (it builds a GenericBooleanPrefDataModel from the training data)
package com.dylan.BookCrossing;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
public class BXDataModelBuilder implements DataModelBuilder {
    @Override
    public DataModel buildDataModel(FastByIDMap<PreferenceArray> fastByIDMap) {
        // Discard preference values: keep only which items each user interacted with
        return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(fastByIDMap));
    }
}
Optimizing the Recommender
1. Cache the similarity values:
UserSimilarity similarity = new CachingUserSimilarity(new LogLikelihoodSimilarity(dataModel), dataModel);
2. Use a threshold-based neighborhood:
UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, dataModel, 1.0);
3. Change the evaluation criterion
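The first optimization is plain memoization: a user-user similarity is expensive to compute but is requested many times while neighborhoods are built. The hypothetical JDK-only sketch below shows the idea behind CachingUserSimilarity, exploiting the fact that similarity is symmetric so (a, b) and (b, a) share one cache entry. It is not Mahout's implementation, which also handles refreshing when the data change.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical sketch of memoized, symmetric pairwise similarity (not Mahout code).
final class CachingSimilaritySketch {
    private final BiFunction<Long, Long, Double> delegate; // the expensive similarity
    private final Map<List<Long>, Double> cache = new ConcurrentHashMap<>();

    CachingSimilaritySketch(BiFunction<Long, Long, Double> delegate) {
        this.delegate = delegate;
    }

    double similarity(long a, long b) {
        // Order the pair so that (a, b) and (b, a) map to the same key
        List<Long> key = a <= b ? List.of(a, b) : List.of(b, a);
        return cache.computeIfAbsent(key, k -> delegate.apply(k.get(0), k.get(1)));
    }

    // Must be called when the underlying preferences change, or stale values are returned.
    void clear() {
        cache.clear();
    }
}
```

The trade-off is memory: caching all pairs grows quadratically with the number of users, which is why a bounded cache is preferable on large datasets.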
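Mahout recommendation exercise
Requirement: based on movie rating data stored in MySQL, use Mahout to recommend 3 movies to each user
- Prepare the database table
(1)Create the table in the mahout database: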
use mahout;
CREATE TABLE taste_preferences (
    user_id BIGINT NOT NULL,
    item_id BIGINT NOT NULL,
    preference FLOAT NOT NULL,
    PRIMARY KEY (user_id, item_id),
    INDEX (user_id),
    INDEX (item_id)
);
Then import the first three columns of ratings.dat into the taste_preferences table:
LOAD DATA LOCAL INFILE
    "/home/root/code/MahoutRecommendation/src/main/resources/ratings.dat"
    INTO TABLE mahout.taste_preferences
    FIELDS TERMINATED BY '::'
    (user_id, item_id, preference);
- Implement the recommendation algorithm
Use MySQLJDBCDataModel
package com.dylan.practice;
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
public class MysqlDataMovieRecommend {
    private MysqlDataMovieRecommend() {
    }
    public static void main(String[] args) throws TasteException, IOException {
        File resultFile = new File("/tmp", "MysqlMovieRcomed.txt");
        // MySQL connection
        MysqlDataSource mysqlDataSource = new MysqlDataSource();
        mysqlDataSource.setDatabaseName("mahout");
        mysqlDataSource.setServerName("127.0.0.1");
        mysqlDataSource.setUser("mahout");
        mysqlDataSource.setPassword("mahout");
        mysqlDataSource.setAutoReconnect(true);
        mysqlDataSource.setFailOverReadOnly(false);
        // Table and column names match taste_preferences; no timestamp column (null)
        JDBCDataModel dataModel = new MySQLJDBCDataModel(mysqlDataSource, "taste_preferences", "user_id", "item_id", "preference", null);
        DataModel model = dataModel;
        // Recommendations
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        //UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, model, 1.0);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        try (PrintWriter writer = new PrintWriter(resultFile)) {
            // Iterate over the user IDs actually present in the table
            LongPrimitiveIterator userIDs = model.getUserIDs();
            while (userIDs.hasNext()) {
                long userID = userIDs.nextLong();
                List<RecommendedItem> recommendedItems = recommender.recommend(userID, 3);
                String line = userID + " : ";
                for (RecommendedItem recommendedItem : recommendedItems) {
                    line += recommendedItem.getItemID() + ":" + recommendedItem.getValue() + ",";
                }
                if (line.endsWith(",")) {
                    line = line.substring(0, line.length() - 1);
                }
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException ioe) {
            resultFile.delete();
            throw ioe;
        }
        System.out.println("Recommended for " + model.getNumUsers() + " users and saved them to " + resultFile.getAbsolutePath());
    }
}
Summary of the steps:
1. Import the movie data into MySQL
2. Connect the Java program in the IDE to MySQL (the MysqlDataSource settings in the code above)
3. Build the JDBCDataModel (MySQLJDBCDataModel over the taste_preferences table)
4. Build the recommender and write the recommendations to a file
Improvements:
1. Move the MySQL settings into a configuration file
2. Use a neighborhood of the 10 most similar users:
UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
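For the configuration-file improvement, a minimal sketch using java.util.Properties is shown below. The class name and the property keys (db.host, db.name, db.user, db.password, neighborhood.size) are hypothetical choices, not part of the original project; the defaults mirror the values hard-coded above, except that the password defaults to empty so that no credential lives in the code.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical sketch: load the MySQL and neighborhood settings from a properties file.
// All key names are illustrative assumptions.
final class DbConfigSketch {
    final String host;
    final String database;
    final String user;
    final String password;
    final int neighborhoodSize;

    private DbConfigSketch(String host, String database, String user, String password, int neighborhoodSize) {
        this.host = host;
        this.database = database;
        this.user = user;
        this.password = password;
        this.neighborhoodSize = neighborhoodSize;
    }

    // Missing keys fall back to the defaults used in MysqlDataMovieRecommend.
    static DbConfigSketch load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return new DbConfigSketch(
                p.getProperty("db.host", "127.0.0.1"),
                p.getProperty("db.name", "mahout"),
                p.getProperty("db.user", "mahout"),
                p.getProperty("db.password", ""),
                Integer.parseInt(p.getProperty("neighborhood.size", "10")));
    }
}
```

In MysqlDataMovieRecommend the loaded values would then be passed to `mysqlDataSource.setServerName(config.host)`, `setDatabaseName`, `setUser`, `setPassword`, and to `new NearestNUserNeighborhood(config.neighborhoodSize, similarity, model)`, so changing the database or the neighborhood size no longer requires recompiling.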