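If a customer buys item X, they are likely to buy item Y as well.
Rules like this are usually evaluated with two measures: support and confidence.
Support is the number of times a rule holds in the dataset.
Confidence is the proportion of cases in which the rule holds, out of all the cases where its premise applies.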
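To make the two definitions concrete, here is a minimal sketch on a made-up five-transaction dataset. The `toy` list and the rule "bread -> milk" are illustrative assumptions, not the data used in the rest of this post.

```python
# Hypothetical toy data: each row is one transaction, columns are [bread, milk].
toy = [
    [1, 1],
    [1, 0],
    [1, 1],
    [0, 1],
    [1, 1],
]
BREAD, MILK = 0, 1

# Number of transactions in which the premise (bread) applies.
premise_count = sum(1 for row in toy if row[BREAD] == 1)
# Support: number of transactions in which the rule holds (bread and milk together).
toy_support = sum(1 for row in toy if row[BREAD] == 1 and row[MILK] == 1)
# Confidence: share of the premise's transactions in which the rule holds.
toy_confidence = toy_support / premise_count

print("support = {0}, confidence = {1:.2f}".format(toy_support, toy_confidence))
# prints: support = 3, confidence = 0.75
```

Bread appears in four transactions and milk accompanies it in three of them, so the rule has support 3 and confidence 3/4.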
Let's go straight to the code.
# Columns: bread, milk, cheese, apples, bananas
import numpy as np
from pyexcel_xls import get_data

xls_data = get_data(r"777.xls")
features = ["bread", "milk", "cheese", "apples", "bananas"]
# print(xls_data['Sheet1'])
lis = xls_data['Sheet1']
X = np.array(lis)
n_samples, n_features = X.shape  # number of rows (transactions) and columns (items)
print(n_samples)
print(n_features)
# print(X)
# Count how many people bought apples (column index 3)
num_apple_purchases = 0
for sample in X:
    if sample[3] == 1:
        num_apple_purchases += 1
print("{0} people bought Apples".format(num_apple_purchases))
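The same count can also be obtained without an explicit loop. This is a minimal sketch, assuming the data is a 0/1 NumPy array whose column 3 is apples; the small `X_demo` array is made up so that the snippet runs on its own.

```python
import numpy as np

# Made-up stand-in for X: columns are bread, milk, cheese, apples, bananas.
X_demo = np.array([
    [1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
])

# Comparing the apples column with 1 gives a boolean mask;
# summing the mask counts its True entries.
num_apple_demo = int((X_demo[:, 3] == 1).sum())
print("{0} people bought Apples".format(num_apple_demo))
# prints: 2 people bought Apples
```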
from collections import defaultdict
valid_rules = defaultdict(int)     # times each rule held
invalid_rules = defaultdict(int)   # times each rule failed
num_occurences = defaultdict(int)  # times each premise occurred
for sample in X:  # each transaction (row)
    for premise in range(n_features):  # each item (column) as a candidate premise
        if sample[premise] == 0:
            continue  # item not bought in this transaction, so try the next column
        num_occurences[premise] += 1  # the premise item was bought
        for conclusion in range(n_features):  # check every other item as the conclusion
            if premise == conclusion:
                continue  # skip rules of the form X -> X
            if sample[conclusion] == 1:
                valid_rules[(premise, conclusion)] += 1  # conclusion item was bought too: the rule holds
            else:
                invalid_rules[(premise, conclusion)] += 1  # conclusion item was not bought: the rule fails
support = valid_rules  # support = number of times the rule held
confidence = defaultdict(float)  # rules that were never seen default to 0.0
for premise, conclusion in valid_rules.keys():
    # confidence = times the rule held / times its premise occurred
    confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurences[premise]
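For larger datasets the nested loops can be replaced by a matrix product. The following is a sketch of that alternative, not the approach used in this post; it assumes every entry of the data is strictly 0 or 1, and `X_demo`, `co` and `conf_matrix` are names made up for the illustration.

```python
import numpy as np

# Made-up stand-in for X (columns: bread, milk, cheese, apples, bananas).
X_demo = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 0, 0],
])

# co[i, j] = number of transactions containing both item i and item j.
# The diagonal co[i, i] is the number of transactions containing item i.
co = X_demo.T @ X_demo
occurrences = np.diag(co)

# Dividing row i by occurrences[i] gives the confidence of "i -> j".
# Items that never occur keep a confidence of 0 instead of dividing by zero.
conf_matrix = np.zeros(co.shape, dtype=float)
bought = occurrences > 0
conf_matrix[bought] = co[bought] / occurrences[bought][:, None]

print(co[0, 1])           # support of bread -> milk
print(conf_matrix[0, 1])  # confidence of bread -> milk
```

Unlike the loop version, the matrices also contain the trivial diagonal entries (an item paired with itself), which should be ignored when reading off rules.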
for premise, conclusion in confidence:  # each key is a (premise, conclusion) pair of column indices
    premise_name = features[premise]  # features maps column indices to item names
    conclusion_name = features[conclusion]
    print("Rule: If a customer buys {0}, they may also buy {1}".format(premise_name, conclusion_name))
    print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")
Result: the confidence and support values show which items customers tend to buy together with a given item.
25
5
18 people bought Apples
Rule: If a customer buys bread, they may also buy milk
 - Confidence: 0.533
 - Support: 8
Rule: If a customer buys milk, they may also buy cheese
 - Confidence: 0.222
 - Support: 2
Rule: If a customer buys apples, they may also buy cheese
 - Confidence: 0.333
 - Support: 6
Rule: If a customer buys milk, they may also buy apples
 - Confidence: 0.444
 - Support: 4
Rule: If a customer buys bread, they may also buy apples
 - Confidence: 0.667
 - Support: 10
Rule: If a customer buys apples, they may also buy bread
 - Confidence: 0.556
 - Support: 10
Rule: If a customer buys apples, they may also buy bananas
 - Confidence: 0.611
 - Support: 11
Rule: If a customer buys apples, they may also buy milk
 - Confidence: 0.222
 - Support: 4
Rule: If a customer buys milk, they may also buy bananas
 - Confidence: 0.556
 - Support: 5
Rule: If a customer buys cheese, they may also buy bananas
 - Confidence: 0.556
 - Support: 5
Rule: If a customer buys cheese, they may also buy bread
 - Confidence: 0.556
 - Support: 5
Rule: If a customer buys cheese, they may also buy apples
 - Confidence: 0.667
 - Support: 6
Rule: If a customer buys cheese, they may also buy milk
 - Confidence: 0.222
 - Support: 2
Rule: If a customer buys bananas, they may also buy apples
 - Confidence: 0.647
 - Support: 11
Rule: If a customer buys bread, they may also buy bananas
 - Confidence: 0.467
 - Support: 7
Rule: If a customer buys bananas, they may also buy cheese
 - Confidence: 0.294
 - Support: 5
Rule: If a customer buys milk, they may also buy bread
 - Confidence: 0.889
 - Support: 8
Rule: If a customer buys bananas, they may also buy milk
 - Confidence: 0.294
 - Support: 5
Rule: If a customer buys bread, they may also buy cheese
 - Confidence: 0.333
 - Support: 5
Rule: If a customer buys bananas, they may also buy bread
 - Confidence: 0.412
 - Support: 7
Finally, sort the rules by confidence.
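One way to do the sorting is with `sorted` and `operator.itemgetter`. The sketch below is self-contained, so it uses a hypothetical `confidence_demo` dictionary holding four of the rounded values from the output above; in the full script the `confidence` dictionary would take its place.

```python
from operator import itemgetter

features = ["bread", "milk", "cheese", "apples", "bananas"]

# A few (premise, conclusion) -> confidence entries, rounded from the printed results.
confidence_demo = {
    (0, 1): 0.533,  # bread -> milk
    (1, 0): 0.889,  # milk -> bread
    (3, 4): 0.611,  # apples -> bananas
    (4, 2): 0.294,  # bananas -> cheese
}

# Sort the (rule, confidence) pairs by their confidence value, highest first.
sorted_confidence = sorted(confidence_demo.items(), key=itemgetter(1), reverse=True)

for (premise, conclusion), value in sorted_confidence:
    print("{0} -> {1}: {2:.3f}".format(features[premise], features[conclusion], value))
```

With these four entries, milk -> bread comes first (0.889) and bananas -> cheese last (0.294).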