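A while ago my girlfriend and I were playing with celebrity look-alike mini programs on WeChat. Most of them gave inaccurate results, and the few accurate ones charged a fee. Since I had studied face recognition before and figured the underlying principle would be much the same, I decided to build my own celebrity look-alike program.
GitHub project: https:///JiageWang/starface
1. Data collection
To find the most similar celebrity face, you first need data. Most ready-made datasets are out of date and lack many currently popular celebrities, so I decided to scrape my own. After looking at many sites I settled on mingxing.com (明星网) and wrote the simple script below, which mainly uses requests and XPath.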
import os
import time
import random
import requests
from lxml import etree

# Build one random Chrome version string when the script starts
first_num = random.randint(55, 62)
third_num = random.randint(0, 3200)
fourth_num = random.randint(0, 140)
os_type = [
    '(Windows NT 6.1; WOW64)', '(Windows NT 10.0; WOW64)', '(X11; Linux x86_64)',
    '(Macintosh; Intel Mac OS X 10_12_6)'
]
chrome_version = 'Chrome/{}.0.{}.{}'.format(first_num, third_num, fourth_num)


def get_ua():
    """Return a User-Agent string with a randomly chosen OS token."""
    return ' '.join(['Mozilla/5.0', random.choice(os_type), 'AppleWebKit/537.36',
                     '(KHTML, like Gecko)', chrome_version, 'Safari/537.36'])


headers_index = {
    "Host": "www.mingxing.com",
}
headers_img = {
    "Referer": "http://www.mingxing.com/ziliao/index.html",
}

root = 'starImages'
os.makedirs(root, exist_ok=True)

s = requests.session()
for i in range(1, 194):  # index pages 1 to 193
    url = 'http://www.mingxing.com/ziliao/index?&p={}'.format(i)
    headers_index['User-Agent'] = get_ua()
    response = s.get(url, headers=headers_index)
    html = etree.HTML(response.text)
    lis = html.xpath("//div[@class='page_starlist']//li")
    time.sleep(1)  # pause between index pages
    for li in lis:
        src = li.xpath(".//img/@src")[0]
        name = li.xpath(".//a/h3")[0].text.strip()
        print('Downloading {}'.format(name))
        headers_img['Referer'] = url
        headers_img['User-Agent'] = get_ua()
        img = s.get(src, headers=headers_img)
        # One folder per celebrity, holding a single image named after them
        folder = os.path.join(root, name)
        os.makedirs(folder, exist_ok=True)
        file = os.path.join(folder, '{}.jpg'.format(name))
        with open(file, 'wb') as f:  # 'wb' so a re-run overwrites instead of appending
            f.write(img.content)
        # time.sleep(0.2)
        img.close()
Crawl result: each celebrity gets a folder, and each folder contains one image file with the same name.
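One detail in the script above: `chrome_version` is computed once when the script starts, so only the OS token changes between requests. If a fresh version number per request is wanted, a minimal sketch could look like the following. The function name `random_ua` and its `rng` parameter are hypothetical and not part of the original project; the version ranges are copied from the script.

```python
import random

OS_TOKENS = [
    '(Windows NT 6.1; WOW64)', '(Windows NT 10.0; WOW64)', '(X11; Linux x86_64)',
    '(Macintosh; Intel Mac OS X 10_12_6)',
]


def random_ua(rng=random):
    """Build a User-Agent string with a fresh random Chrome version on every call.

    Hypothetical helper: rng can be the random module or a random.Random instance.
    """
    chrome = 'Chrome/{}.0.{}.{}'.format(
        rng.randint(55, 62), rng.randint(0, 3200), rng.randint(0, 140))
    return ' '.join(['Mozilla/5.0', rng.choice(OS_TOKENS), 'AppleWebKit/537.36',
                     '(KHTML, like Gecko)', chrome, 'Safari/537.36'])


if __name__ == '__main__':
    print(random_ua())
```

Passing a seeded `random.Random` instance makes the output reproducible, which is handy when debugging a crawl.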
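2. Building the celebrity face database
I use MTCNN and ArcFace to extract each celebrity's face and compute its embedding vector; the Euclidean distance between two vectors then indicates how similar the faces are. The main code is below. The function saves the celebrity names and face vectors to a local file, facebank.pkl, for later use. The deep-learning code it relies on is in the original project.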
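import os
import pickle

import cv2
import numpy as np
from tqdm import tqdm

# face_model comes from the original project (MTCNN detection + ArcFace embedding)


def get_facebank(path):
    names = []
    embeddings = []
    folders = os.listdir(path)
    with open("starlist.txt", 'w', encoding='utf-8') as f:
        for name in tqdm(folders):  # iterate over the celebrity folders
            file = os.path.join(path, name, name + '.jpg')  # image path
            # np.fromfile + cv2.imdecode copes with non-ASCII (Chinese) file names
            starimg = cv2.imdecode(np.fromfile(file, dtype=np.uint8), cv2.IMREAD_COLOR)
            if starimg is None:
                continue
            _, _, starface, embedding = face_model(starimg)  # get the embedding vectors
            if len(embedding) != 1:  # skip images that do not yield exactly one embedding
                continue
            f.write(name + '\n')
            names.append(name)
            embeddings.append(embedding[0])
    with open('facebank.pkl', 'wb') as f:
        pickle.dump((names, embeddings), f)  # save locally for later use
    return names, embeddings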
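The later code refers to `names` and `face_bank` without showing where they come from; presumably they are read back from facebank.pkl. Below is a minimal sketch of that round trip, using dummy vectors in a temporary directory instead of real ArcFace embeddings. The helper names `save_facebank` and `load_facebank` are hypothetical, not from the original project.

```python
import os
import pickle
import tempfile


def save_facebank(names, embeddings, file):
    """Write (names, embeddings) as one tuple, the same layout get_facebank uses."""
    with open(file, 'wb') as f:
        pickle.dump((names, embeddings), f)


def load_facebank(file):
    """Read the (names, embeddings) tuple back from disk."""
    with open(file, 'rb') as f:
        names, embeddings = pickle.load(f)
    return names, embeddings


if __name__ == '__main__':
    # Dummy data standing in for real embeddings
    with tempfile.TemporaryDirectory() as tmp:
        file = os.path.join(tmp, 'facebank.pkl')
        save_facebank(['star_a', 'star_b'], [[0.1, 0.2], [0.3, 0.4]], file)
        names, face_bank = load_facebank(file)
        print(names, len(face_bank))
```

Only unpickle files you created yourself: `pickle.load` can execute arbitrary code from an untrusted file.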
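3. Comparing a face against the database to find the closest match
Read your own photo, extract the face and its embedding vector, and compare it with every vector in the database. The vector at the smallest distance belongs to the most similar face. The faces from both images are then aligned and shown side by side.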
def compare_embedding(embedding, facebank):
    """Compare a single face embedding against the database."""
    if len(facebank) == 0:
        return None
    embedding = np.array(embedding)
    facebank = np.array(facebank).squeeze(axis=1)
    diff = embedding - facebank
    dist = np.sum(np.square(diff), axis=1)  # squared Euclidean distance to every entry
    min_idx = np.argmin(dist)  # index of the smallest distance
    return min_idx


def find_similar_star(path, datapath='starImages'):
    # names and face_bank are assumed to be module-level globals loaded from facebank.pkl
    myimg = cv2.imdecode(np.fromfile(path, dtype=np.uint8), cv2.IMREAD_COLOR)
    _, _, myface, embedding = face_model(myimg)
    if len(myface) != 1:
        return None, None
    # Find the most similar celebrity
    idx = compare_embedding(embedding[0], face_bank)
    if idx is None:
        return None, None
    starname = names[idx]
    # Load the celebrity's face
    file = os.path.join(datapath, starname, starname + '.jpg')
    starimg = cv2.imdecode(np.fromfile(file, dtype=np.uint8), cv2.IMREAD_COLOR)
    _, _, starface, _ = face_model(starimg)
    # Put the two faces side by side
    myface = myface[0]
    starface = starface[0]
    h, w = starface.shape[:2]
    myface = cv2.resize(myface, (w, h))
    # myface = cv2.GaussianBlur(myface, ksize=(15,15), sigmaX=0)
    result = np.hstack((myface, starface))
    return starname, result
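`compare_embedding` returns only the single closest entry. For a look-alike program it can be nice to show a short ranking instead. The sketch below is one way to do that with the same squared Euclidean distance. `top_k_similar` and the toy 3-dimensional vectors are hypothetical, and the database is assumed to be a plain (N, D) array rather than the (N, 1, D) layout that the original code squeezes.

```python
import numpy as np


def top_k_similar(embedding, facebank, names, k=3):
    """Return the k closest (name, squared_distance) pairs, nearest first."""
    embedding = np.asarray(embedding, dtype=np.float64).reshape(-1)  # shape (D,)
    facebank = np.asarray(facebank, dtype=np.float64)                # shape (N, D)
    if facebank.size == 0:
        return []
    dist = np.sum(np.square(facebank - embedding), axis=1)           # shape (N,)
    order = np.argsort(dist)[:k]                                     # ascending distance
    return [(names[i], float(dist[i])) for i in order]


if __name__ == '__main__':
    bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    names = ['star_a', 'star_b', 'star_c']
    print(top_k_similar([0.9, 0.1, 0.0], bank, names, k=2))
```

If the embeddings are L2-normalized (an assumption worth checking against the model's output), the squared distance d and the cosine similarity c are related by d = 2 - 2c, so ranking by either measure gives the same order.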
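Result with my own photo:
Result with a photo of an actual celebrity:
Summary
As you can see, ArcFace's face recognition is quite accurate, noticeably better than those mini programs, and most importantly it's free!!!
References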