Recognizing CAPTCHAs with pytesseract

0. Download

https://tesseract-ocr.github.io/tessdoc/Installation.html


1. Installation

In the installer, select the additional Math and Chinese packs.
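Once Tesseract is installed, one way to confirm the Chinese pack was really added is to ask Tesseract which languages it can load with `tesseract --list-langs`. A minimal sketch, assuming `tesseract` is reachable on PATH (set up in the next step) and that the Chinese Simplified pack shows up as `chi_sim`, the code the script at the end of this post passes as `lang`; `parse_list_langs` and `installed_languages` are illustrative names, not part of any library.

```python
import subprocess


def parse_list_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a list of codes."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages(cmd="tesseract"):
    """Run `tesseract --list-langs` and return the installed language codes."""
    result = subprocess.run([cmd, "--list-langs"],
                            capture_output=True, encoding="utf-8", check=True)
    return parse_list_langs(result.stdout)
```

Usage: `"chi_sim" in installed_languages()` should be `True` if the Chinese pack was installed.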

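After installation, the environment variables need to be configured. I first installed the 1201 build, but on 64-bit Windows 10 it failed with an error, so I chose the 20190623 build instead.

After installing it, configure the environment variables.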
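A quick way to check that the environment variable took effect is to look the executable up from Python. This sketch assumes "configuring the environment variables" means adding the Tesseract install directory to PATH and that the executable is named `tesseract`; `tesseract_path` is a hypothetical helper built on the standard `shutil.which`.

```python
import shutil


def tesseract_path(path=None):
    """Return the full path of the `tesseract` executable, or None if not found.

    `path` is passed to shutil.which; None means "search the PATH
    environment variable" (assumed to be the variable configured above).
    """
    return shutil.which("tesseract", path=path)


location = tesseract_path()
if location is None:
    print("tesseract not found on PATH; check the environment variable")
else:
    print("tesseract found at", location)
```

Open a new terminal after editing the variable: programs that are already running keep the PATH they started with.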


2. Test

Run Tesseract on a sample image and check the recognition result.
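The same test can be scripted from Python. This sketch assumes `tesseract` is on PATH and uses the command-line form `tesseract <image> stdout -l <lang>`, where the output base `stdout` makes Tesseract print the text instead of writing a `.txt` file. `build_cli_args` and `ocr_with_cli` are hypothetical helper names; the output is decoded as UTF-8 so Chinese text is not garbled by the Windows console code page.

```python
import subprocess


def build_cli_args(image_path, lang="chi_sim", cmd="tesseract"):
    """Build the argument list for `tesseract <image> stdout -l <lang>`."""
    # "stdout" as the output base tells tesseract to print the result.
    return [cmd, str(image_path), "stdout", "-l", lang]


def ocr_with_cli(image_path, lang="chi_sim", cmd="tesseract"):
    """Run the tesseract CLI on one image and return the recognized text."""
    result = subprocess.run(build_cli_args(image_path, lang, cmd),
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

For example, `print(ocr_with_cli("test.png"))` should print the same text the manual test showed.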

3. Using pytesseract

Install it:

pip install pytesseract

Find pytesseract.py in the package's install directory and change the Tesseract path in it to the location of your own OCR executable.
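Instead of editing the library file (which is overwritten whenever pytesseract is reinstalled or upgraded), the same setting can be made at runtime through the module attribute `pytesseract.pytesseract.tesseract_cmd`. A sketch; the default path below is an assumption based on the usual Windows install location, and `resolve_tesseract_cmd` and `configure_pytesseract` are hypothetical helpers.

```python
import shutil
from pathlib import Path

try:
    import pytesseract
except ImportError:  # keep the helpers importable even without pytesseract
    pytesseract = None

# Assumed default install location on Windows; change it to match your machine.
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(candidate=DEFAULT_TESSERACT):
    """Return `candidate` if that file exists, else tesseract on PATH, else None."""
    if candidate and Path(candidate).is_file():
        return str(candidate)
    return shutil.which("tesseract")


def configure_pytesseract(candidate=DEFAULT_TESSERACT):
    """Point pytesseract at the tesseract executable without editing pytesseract.py."""
    if pytesseract is None:
        raise RuntimeError("pytesseract is not installed: pip install pytesseract")
    cmd = resolve_tesseract_cmd(candidate)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    # pytesseract launches the program stored in this module-level attribute.
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Call `configure_pytesseract()` once, before the first `image_to_string` call.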


https://www.cnblogs.com/xiao-apple36/p/8865387.html#_label3_3

```python
from PIL import Image
from pytesseract import image_to_string


def depoint(image):
    """
    Denoise: turn isolated dark pixels white.

    Checks each pixel's 4-neighbourhood (up, down, left, right); an
    8-neighbourhood would also check the diagonals. If all four neighbours
    are near-white (> 245), the pixel is treated as noise and set to white.
    Expects a grayscale ('L' mode) image, e.g. the output of binaring().
    :param image: PIL image in 'L' mode
    :return: the same image, modified in place
    """
    pixdata = image.load()
    w, h = image.size
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            count = 0
            if pixdata[x, y - 1] > 245:
                count += 1
            if pixdata[x, y + 1] > 245:
                count += 1
            if pixdata[x - 1, y] > 245:
                count += 1
            if pixdata[x + 1, y] > 245:
                count += 1
            if count > 3:
                pixdata[x, y] = 255
    return image


def binaring(image, threshold=160):
    """
    Convert the image to grayscale, then binarize it.
    :param image: PIL image
    :param threshold: pixels darker than this become black (0), the rest white (255)
    :return: a new 'L' mode image
    """
    image = image.convert('L')
    pixdata = image.load()
    w, h = image.size
    for y in range(h):
        for x in range(w):
            if pixdata[x, y] < threshold:
                pixdata[x, y] = 0
            else:
                pixdata[x, y] = 255
    return image


if __name__ == '__main__':
    # CAPTCHA pipeline: binarize, denoise, then recognize
    # image = Image.open('img.png')
    # image2 = binaring(image)  # binarize
    # image3 = depoint(image2)  # denoise
    # image3.show()
    # print('code: ', image_to_string(image3, lang='eng'))

    # Recognize Simplified Chinese text in test.png
    image = Image.open('test.png')
    print(image_to_string(image, lang='chi_sim'))
```
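The per-pixel loops above are easy to follow but slow in pure Python. As an alternative sketch (a different technique from the code above, not the author's method), Pillow can binarize through a lookup table with `Image.point` and remove specks with `ImageFilter.MedianFilter`. Using the mean gray level from `ImageStat` as the default threshold is an assumption and may need tuning for each CAPTCHA style; `binarize_fast` and `denoise_median` are hypothetical names.

```python
from PIL import Image, ImageFilter, ImageStat


def binarize_fast(image, threshold=None):
    """Grayscale + binarize with Image.point instead of a Python pixel loop.

    If no threshold is given, the image's mean gray level is used (an assumption).
    """
    gray = image.convert("L")
    if threshold is None:
        threshold = ImageStat.Stat(gray).mean[0]
    # For 'L' images Pillow evaluates the function once per gray level (0..255)
    # and applies the resulting lookup table to every pixel.
    return gray.point(lambda p: 0 if p < threshold else 255)


def denoise_median(image, size=3):
    """Remove isolated specks by replacing each pixel with its neighbourhood median."""
    return image.filter(ImageFilter.MedianFilter(size))
```

Usage: `image_to_string(denoise_median(binarize_fast(Image.open('img.png'))), lang='eng')`. A median filter also thins one-pixel-wide strokes, so depoint's 4-neighbour rule may preserve characters better on fine fonts.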
