Translating Chinese with Youdao in Python: scraping Youdao Translate with Python

Reposted

互联网小墨风 2024-07-05 07:49:08


Here we use Python's urllib to implement this.

First, we need to find the data that the browser uploads to the server when we run a translation.


Open the browser's developer tools (Inspect Element), go to the Network tab, and select the request that uses the POST method.


The Request URL shown under General is the address we send our request to:

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
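To see what this Request URL contains, it can be split into its parts with the standard library. This is only a small illustration using `urllib.parse`; it does not contact the server.

```python
from urllib.parse import urlsplit, parse_qs

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'

parts = urlsplit(url)
# parse_qs collects repeated keys (here smartresult) into a list.
query = parse_qs(parts.query)

print(parts.netloc)  # fanyi.youdao.com
print(parts.path)    # /translate
print(query)         # {'smartresult': ['dict', 'rule']}
```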


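The Form Data section shows the data we submit. The i field holds the text to be translated. Since this data can be stored as a dictionary, we only need to change the value of i to submit whatever text we want translated.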

data = {
    'i' : target,
    'from' : 'AUTO',
    'to' : 'AUTO',
    'smartresult' : 'dict',
    'client' : 'fanyideskweb',
    'salt' : '15810537039389',
    'sign' : '157b38258a2253c7899895880487edfd',
    'ts' : '1581053703938',
    'bv' : '901200199a98c590144a961dac532964',
    'doctype' : 'json',
    'version' : '2.1',
    'keyfrom' : 'fanyi.web',
    'action' : 'FY_BY_CLICKBUTTION',
}
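Since only the i field changes between requests, the form can be wrapped in a small helper. The sketch below is one possible way to do that; the names `CAPTURED_FIELDS` and `build_form_data` are made up for this example, and the field values are simply the ones captured above, not freshly generated ones.

```python
# Values copied from the captured Form Data above (one browser session).
CAPTURED_FIELDS = {
    'from': 'AUTO',
    'to': 'AUTO',
    'smartresult': 'dict',
    'client': 'fanyideskweb',
    'salt': '15810537039389',
    'sign': '157b38258a2253c7899895880487edfd',
    'ts': '1581053703938',
    'bv': '901200199a98c590144a961dac532964',
    'doctype': 'json',
    'version': '2.1',
    'keyfrom': 'fanyi.web',
    'action': 'FY_BY_CLICKBUTTION',
}


def build_form_data(text):
    """Return a new form dict whose 'i' field is the text to translate."""
    form = {'i': text}
    form.update(CAPTURED_FIELDS)
    return form


form = build_form_data('hello')
print(form['i'])   # hello
print(len(form))   # 13
```

Returning a new dictionary on every call keeps one request's text from leaking into the next.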

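However, we cannot submit the dictionary directly. We first use the parse module of urllib to URL-encode it into a string, then encode that string as UTF-8 bytes, because the POST body must be bytes.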

data = urllib.parse.urlencode(data).encode('utf-8')
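As a quick check of what this line produces, here is a minimal example with a shortened form (only two fields, chosen for illustration). It shows the percent-encoded string and the bytes object that is eventually sent.

```python
import urllib.parse

# A shortened form, just to show the encoding step.
form = {'i': '你好', 'doctype': 'json'}

query_string = urllib.parse.urlencode(form)   # dict -> percent-encoded str
body = query_string.encode('utf-8')           # str -> bytes for the POST body

print(query_string)  # i=%E4%BD%A0%E5%A5%BD&doctype=json
print(type(body))    # <class 'bytes'>
```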

After uploading the data, we read the data the server returns and decode it as UTF-8.

rep = urllib.request.Request(url, data)
response = urllib.request.urlopen(rep)
html = response.read().decode('utf-8')
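The request object can be inspected before anything is sent, which helps confirm that it really is a POST. The sketch below builds the request with a shortened form and defines a hypothetical `fetch` helper (not part of the original code) that adds a timeout and closes the response; the helper is defined but not called here.

```python
import urllib.parse
import urllib.request

url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
data = urllib.parse.urlencode({'i': 'hello', 'doctype': 'json'}).encode('utf-8')

req = urllib.request.Request(url, data)
print(req.get_method())  # POST, because data is not None
print(req.data)          # b'i=hello&doctype=json'


def fetch(request, timeout=10):
    """Send the request and return the response body decoded as UTF-8."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode('utf-8')
```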


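The returned data turns out to be in JSON format, that is, a data structure serialized as a string. We therefore import the json module and use it to parse the string into Python objects.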

result = json.loads(html)
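To make the effect of `json.loads` concrete, here is a small example on a hand-written string. The sample is an assumption modelled on the structure described in this post (a 'translateResult' key holding nested lists with a 'tgt' entry); the 'src' field and the exact contents of a real response may differ.

```python
import json

# Hand-written sample, not a real server response.
sample_html = '{"translateResult": [[{"src": "hello", "tgt": "你好"}]]}'

result = json.loads(sample_html)

print(type(result))                     # <class 'dict'>
print(type(result['translateResult']))  # <class 'list'>
```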


After parsing, the returned data is a dictionary, and the translation result can be reached through the 'translateResult' key.


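Because the value is wrapped in two levels of lists, we index into both lists and then use the tgt key to get the translated text. A regular expression applied to the raw response text is another option and can be shorter.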

result = result['translateResult'][0][0]['tgt']
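The two approaches mentioned here, indexing the parsed JSON and using a regular expression, can be compared side by side. This is a sketch on a hand-written sample string that assumes the structure described above; `extract_tgt` and `extract_tgt_regex` are names invented for this example.

```python
import json
import re

# Hand-written sample, not a real server response.
sample_html = '{"translateResult": [[{"src": "hello", "tgt": "你好"}]]}'


def extract_tgt(html):
    """Return the first 'tgt' value, or None if the structure is not as expected."""
    try:
        return json.loads(html)['translateResult'][0][0]['tgt']
    except (ValueError, KeyError, IndexError, TypeError):
        return None


def extract_tgt_regex(html):
    """Return the first "tgt" string found in the raw text, or None."""
    match = re.search(r'"tgt"\s*:\s*"([^"]*)"', html)
    return match.group(1) if match else None


print(extract_tgt(sample_html))        # 你好
print(extract_tgt_regex(sample_html))  # 你好
print(extract_tgt('not json'))         # None
```

The regular expression is shorter, but it does not handle escaped quotes or \u escapes inside the value, so the JSON version is the safer choice when the response format may vary.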

Finally, the program prints the translation result.

Complete code

import urllib.request
import urllib.parse
import json

while True:
    target = input("Enter the text to translate: ")
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
    # Form data captured from the browser; only 'i' changes between requests.
    data = {
        'i' : target,
        'from' : 'AUTO',
        'to' : 'AUTO',
        'smartresult' : 'dict',
        'client' : 'fanyideskweb',
        'salt' : '15810537039389',
        'sign' : '157b38258a2253c7899895880487edfd',
        'ts' : '1581053703938',
        'bv' : '901200199a98c590144a961dac532964',
        'doctype' : 'json',
        'version' : '2.1',
        'keyfrom' : 'fanyi.web',
        'action' : 'FY_BY_CLICKBUTTION',
    }

    # Send a browser-like User-Agent header with the request.
    head = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}

    # URL-encode the form and convert it to bytes for the POST body.
    data = urllib.parse.urlencode(data).encode('utf-8')

    rep = urllib.request.Request(url, data, head)
    response = urllib.request.urlopen(rep)

    # Decode the response, parse the JSON, and pull out the translated text.
    html = response.read().decode('utf-8')
    result = json.loads(html)
    result = result['translateResult'][0][0]['tgt']

    print("Translation:", result)

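The code above still has plenty of shortcomings. After many requests, the server may notice that the visitor is just a script, so we need to send a User-Agent header to make the server treat the request as coming from a browser (the complete code already does this through head). We also need a proxy so that repeated requests from the same IP do not get banned. I will cover these topics in later blog posts.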
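As a rough preview of the proxy idea, urllib can route requests through a proxy with `ProxyHandler` and `build_opener`. The sketch below only builds the opener and sends nothing; the proxy address is a placeholder assumption, and `build_opener_with_proxy` is a name invented for this example.

```python
import urllib.request

# Placeholder proxy address (an assumption); replace it with a working proxy.
PROXY = {'http': 'http://127.0.0.1:8080'}
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36')


def build_opener_with_proxy(proxies, user_agent):
    """Return an opener that uses the given proxies and User-Agent header."""
    proxy_handler = urllib.request.ProxyHandler(proxies)
    opener = urllib.request.build_opener(proxy_handler)
    # addheaders replaces urllib's default User-Agent for this opener.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


opener = build_opener_with_proxy(PROXY, USER_AGENT)
# opener.open(url, data) would then send the POST request through the proxy.
```

Calling `opener.open(url, data)` in place of `urllib.request.urlopen` keeps the rest of the program unchanged.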
