Automatically downloading annual reports and papers with Python

Reposted

mob64ca13ff9303 2023-11-27 19:04:57


1. Downloading and installing scidownl

2. Test notes

3. Using scidownl


1. Downloading and installing scidownl

SciDownl

An unofficial API for downloading papers from SciHub, maintained by Tishacy.

Supports downloading by DOI or PMID.
Makes it easy to update to the latest SciHub domains.

Install with pip

SciDownl can be installed easily with pip.

$ pip3 install -U scidownl

Install from source code

$ git clone https://github.com/Tishacy/SciDownl.git
$ cd SciDownl && python3 setup.py install

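2. Test notes

2.1 Clone the base environment and name it scidownl

conda create --name scidownl --clone base  # clone the base environment and name the copy scidownl

Switch to the scidownl virtual environment and run:

pip3 install -U scidownl

Note: if the installation reports an error, upgrade or downgrade the dependency named in the error, or re-run the install command.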
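To confirm from Python that the install step took effect, a minimal sketch using only the standard library is shown below. The helper name `installed_version` is hypothetical and not part of SciDownl; it simply asks the active environment which version of a package, if any, is installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("scidownl")
    if version is None:
        print("scidownl is not installed in this environment")
    else:
        print(f"scidownl {version} is installed")
```

Run it inside the scidownl conda environment; a "not installed" message usually means pip installed into a different environment.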

3. Using scidownl

3.1 Quick start

# Download by DOI; the file is named after the paper's title.
$ scidownl download --doi https://doi.org/10.1145/3375633
# Download by PMID to a user-defined file path.
$ scidownl download --pmid 31395057 --out ./paper/paper-1.pdf

3.1.1 Command-line tool

$ scidownl -h
Usage: scidownl [OPTIONS] COMMAND [ARGS]...

  Command line tool to download pdfs from Scihub.

Options:
  -h, --help  Show this message and exit.

Commands:
  config         Get global configs.
  domain.list    List available SciHub domains in local db.
  domain.update  Update available SciHub domains and save them to local db.
  download       Download paper(s) by DOI or PMID.

3.1.2 Updating the available SciHub domains

$ scidownl domain.update --help
Usage: scidownl domain.update [OPTIONS]

  Update available SciHub domains and save them to local db.

Options:
  -m, --mode TEXT  Update mode, either 'crawl' or 'search'. Defaults to 'crawl'.
  -h, --help       Show this message and exit.

The -m or --mode option selects one of two update modes:


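crawl: [default] crawls a website that publishes live SciHub domains (the "SciHub domain source") to collect the domains that are currently available. The URL of the domain source is set in the global config file, in the [scihub.domain.updater.crawl] section under the key scihub_domain_source. The following command shows where the global config file is located so that you can edit it.

$ scidownl config --location

http://tool.yovisun.com/scihub  # the default SciHub domain source used by the scidownl package; it is in fact a 科研通 URL.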
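To inspect that setting without opening the file by hand, the lookup can be sketched with the standard `configparser` module. This assumes the global config file is INI-formatted (as the section/key description suggests); the helper name `read_domain_source` and the sample text are hypothetical, and the section name is compared case-insensitively because the exact casing in the real file is not confirmed here.

```python
import configparser
from typing import Optional

SECTION = "scihub.domain.updater.crawl"
KEY = "scihub_domain_source"


def read_domain_source(ini_text: str) -> Optional[str]:
    """Return the configured domain source URL, or None if it is not set."""
    # interpolation=None keeps '%' characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    for name in parser.sections():
        # Section names are case-sensitive in configparser, so compare loosely.
        if name.lower() == SECTION:
            # Option names are lowercased by configparser on read and lookup.
            return parser[name].get(KEY)
    return None


# Hypothetical file content, used only to demonstrate the lookup.
sample = """
[scihub.domain.updater.crawl]
scihub_domain_source = http://tool.yovisun.com/scihub
"""
print(read_domain_source(sample))
```

For the real file, read the path printed by `scidownl config --location` and pass its text to the helper.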

Example using crawl mode:

$ scidownl domain.update --mode crawl
[INFO] | 2022/08/24 20:55:09 | Found 6 valid SciHub domains in total: ['http://sci-hub.st', 'https://sci-hub.se', 'https://sci-hub.st', 'https://sci-hub.ru', 'http://sci-hub.ru', 'http://sci-hub.se']
[INFO] | 2022/08/24 20:55:09 | Saved 6 SciHub domains to local db. 
 
Six usable SciHub domains were saved in total.

search: generates candidate URLs according to the naming rules of SciHub domains, then probes them to find the ones that are available. This takes longer than crawl mode.

$ scidownl domain.update --mode search
[INFO] | 2022/08/24 21:02:30 | # Search valid SciHub domains from 1352 urls
[INFO] | 2022/08/24 21:05:01 | # Found a SciHub domain url: https://sci-hub.se
[INFO] | 2022/08/24 21:05:02 | # Found a SciHub domain url: https://sci-hub.st
[INFO] | 2022/08/24 21:05:04 | # Found a SciHub domain url: http://sci-hub.is
[INFO] | 2022/08/24 21:05:17 | # Found a SciHub domain url: https://sci-hub.is
[INFO] | 2022/08/24 21:06:02 | # Found a SciHub domain url: http://sci-hub.yt
[INFO] | 2022/08/24 21:06:03 | # Found a SciHub domain url: https://sci-hub.yt
[INFO] | 2022/08/24 21:06:08 | # Found a SciHub domain url: http://sci-hub.st
[INFO] | 2022/08/24 21:06:08 | # Found a SciHub domain url: http://sci-hub.se
[INFO] | 2022/08/24 21:06:08 | Found 8 valid SciHub domains in total: ['https://sci-hub.se', 'https://sci-hub.st', 'http://sci-hub.is', 'https://sci-hub.is', 'http://sci-hub.yt', 'https://sci-hub.yt', 'http://sci-hub.st', 'http://sci-hub.se']
[INFO] | 2022/08/24 21:06:08 | Saved 8 SciHub domains to local db.
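The log above reports 1352 candidate URLs. The exact generation rule is not documented in this post, but the number is consistent with two schemes (http, https) combined with every two-letter suffix (26 × 26 = 676), and every domain found in the log has that shape. Under that assumption, the candidate list can be sketched as follows; `candidate_urls` is a hypothetical name, not SciDownl's function.

```python
from itertools import product
from string import ascii_lowercase
from typing import List


def candidate_urls(host: str = "sci-hub") -> List[str]:
    """Build scheme://host.xx for both schemes and all two-letter suffixes."""
    urls = []
    for scheme in ("http", "https"):
        for first, second in product(ascii_lowercase, repeat=2):
            urls.append(f"{scheme}://{host}.{first}{second}")
    return urls


candidates = candidate_urls()
print(len(candidates))  # 2 * 26 * 26 = 1352
```

Probing each candidate with a network request is what makes search mode slower than crawl mode; that part is left out of the sketch.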

3.2 Listing all saved SciHub domains

SciDownl uses SQLite as its local database and stores every updated SciHub domain there. The domain.list command lists all saved SciHub domains.

$ scidownl domain.list
+--------------------+----------------+---------------+
| Url                |   SuccessTimes |   FailedTimes |
|--------------------+----------------+---------------|
| https://sci-hub.se |              3 |             4 |
| http://sci-hub.se  |              0 |             4 |
| https://sci-hub.ru |              0 |             4 |
| https://sci-hub.st |              0 |             4 |
| http://sci-hub.ru  |              0 |             4 |
| http://sci-hub.st  |              0 |             4 |
| http://sci-hub.is  |              0 |             0 |
| https://sci-hub.is |              0 |             0 |
| http://sci-hub.yt  |              0 |             0 |
| https://sci-hub.yt |              0 |             0 | 
 
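Besides the self-explanatory Url column, the SuccessTimes column records how many times a paper was downloaded successfully through that URL, and the FailedTimes column records how many downloads through that URL failed. These two columns are used to compute the priority of each SciHub domain when one is chosen for a download.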
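How SciDownl turns the SuccessTimes and FailedTimes counters into a priority is not spelled out here. Purely as an illustration, one plausible ordering is "most successes first, then fewest failures". The sketch below implements that assumed rule; `DomainStats` and `rank_domains` are hypothetical names, and the rows are taken from the table above.

```python
from typing import List, NamedTuple


class DomainStats(NamedTuple):
    url: str
    success_times: int
    failed_times: int


def rank_domains(rows: List[DomainStats]) -> List[DomainStats]:
    """Order domains by most successes first, then by fewest failures."""
    # Assumed rule for illustration; not taken from SciDownl's source code.
    return sorted(rows, key=lambda r: (-r.success_times, r.failed_times))


rows = [
    DomainStats("http://sci-hub.se", 0, 4),
    DomainStats("https://sci-hub.se", 3, 4),
    DomainStats("http://sci-hub.is", 0, 0),
]
for row in rank_domains(rows):
    print(row.url, row.success_times, row.failed_times)
```

With this rule, a domain that has never been tried ranks above one that has only failed, which gives new domains a chance to be used.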

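3.3 Downloading papers

$ scidownl download --help
Usage: scidownl download [OPTIONS]

  Download paper(s) by DOI or PMID.

Options:
  -d, --doi TEXT         DOI string. Specifying multiple DOIs is supported,
                         e.g., --doi FIRST_DOI --doi SECOND_DOI ...
  -p, --pmid INTEGER     PMID numbers. Specifying multiple PMIDs is supported,
                         e.g., --pmid FIRST_PMID --pmid SECOND_PMID ...
  -o, --out TEXT         Output directory or file path, which can be an
                         absolute or a relative path. Output directory
                         examples: /absolute/path/to/download/,
                         ./relative/path/to/download/. Output file examples:
                         /absolute/dir/paper.pdf, ../relative/dir/paper.pdf.
                         If --out is not specified, the paper is downloaded
                         to the current directory with the paper's title as
                         the file name. If multiple DOIs or multiple PMIDs
                         are given, --out is always treated as an output
                         directory rather than an output file path.
  -u, --scihub-url TEXT  Scihub domain url. If not specified, one is chosen
                         automatically from the locally saved domains. It is
                         recommended to leave this option empty.
  -h, --help             Show this message and exit.

Downloading papers by DOI or PMID

Use the -d or --doi option to download a paper by DOI, and the -p or --pmid option to download a paper by PMID. Either option can be given multiple times, and the two can be mixed.

# with a single DOI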
$ scidownl download --doi https://doi.org/10.1145/3375633

# with multiple DOIs
$ scidownl download --doi https://doi.org/10.1145/3375633 --doi https://doi.org/10.1145/2785956.2787496

# with a single PMID
$ scidownl download --pmid 31395057

# with multiple PMIDs
$ scidownl download --pmid 31395057 --pmid 24686414

# with a mix of DOIs and PMIDs
$ scidownl download --doi https://doi.org/10.1145/3375633 --pmid 31395057 --pmid 24686414
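When the list of identifiers comes from a script rather than from the keyboard, the same command line can be assembled programmatically. The following is a minimal sketch that only uses the options shown in the help text above (--doi, --pmid, --out); `build_download_argv` is a hypothetical helper, and actually running the command assumes the `scidownl` executable is on PATH.

```python
import subprocess
from typing import List, Optional, Sequence


def build_download_argv(
    dois: Sequence[str] = (),
    pmids: Sequence[int] = (),
    out: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `scidownl download`."""
    if not dois and not pmids:
        raise ValueError("give at least one DOI or PMID")
    argv = ["scidownl", "download"]
    for doi in dois:
        argv += ["--doi", doi]  # the option is repeated once per DOI
    for pmid in pmids:
        argv += ["--pmid", str(pmid)]  # likewise, once per PMID
    if out is not None:
        argv += ["--out", out]
    return argv


argv = build_download_argv(
    dois=["https://doi.org/10.1145/3375633"],
    pmids=[31395057, 24686414],
)
print(" ".join(argv))
# To run it for real (needs scidownl on PATH and network access):
# subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems with DOIs that contain special characters.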

Customizing the output location

By default, a downloaded paper is named after its title. With the -o or --out option you can choose where the file is written; the value may be an absolute or a relative path, and either a directory path or a file path.

Output the paper to a directory:

$ scidownl download --pmid 31395057 --out /absolute/path/of/a/directory/
# NOTE that the '/' at the end of the directory path is required, otherwise the last segment will be treated as the filename rather than a directory.
$ scidownl download --pmid 31395057 --out ../relative/path/of/a/directory/
# The '/' at the end of the directory path is required too.

Output the paper to a file path:

$ scidownl download --pmid 31395057 --out /absolute/dir/paper.pdf
$ scidownl download --pmid 31395057 --out ../relative/dir/paper.pdf
$ scidownl download --pmid 31395057 --out relative/dir/paper.pdf
$ scidownl download --pmid 31395057 --out paper  # will be downloaded as ./paper.pdf

Note that when several papers are downloaded, the value of --out is always treated as a directory rather than a file path.

$ scidownl download --pmid 31395057 --pmid 24686414 --out paper
# will be downloaded to ./paper/ directory:
# ./paper/<paper-title-1>.pdf
# ./paper/<paper-title-2>.pdf

If some directories in the given path do not exist, SciDownl creates them for you.
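The --out rules described above (a trailing '/' means a directory, several papers always mean a directory, and a bare name such as `paper` becomes `paper.pdf`) can be summarized in a few lines. This is an illustrative model of the documented behavior, not SciDownl's own code; `resolve_out` is a hypothetical name.

```python
from typing import Tuple


def resolve_out(out: str, n_papers: int) -> Tuple[str, str]:
    """Classify an --out value as ("dir", path) or ("file", path)."""
    if n_papers > 1 or out.endswith("/"):
        # Several papers, or a trailing '/': always treated as a directory.
        directory = out if out.endswith("/") else out + "/"
        return ("dir", directory)
    # One paper and no trailing '/': a file path; add .pdf when it is missing.
    filename = out if out.lower().endswith(".pdf") else out + ".pdf"
    return ("file", filename)


print(resolve_out("paper", 1))
print(resolve_out("paper", 2))
print(resolve_out("../relative/path/of/a/directory/", 1))
```

The second call shows why the same value `paper` ends up as a directory once more than one identifier is given.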

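Using a specific SciHub URL
With the -u or --scihub-url option you can name the SciHub URL to use instead of letting SciDownl pick one automatically from the locally saved SciHub domains. Letting SciDownl choose is recommended, so this option is not needed in normal use.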

$ scidownl download --pmid 31395057 --scihub-url http://sci-hub.se

Module use
You can use the scihub_download function to download papers.

from scidownl import scihub_download

paper = "https://doi.org/10.1145/3375633"
paper_type = "doi"
out = "./paper/one_paper.pdf"
scihub_download(paper, paper_type=paper_type, out=out)

The installation directory contains a folder named example, which holds a sample script, simple.py:

# -*- coding: utf-8 -*-

from scidownl import scihub_download


def download_one_paper():
    """Example of downloading one paper.
    The paper will be downloaded to the ./paper/ directory, and
    the filename is one_paper.pdf
    """
    paper = "https://doi.org/10.1145/3375633"
    paper_type = "doi"
    out = "./paper/one_paper.pdf"
    scihub_download(paper, paper_type=paper_type, out=out)


def download_multi_papers():
    """Example of downloading multiple papers.
    All papers will be downloaded to the ./paper/ directory,
    and their filenames are the paper titles.
    """
    source = [
        ("https://doi.org/10.1145/3375633", 'doi', "./paper/"),
        ("31395057", 'pmid', "./paper/"),
        ("24686414", 'pmid', "./paper/"),
    ]
    for paper, paper_type, out in source:
        scihub_download(paper, paper_type=paper_type, out=out)


if __name__ == '__main__':
    download_one_paper()
    download_multi_papers()
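The example above hard-codes the paper type for every entry. When identifiers arrive as a mixed list, the type can be guessed instead: PMIDs are plain integers, so anything made only of digits is treated as a PMID and everything else as a DOI. The sketch below takes that approach; `guess_paper_type` and `download_all` are hypothetical helpers, and the downloader is passed in as a parameter so the logic can be exercised without the package or network access. Only the `scihub_download(paper, paper_type=..., out=...)` call shown above is assumed from SciDownl.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def guess_paper_type(identifier: str) -> str:
    """Return 'pmid' for all-digit identifiers and 'doi' for everything else."""
    return "pmid" if identifier.strip().isdigit() else "doi"


def download_all(
    identifiers: Iterable[str],
    out_dir: str = "./paper/",
    download: Optional[Callable[..., object]] = None,
) -> List[Tuple[str, str]]:
    """Download every identifier into out_dir; return (identifier, type) pairs."""
    if download is None:
        # Imported lazily so the helper can be tested without scidownl installed.
        from scidownl import scihub_download as download
    done = []
    for identifier in identifiers:
        paper = identifier.strip()
        paper_type = guess_paper_type(paper)
        download(paper, paper_type=paper_type, out=out_dir)
        done.append((paper, paper_type))
    return done


# Example call (performs real downloads, so it is left commented out):
# download_all(["https://doi.org/10.1145/3375633", "31395057", "24686414"])
```

Keeping out_dir with its trailing '/' means each file is named after its paper title, as in download_multi_papers.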

