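Table of Contents
Setting Up a Python Environment and Converting HTML Documents to Word with Python
- Quick Overview of This Post
- Installing Python on Linux
- Converting HTML to Word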
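Quick Overview of This Post
Learning goal:
I recently needed to move a batch of HTML documents into Word, and the conversion methods I found online did not really solve the problem. It then occurred to me that Python can call a conversion library directly. I also took the opportunity to move my Python learning environment to Linux, so this post starts from scratch: it sets up a Python environment and then converts HTML documents to Word with Python.
Learning content (see the table of contents for details):
1. Install Python
2. Install pip and setuptools
3. Convert HTML to Word with Python
Study time:
September 15, 2020, 04:45:20
Learning output:
1. A basic Python learning environment on Linux
2. One CSDN technical blog post
3. A way to convert HTML to Word with Python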
Installing Python on Linux
Before starting, check whether a Python environment already exists:
[postgres@local64 ~]$ python -V
Python 2.7.5
[postgres@local64 ~]$
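It is best to leave the existing Python 2 environment untouched and simply ignore it (on CentOS 7, system tools such as yum depend on it).
Python source releases are listed on the official site: Index of /ftp/python/ (https://www.python.org/ftp/python/)
Here we choose a stable and fairly recent Python 3 release, version 3.8.0. Its download link is: Python-3.8.0.tgz
Step 1: Download the Python 3.8 source package into a working directory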
wget --no-check-certificate https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tgz
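Note: if the download is slow, open the link in Xunlei (Thunder) instead, which finished in about 3 seconds for me, and then transfer the file to the virtual machine.
Step 2: Extract and install
Extract: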
tar -zxvf Python-3.8.0.tgz
Configure the build options and generate the Makefile (run this from inside the extracted Python-3.8.0 directory):
./configure --prefix=/home/postgres/python/pythonLocation --enable-shared
[postgres@local64 Python-3.8.0]$ pwd
/home/postgres/python/Python-3.8.0
[postgres@local64 Python-3.8.0]$
The --prefix above is the install directory of my choice. Next, compile and install:
make && make install
Go to the bin directory and create a symbolic link:
...
Looking in links: /tmp/tmpv355fn2c
Collecting setuptools
Collecting pip
Installing collected packages: setuptools, pip
Successfully installed pip-19.2.3 setuptools-41.2.0
[postgres@local64 Python-3.8.0]$
[postgres@local64 Python-3.8.0]$ pwd
/home/postgres/python/Python-3.8.0
[postgres@local64 Python-3.8.0]$
[postgres@local64 Python-3.8.0]$ cd ..
[postgres@local64 python]$ ls
Python-3.8.0 Python-3.8.0.tgz pythonLocation
[postgres@local64 python]$
[postgres@local64 python]$ cd pythonLocation/
[postgres@local64 pythonLocation]$ ls
bin include lib share
[postgres@local64 pythonLocation]$ cd bin/
[postgres@local64 bin]$ pwd
/home/postgres/python/pythonLocation/bin
[postgres@local64 bin]$
[postgres@local64 bin]$ ls
2to3 2to3-3.8 easy_install-3.8 idle3 idle3.8 pip3 pip3.8 pydoc3 pydoc3.8 python3 python3.8 python3.8-config python3-config
[postgres@local64 bin]$ sudo ln -s /home/postgres/python/pythonLocation/bin/python3.8 /usr/bin/python3
[sudo] password for postgres:
[postgres@local64 bin]$
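Go to the lib directory and copy the library files. Because the build used --enable-shared, the python3.8 binary needs its shared library at run time, so the files are copied to a directory the dynamic loader searches: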
[postgres@local64 bin]$ cd ../lib/
[postgres@local64 lib]$
[postgres@local64 lib]$ pwd
/home/postgres/python/pythonLocation/lib
[postgres@local64 lib]$
[postgres@local64 lib]$ ls
.1.0 pkgconfig python3.8
[postgres@local64 lib]$
[postgres@local64 lib]$ sudo cp -r ./* /usr/lib64/
[postgres@local64 lib]$
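The library copy above matters because of the --enable-shared option. As a minimal sketch (the helper name shared_lib_info is my own and not part of the original setup), the standard sysconfig module can report how the running interpreter was built:

```python
import sysconfig


def shared_lib_info():
    """Report how the running interpreter was built (hypothetical helper)."""
    return {
        # 1 when configured with --enable-shared, 0 otherwise (may be None on some platforms)
        "enable_shared": sysconfig.get_config_var("Py_ENABLE_SHARED"),
        # Directory where the Python library was installed, normally <prefix>/lib
        "libdir": sysconfig.get_config_var("LIBDIR"),
        # File name of the Python library the build produced
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


if __name__ == "__main__":
    for key, value in shared_lib_info().items():
        print(f"{key}: {value}")
```

Run with python3. With the configure options used in this post, enable_shared would be expected to be 1 and libdir to point under /home/postgres/python/pythonLocation/lib.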
Now verify the installation (preferably from an unrelated directory):
[postgres@local64 ~]$ python -V
Python 2.7.5
[postgres@local64 ~]$
[postgres@local64 ~]$ python3 -V
Python 3.8.0
[postgres@local64 ~]$
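The same check can be made from inside the interpreter, which also shows which binary the python3 link resolves to. This is only a sketch; is_expected_version is a name I made up for illustration:

```python
import sys


def is_expected_version(version_info, major=3, minor=8):
    """Return True when the (major, minor) pair matches the expected release."""
    return tuple(version_info[:2]) == (major, minor)


if __name__ == "__main__":
    # sys.executable is the path of the interpreter actually running this script
    print("executable:", sys.executable)
    print("version:", sys.version.split()[0])
    print("is 3.8:", is_expected_version(sys.version_info))
```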
Step 3: Install pip and setuptools
[postgres@local64 bin]$ pip install --upgrade setuptools
Requirement already satisfied: setuptools in /home/postgres/python/pythonLocation/lib/python3.8/site-packages (41.2.0)
Collecting setuptools
Using cached setuptools-62.3.2-py3-none-any.whl (1.2 MB)
Installing collected packages: setuptools
Attempting uninstall: setuptools
Found existing installation: setuptools 41.2.0
Uninstalling setuptools-41.2.0:
Successfully uninstalled setuptools-41.2.0
Successfully installed setuptools-62.3.2
[postgres@local64 bin]$
Note: during the compile-and-install step above, pip was already generated. It is in the bin directory under the install prefix:
[postgres@local64 bin]$ ls
2to3 2to3-3.8 easy_install-3.8 idle3 idle3.8 pip3 pip3.8 pydoc3 pydoc3.8 python3 python3.8 python3.8-config python3-config
[postgres@local64 bin]$
[postgres@local64 bin]$ pwd
/home/postgres/python/pythonLocation/bin
[postgres@local64 bin]$
[postgres@local64 bin]$ ./pip3 -V
pip 19.2.3 from /home/postgres/python/pythonLocation/lib/python3.8/site-packages/pip (python 3.8)
[postgres@local64 bin]$
[postgres@local64 bin]$ ./pip3.8 -V
pip 19.2.3 from /home/postgres/python/pythonLocation/lib/python3.8/site-packages/pip (python 3.8)
[postgres@local64 bin]$
OK, as before, create a symbolic link:
[postgres@local64 bin]$ ls
2to3 2to3-3.8 easy_install-3.8 idle3 idle3.8 pip3 pip3.8 pydoc3 pydoc3.8 python3 python3.8 python3.8-config python3-config
[postgres@local64 bin]$
[postgres@local64 bin]$ pip -V
bash: pip: command not found...
[postgres@local64 bin]$
[postgres@local64 bin]$ pwd
/home/postgres/python/pythonLocation/bin
[postgres@local64 bin]$
[postgres@local64 bin]$ sudo ln -s /home/postgres/python/pythonLocation/bin/pip3.8 /usr/bin/pip
[postgres@local64 bin]$
[postgres@local64 bin]$ pip -V
pip 19.2.3 from /home/postgres/python/pythonLocation/lib/python3.8/site-packages/pip (python 3.8)
[postgres@local64 bin]$
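Next, install a few third-party libraries to confirm that pip works. (A rich set of third-party libraries is one of Python's strengths, and pip makes installing them much easier.)
OK, take a look: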
[postgres@local64 Python-3.8.0]$ pip list
Package Version
------------ -------
bcrypt 3.2.0
cffi 1.14.2
cryptography 3.1
paramiko 2.7.2
pip 19.2.3
pycparser 2.20
PyNaCl 1.4.0
setuptools 41.2.0
six 1.15.0
WARNING: You are using pip version 19.2.3, however version 20.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[postgres@local64 Python-3.8.0]$
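The pip list output can also be cross-checked from Python itself. The sketch below assumes Python 3.8 or later, where importlib.metadata is in the standard library; installed_version is a hypothetical helper name:

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the pip list output above
    for name in ("paramiko", "cryptography", "six"):
        print(name, installed_version(name))
```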
Converting HTML to Word
Install the required tools:
sudo yum install pandoc
pip install pypandoc
[postgres@local64 Python-3.8.0]$ sudo yum install pandoc
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* centos-sclo-rh: mirrors.aliyun.com
* centos-sclo-sclo: mirrors.aliyun.com
* epel: mirrors.bfsu.edu.cn
* extras: mirrors.cqu.edu.cn
* updates: mirrors.cqu.edu.cn
Package pandoc-1.12.3.1-2.el7.x86_64 already installed and latest version
Nothing to do
[postgres@local64 Python-3.8.0]$ pip3 install pypandoc
bash: pip3: command not found...
Similar command is: 'pip'
[postgres@local64 Python-3.8.0]$ pip install pypandoc
Collecting pypandoc
Downloading https://files.pythonhosted.org/packages/d6/b7/5050dc1769c8a93d3ec7c4bd55be161991c94b8b235f88bf7c764449e708/pypandoc-1.5.tar.gz
Requirement already satisfied: setuptools in /home/postgres/python/pythonLocation/lib/python3.8/site-packages (from pypandoc) (41.2.0)
Requirement already satisfied: pip>=8.1.0 in /home/postgres/python/pythonLocation/lib/python3.8/site-packages (from pypandoc) (19.2.3)
Collecting wheel>=0.25.0 (from pypandoc)
Downloading https://files.pythonhosted.org/packages/a7/00/3df031b3ecd5444d572141321537080b40c1c25e1caa3d86cdd12e5e919c/wheel-0.35.1-py2.py3-none-any.whl
Installing collected packages: wheel, pypandoc
Running setup.py install for pypandoc ... done
Successfully installed pypandoc-1.5 wheel-0.35.1
WARNING: You are using pip version 19.2.3, however version 20.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[postgres@local64 Python-3.8.0]$ pip list
Package Version
------------ -------
bcrypt 3.2.0
cffi 1.14.2
cryptography 3.1
paramiko 2.7.2
pip 19.2.3
pycparser 2.20
PyNaCl 1.4.0
pypandoc 1.5
setuptools 41.2.0
six 1.15.0
wheel 0.35.1
WARNING: You are using pip version 19.2.3, however version 20.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[postgres@local64 Python-3.8.0]$
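pypandoc is a wrapper that calls the pandoc executable, so it helps to confirm that pandoc is reachable on PATH before converting. A minimal sketch using only the standard library follows; find_tool and tool_version_line are names I introduced for illustration:

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if not found."""
    return shutil.which(name)


def tool_version_line(name):
    """Return the first line of `<name> --version`, or None if the tool is missing."""
    path = find_tool(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print("pandoc path:", find_tool("pandoc"))
    print("pandoc version:", tool_version_line("pandoc"))
```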
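Start the conversion:
import pypandoc
# Convert 1.html to Word; with outputfile set, the result is written to 1.docx
# and the return value is an empty string.
output = pypandoc.convert_file('1.html', 'docx', outputfile="1.docx")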
The result: running the script writes 1.docx to the current working directory.
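Since the original goal was a whole batch of HTML documents, the single-file call extends naturally to a directory. The following is a sketch under assumptions, not the method used in this post: plan_conversions, convert_all, and the directory name word_docs are all hypothetical, and only pypandoc.convert_file comes from the example above.

```python
from pathlib import Path


def plan_conversions(source_dir, target_dir):
    """Pair every .html file in source_dir with a .docx path in target_dir."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    return [
        (html, target_dir / (html.stem + ".docx"))
        for html in sorted(source_dir.glob("*.html"))
    ]


def convert_all(source_dir, target_dir):
    """Convert each planned pair with pypandoc; requires pandoc and pypandoc."""
    import pypandoc  # imported here so planning works without pypandoc installed

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    for html, docx in plan_conversions(source_dir, target_dir):
        pypandoc.convert_file(str(html), "docx", outputfile=str(docx))
        print(f"{html} -> {docx}")


if __name__ == "__main__":
    # Dry run: only list what would be converted from the current directory
    for html, docx in plan_conversions(".", "word_docs"):
        print(f"{html} -> {docx}")
```

Calling convert_all("html_docs", "word_docs") would then convert every .html file in a (placeholder) html_docs directory. Keeping the path planning separate from the conversion makes it easy to preview the file list first.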