Reading Word documents with Python 3

Reposted

冷月星 2023-08-30 10:17:52


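Reading and writing Word documents with the python-docx library

I. Introduction

python-docx is a third-party library for reading and writing Word (.docx) files from Python.

Source code: https://github.com/python-openxml/python-docx
Official documentation: https://python-docx.readthedocs.io/en/latest/
Installation: pip install python-docx

Reading and writing Word documents with python-docx mainly involves three structural objects, each nested inside the previous one.

`Document`: the document object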

from docx import Document
doc = Document('./test1.docx')   # open an existing document in the current directory
# doc = Document()               # alternatively, create a new blank document

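`Paragraph`: the paragraph object

Each paragraph of content in the document, that is, each block of text that ends where Enter was pressed in Word.

`Run`: the run (text block) object

Each distinct part of a Paragraph is called a Run. Note: text that differs in color, font, weight, or italics belongs to different runs.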


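II. Reading Word document content

To read data from an existing Word document with python-docx, get the objects layer by layer, then read the text attribute of the object you need.

1. Reading a plain-text docx document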


for paragraph in doc.paragraphs:
    print(f'paragraph.text = {paragraph.text}')
    for run in paragraph.runs:
        print(f'\trun.text = {run.text}')

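# output
paragraph.text = Hello, this is the first document for testing the python-docx library.
	run.text = Hello, this is the first document for testing the python-docx library.
paragraph.text = This is the second paragraph
	run.text = This is the second paragraph
paragraph.text = This is the third paragraph, bold, red
	run.text = This is the third paragraph, 
	run.text = bold
	run.text = , 
	run.text = red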

Summary:

Extract objects level by level: doc.paragraphs and paragraph.runs both return lists that can be iterated over.
Extract the text of an object: paragraph.text, run.text

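2. Reading tables

To read table data from an existing Word document with python-docx, first get the table object, then get a cell object by its row and column index, and finally read the cell's text attribute, for example table.cell(i, j).text.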


doc2 = Document('./test2_table.docx')

for t, table in enumerate(doc2.tables):
    print(f"Table {t} ======")   # label each table by its index
    for i in range(len(table.rows)):
        for j in range(len(table.columns)):
            print(f"row {i}, col {j}: {table.cell(i, j).text}")

# output
Table 0 ======
row 0, col 0: Date
row 0, col 1: High
row 0, col 2: Low
row 0, col 3: Weather
row 0, col 4: Wind
row 0, col 5: Air quality index
row 1, col 0: 2021-12-01 Wed
row 1, col 1: 9°
...
row 5, col 3: Cloudy
row 5, col 4: Southeast wind, force 2
row 5, col 5: 44 Good
Table 1 ======
row 0, col 0: Column 1
row 0, col 1: Column 2
row 0, col 2: Column 3
row 0, col 3: Column 4
row 1, col 0: Data A1
...
row 4, col 2: Data C4
row 4, col 3: Data D4

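III. Writing Word document content

To write data to a Word document with python-docx, create a document, add content by calling methods on the Document object (and, for runs, on a paragraph), then save the document.

Add a heading: add_heading()
Add a paragraph: add_paragraph()
Add a run: add_run() (called on a paragraph)
Add a page break: add_page_break()
Add a table: add_table()
Add a picture: add_picture()
Save: doc.save()

1. Writing headings and paragraphs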

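from docx import Document
doc = Document()      # create a new blank document
doc.add_heading('This is a level-1 heading', level=1)  # level ranges from 0 to 9; 0 is the document title
doc.add_heading('This is a level-2 heading', level=2)
text = "Paragraph: The swallows leave, yet they will come back; "\
       "the willows wither, yet they will turn green again; "\
       "the peach blossoms fall, yet they will bloom again. "\
       "But tell me, wise one, why do our days go by and never return? "\
       "Has someone stolen them? Then who, and where are they hidden? "\
       "Or have they run away on their own? Then where are they now?"
p = doc.add_paragraph(text)             # insert the paragraph text
p.add_run('\nFrom ')                    # add a run
p.add_run('"Haste"').bold = True        # add a run and make it bold
p.add_run(', Zhu Ziqing').italic = True # add a run and make it italic
doc.add_page_break()       # insert a page break

doc.save('./write_test_paragraphs.docx')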

The result looks like this:

[Screenshot of the generated document]

2. Writing tables

from docx import Document
doc = Document()      # create a new blank document
doc.add_heading('Level-1 heading: inserting a table', level=1)  # level ranges from 0 to 9
table = doc.add_table(rows=1, cols=3)   # start with a single row for the header

# Data
records = (
    (3, '101', 'Data 1'),
    (7, '422', 'Data 2'),
    (4, '631', 'Data 3')
)

# Fill in the header row
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Header 1'
hdr_cells[1].text = 'Header 2'
hdr_cells[2].text = 'Header 3'
for d1, d2, d3 in records:
    row_cells = table.add_row().cells   # append a row and get its cells
    row_cells[0].text = str(d1)         # cell text must be a string
    row_cells[1].text = d2
    row_cells[2].text = d3

doc.save('./write_test_table.docx')

The result looks like this:

[Screenshot of the generated table]

3. Inserting pictures

from docx import Document
from docx.shared import Cm

doc = Document()      # create a new blank document
doc.add_picture('./字节杂谈头像.png', width=Cm(2.25))  # insert the picture with a width of 2.25 cm

doc.save('./write_test_picture.docx')

The result looks like this:

[Screenshot of the generated document with the inserted picture]

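IV. Summary

python-docx lets Python work with Word documents and can automate repetitive Word tasks; in practice it feels fairly lightweight. This article mainly follows the official documentation, with some simplification. For more advanced use, consult the official documentation.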
