Extracting and Saving All Images from a docx File with Python

mob64ca12db7156 2023-12-17 05:43:51 © Copyright

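© Copyright belongs to the author. This is an original work by 51CTO blog author mob64ca12db7156. Contact the author for permission to repost; unauthorized reposting may lead to legal action.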

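1. Introduction

In day-to-day development, we often need to extract the images from a docx document and save them. This article shows how to use Python to collect the images in a docx document and save them to local disk.

2. Overall flow

The process breaks down into the following steps: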

Step	Description
Step 1	Open the docx document
Step 2	Get all the images in the document
Step 3	Save the images to local disk

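The sections below describe what each step involves and give code for it.

3. Step 1: Open the docx document

First, we need the third-party library python-docx to work with docx documents. Install it by running the following command on the command line: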

pip install python-docx

Once it is installed, we can open a docx document with the following code:

from docx import Document

def open_docx(file_path):
    doc = Document(file_path)
    return doc

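This code imports the Document class and defines an open_docx function that opens a docx document. It takes the path of the docx document as its argument and returns a Document object.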
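Before handing a path to Document, it can help to check that the file really is a docx package. The following is a minimal sketch using only the standard library. The function name looks_like_docx is hypothetical, and the check is a heuristic that assumes the main document body is stored as word/document.xml, which is the usual layout but is not guaranteed for every producer.

```python
import zipfile


def looks_like_docx(file_path):
    """Heuristically check whether file_path is a docx package.

    file_path may be a filesystem path or a binary file object.
    """
    # A .docx file is a ZIP archive, so anything else can be rejected early.
    if not zipfile.is_zipfile(file_path):
        return False
    with zipfile.ZipFile(file_path) as package:
        # Assumption: the main document part uses its conventional name.
        return "word/document.xml" in package.namelist()
```

A caller could run this check first and only call open_docx when it returns True, reporting a clearer error otherwise.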

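4. Step 2: Get all the images in the document

After opening the docx document, we need to collect its images. python-docx does expose doc.inline_shapes, but an InlineShape object has no has_picture or image attribute, and Document has no shapes attribute, so neither can be used to read image data. Embedded images are stored as separate parts of the package, and the document part refers to them through relationships. We can walk those relationships and keep the image parts:

from docx.opc.constants import RELATIONSHIP_TYPE as RT

def get_images_from_docx(doc):
    images = []

    # Walk the relationships of the main document part
    for rel in doc.part.rels.values():
        # Linked (external) images carry no embedded data, so skip them
        if rel.is_external:
            continue
        # Keep only relationships that point at an image part
        if rel.reltype == RT.IMAGE:
            images.append(rel.target_part)

    return images

This code creates an empty list, images, to hold the results. It iterates over the relationships of the main document part, skips external links, and appends the target part of every image relationship to the list. Note that this covers only the main document body; images in headers and footers belong to separate parts with their own relationships.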
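As a cross-check on what the document contains, the package can also be inspected directly, since a docx file is a ZIP archive. The sketch below uses only the standard zipfile module rather than python-docx. The function name list_media_members is hypothetical, and it assumes the images sit under word/media/, which is where Word normally places them but is a convention rather than a requirement.

```python
import zipfile


def list_media_members(docx_file):
    """Return the names of archive members stored under word/media/.

    docx_file may be a filesystem path or a binary file object.
    """
    with zipfile.ZipFile(docx_file) as package:
        return [
            name
            for name in package.namelist()
            # Assumption: embedded images use the conventional media folder.
            if name.startswith("word/media/") and not name.endswith("/")
        ]
```

Comparing the length of this list with the number of images found in step 2 is a quick way to notice images that live outside the main document body, such as those in headers or footers.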

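5. Step 3: Save the images to local disk

Once we have the images, we need to save them to local disk. Each image part has a blob attribute that holds the image's binary data. We can use the Pillow library (the maintained fork of PIL) to read and save the images.

First, install Pillow:

pip install pillow

Once it is installed, we can save the images with the following code:

import os
from io import BytesIO

from PIL import Image

def save_images(images, output_dir):
    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    for index, img in enumerate(images, start=1):
        image_data = img.blob

        # Image.open expects a path or a file object, so wrap the bytes
        pil_image = Image.open(BytesIO(image_data))

        # Image parts carry no title, so name the files by position
        image_name = f'image_{index}.png'
        image_path = os.path.join(output_dir, image_name)
        pil_image.save(image_path, 'PNG')

This code iterates over all the images and reads each one's binary data into image_data. Because Image.open takes a file path or a file object rather than raw bytes, the data is wrapped in a BytesIO object before Pillow reads it. The save method then writes each image to the output directory as a PNG file named after its position in the list.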
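Re-encoding every image as PNG is not always wanted, for example when the document already holds JPEG files. An alternative is to write the original bytes unchanged and pick the file extension from the leading bytes of the data. The sketch below illustrates this with the standard library only. The names guess_extension and save_blob are hypothetical, and the signature table covers just a few common raster formats; anything else falls back to .bin.

```python
import os

# Leading byte signatures of a few common raster formats.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),
]


def guess_extension(blob, default=".bin"):
    """Pick a file extension from the first bytes of the image data."""
    for signature, extension in _SIGNATURES:
        if blob.startswith(signature):
            return extension
    return default


def save_blob(blob, output_dir, index):
    """Write the bytes unchanged and return the path of the new file."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"image_{index}{guess_extension(blob)}")
    with open(path, "wb") as handle:
        handle.write(blob)
    return path
```

Writing the bytes as they are keeps the original quality and avoids depending on Pillow being able to decode the format, at the cost of producing files in mixed formats.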

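6. Complete code example

from io import BytesIO
import os

from docx import Document
from docx.opc.constants import RELATIONSHIP_TYPE as RT
from PIL import Image

def open_docx(file_path):
    doc = Document(file_path)
    return doc

def get_images_from_docx(doc):
    images = []

    # Walk the relationships of the main document part
    for rel in doc.part.rels.values():
        # Linked (external) images carry no embedded data, so skip them
        if rel.is_external:
            continue
        # Keep only relationships that point at an image part
        if rel.reltype == RT.IMAGE:
            images.append(rel.target_part)

    return images

def save_images(images, output_dir):
    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    for index, img in enumerate(images, start=1):
        image_data = img.blob

        # Image.open expects a path or a file object, so wrap the bytes
        pil_image = Image.open(BytesIO(image_data))

        # Image parts carry no title, so name the files by position
        image_name = f'image_{index}.png'
        image_path = os.path.join(output_dir, image_name)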
        pil_image.save(image_path, '