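Preface
- To let users upload Office documents and preview them online, I tried POI and the commercial library Aspose; neither gave satisfactory results.
- Converting PDF to images, however, is a solved problem with mature tools, so the question became: how to convert Office documents to PDF.
- LibreOffice can convert Office documents to PDF, and the output is excellent, almost identical to what Microsoft Office produces with Save As PDF. For online preview it gave the best results of the options I tried and may well be the best solution.
- The catch is that LibreOffice must be installed on the server, and on development machines too for testing. Fortunately, LibreOffice is cross-platform.
I also tried converting Office documents directly to images with LibreOffice, but only the first page was rendered, and the help text shows no way to convert a whole document directly to images.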
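Converting Office documents to PDF from Java with LibreOffice
There are two conversion methods, each with its own trade-offs; choose the one that fits your case.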
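Asynchronous conversion
This method invokes an operating-system command. The conversion is asynchronous and takes an unpredictable amount of time depending on file size; if the document must be previewed immediately after upload, use the synchronous method instead.
- Advantage: simple to implement, with no extra configuration and no third-party library dependencies (LibreOffice itself must of course be installed).
- Disadvantage: once the command has been sent, there is no way to know whether the conversion succeeded or raised an error, so the outcome is uncertain. That said, with thorough testing the conversion can generally be made reliable.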
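Synchronous conversion
This method uses JodConverter to drive LibreOffice. The call returns only after the conversion has finished, so failures surface as exceptions.
- Disadvantage: a LibreOffice service must be running while the code runs, which consumes operating-system resources; compared with the asynchronous method, it also requires a third-party library and extra configuration.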
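Because the command-line method gives no completion signal, one possible workaround is to poll for the expected PDF. The sketch below is a hypothetical helper (the class `OutputFileWaiter` and its method are not part of the original code). It treats a file as complete once it exists with a non-zero size that is unchanged between two consecutive checks. That is only a heuristic and assumes LibreOffice writes the output in one continuous pass.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

/** Hypothetical helper for illustration; not part of the original project. */
final class OutputFileWaiter {

    /**
     * Polls until {@code file} exists with a non-zero size that did not change
     * between two consecutive checks, or until {@code timeout} has elapsed.
     *
     * @return true if the file looked complete before the timeout, false otherwise
     */
    static boolean waitForStableFile(Path file, Duration timeout, Duration pollInterval)
            throws IOException, InterruptedException {
        long start = System.nanoTime();
        long lastSize = -1;
        while (true) {
            if (Files.isRegularFile(file)) {
                long size = Files.size(file);
                if (size > 0 && size == lastSize) {
                    return true; // same non-zero size twice in a row
                }
                lastSize = size;
            }
            if (System.nanoTime() - start >= timeout.toNanos()) {
                return false; // gave up waiting
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }
}
```

A caller would pass the path where the PDF is expected to appear, for example the output directory plus the source file's base name with a `.pdf` extension. The synchronous method remains the more dependable choice when the result is needed right away.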
Full code
Add the dependency (synchronous method only)
    <dependency>
        <groupId>org.jodconverter</groupId>
        <artifactId>jodconverter-local</artifactId>
        <version>4.2.4</version>
    </dependency>
Add a libre.properties file to the resources directory (synchronous method only)
Its contents:
    # LibreOffice home directory
    libreOfficeHome=C:/dev/LibreOffice6.4
    # Start several LibreOffice processes; each port corresponds to one process
    # portNumbers=2002,2003
    portNumbers=2002
    # Task execution timeout: 5 minutes
    taskExecutionTimeoutMinutes=5
    # Task queue timeout: 1 hour
    taskQueueTimeoutHours=1
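As a minimal sketch of how these four keys might be read and validated before being handed to JodConverter, the class below parses them with the standard `java.util.Properties` API. The class name `LibreSettings` is hypothetical, and the fallback values (port 2002, 5 minutes, 1 hour) are an assumption copied from the sample file above, not defaults defined by any library.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical value holder for the keys in libre.properties. */
final class LibreSettings {
    final String officeHome;
    final int[] ports;
    final long taskExecutionTimeoutMillis;
    final long taskQueueTimeoutMillis;

    private LibreSettings(String officeHome, int[] ports, long execMillis, long queueMillis) {
        this.officeHome = officeHome;
        this.ports = ports;
        this.taskExecutionTimeoutMillis = execMillis;
        this.taskQueueTimeoutMillis = queueMillis;
    }

    /** Reads the settings, falling back to the sample values when a key is missing. */
    static LibreSettings from(Properties p) {
        String home = p.getProperty("libreOfficeHome", "").trim();

        // "2002, 2003" -> [2002, 2003]; blanks around commas are tolerated.
        int[] ports = Arrays.stream(p.getProperty("portNumbers", "2002").split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
        if (ports.length == 0) {
            throw new IllegalArgumentException("portNumbers must list at least one port");
        }

        long execMinutes = Long.parseLong(p.getProperty("taskExecutionTimeoutMinutes", "5").trim());
        long queueHours = Long.parseLong(p.getProperty("taskQueueTimeoutHours", "1").trim());

        // Convert to milliseconds with long arithmetic to avoid int overflow.
        return new LibreSettings(home, ports,
                TimeUnit.MINUTES.toMillis(execMinutes),
                TimeUnit.HOURS.toMillis(queueHours));
    }
}
```

Supplying a default for each numeric key means a missing entry does not end in a `NumberFormatException` from parsing an empty string.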
Conversion class LibreOfficeUtil
    package com.example.demo;

    import com.example.factory.OfficeManagerInstance;
    import org.jodconverter.JodConverter;

    import java.io.File;

    public class LibreOfficeUtil {
        /**
         * Converts an Office document to PDF with JodConverter (which depends on LibreOffice).
         * The conversion is synchronous: it has finished by the time this method returns.
         */
        public static boolean convertOffice2PDFSyncIsSuccess(File sourceFile, File targetFile) {
            try {
                OfficeManagerInstance.start();
                JodConverter.convert(sourceFile).to(targetFile).execute();
            } catch (Exception e) {
                e.printStackTrace();
                return false;
            }

            return true;
        }

        /**
         * Converts an Office document to PDF by calling LibreOffice from the command line.
         * The conversion is asynchronous: it may still be running when this method returns,
         * and whether it failed cannot be known here.
         *
         * @param filePath       directory containing the source file (on non-Windows systems it is
         *                       concatenated with fileName, so it must end with a path separator)
         * @param fileName       name of the source file
         * @param targetFilePath output directory; pass an empty string to use the current directory
         * @return exit status of the child process
         */
        public static int convertOffice2PDFAsync(String filePath, String fileName, String targetFilePath) throws Exception {
            String command;
            int exitStatus;
            String osName = System.getProperty("os.name");
            String outDir = targetFilePath.length() > 0 ? " --outdir " + targetFilePath : "";

            // Note: the command is a plain string, so paths containing spaces will break it.
            if (osName.contains("Windows")) {
                command = "cmd /c cd /d " + filePath + " && start soffice --headless --invisible --convert-to pdf ./" + fileName + outDir;
            } else {
                command = "libreoffice6.3 --headless --invisible --convert-to pdf:writer_pdf_Export " + filePath + fileName + outDir;
            }

            exitStatus = executeOSCommand(command);
            return exitStatus;
        }

        /**
         * Runs command through the operating system's shell.
         * This method does not wait for the conversion to finish. A return value of 0 only means
         * the command was dispatched correctly; the outcome of the conversion, including any
         * error, is not visible here. The synchronous conversion is therefore the better choice.
         */
        private static int executeOSCommand(String command) throws Exception {
            Process process = Runtime.getRuntime().exec(command);

            // Converting a document of about 3 MB takes roughly 8 seconds, yet in testing the next
            // line runs right after the command is sent. On Windows this is expected: "start"
            // launches soffice in a separate process, so waitFor() only waits for cmd itself.
            // waitFor() already returns the exit value, so no second exitValue() call is needed.
            int exitStatus = process.waitFor();

            // Destroy the child process
            process.destroy();
            return exitStatus;
        }
    }
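The comments in `executeOSCommand` note that the call returns before the conversion ends. One possible alternative, sketched below, is to launch the LibreOffice binary directly with `ProcessBuilder` and an argument list, wait with a timeout, and then check that the expected PDF exists. The class `SofficeCommandSketch` and its methods are hypothetical names. The sketch assumes the binary stays in the foreground until the conversion finishes, which should be verified on the target platform. Only flags listed in the help text (`--headless`, `--convert-to`, `--outdir`) are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the original author's implementation. */
final class SofficeCommandSketch {

    /** Builds the argument list; a list keeps paths with spaces intact. */
    static List<String> buildCommand(String sofficeBinary, Path source, Path outDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sofficeBinary);            // e.g. "soffice" or "libreoffice6.3"
        cmd.add("--headless");
        cmd.add("--convert-to");
        cmd.add("pdf");
        cmd.add("--outdir");
        cmd.add(outDir.toString());
        cmd.add(source.toString());
        return cmd;
    }

    /** Expected result path: same base name as the source, with a .pdf extension. */
    static Path expectedPdf(Path source, Path outDir) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".pdf");
    }

    /** Runs the conversion and reports whether a PDF was produced in time. */
    static boolean convertAndWait(String sofficeBinary, Path source, Path outDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(sofficeBinary, source, outDir))
                .inheritIO() // let LibreOffice write to this process's console
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // conversion took too long
            return false;
        }
        // The exit code alone may not reveal a failed conversion, so also check the file.
        return process.exitValue() == 0 && Files.isRegularFile(expectedPdf(source, outDir));
    }
}
```

For instance, `convertAndWait("soffice", Paths.get("/data/in/report.docx"), Paths.get("/data/out"), 120)` would wait up to two minutes; the paths here are made up for illustration.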
OfficeManagerInstance (synchronous method only)
    package com.example.factory;

    import org.jodconverter.office.LocalOfficeManager;
    import org.jodconverter.office.OfficeManager;
    import org.springframework.core.io.support.PropertiesLoaderUtils;
    import org.springframework.stereotype.Component;

    import javax.annotation.PostConstruct;
    import java.io.IOException;
    import java.util.Properties;

    /**
     * github https://github.com/uncleAndyChen
     * email 552087293@qq.com
     * homepage https://www.lovesofttech.com/
     * author andyChen
     * since 2020/02/29
     */
    @Component
    public class OfficeManagerInstance {
        private static OfficeManager INSTANCE = null;

        public static synchronized void start() {
            officeManagerStart();
        }

        @PostConstruct
        private void init() {
            try {
                Properties properties = PropertiesLoaderUtils.loadAllProperties("libre.properties");
                String[] portNumbers = properties.getProperty("portNumbers", "").split(",");
                int[] ports = new int[portNumbers.length];

                for (int i = 0; i < portNumbers.length; i++) {
                    // trim() tolerates spaces after the commas
                    ports[i] = Integer.parseInt(portNumbers[i].trim());
                }

                LocalOfficeManager.Builder builder = LocalOfficeManager.builder().install();
                builder.officeHome(properties.getProperty("libreOfficeHome", ""));
                builder.portNumbers(ports);
                // Timeouts are configured in minutes and hours but passed on in milliseconds;
                // the long literals avoid int overflow for large values.
                builder.taskExecutionTimeout(Integer.parseInt(properties.getProperty("taskExecutionTimeoutMinutes", "")) * 1000L * 60); // minute
                builder.taskQueueTimeout(Integer.parseInt(properties.getProperty("taskQueueTimeoutHours", "")) * 1000L * 60 * 60); // hour

                INSTANCE = builder.build();
                officeManagerStart();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private static void officeManagerStart() {
            if (INSTANCE == null) {
                // init() has not run or failed to load libre.properties
                throw new IllegalStateException("OfficeManager has not been initialized; check libre.properties");
            }

            if (INSTANCE.isRunning()) {
                return;
            }

            try {
                INSTANCE.start();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
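Appendix: libreoffice6.3 conversion help
I could not find detailed official online documentation on converting documents with libreoffice6.3, but the -h option prints detailed help that is sufficient for development.
For example, to convert a file to PDF: libreoffice6.3 --headless --invisible --convert-to pdf:writer_pdf_Export ./奇妙的记忆力.pptx. A directory for the PDF can be given afterwards with --outdir; if none is given, the PDF is saved to the current directory.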
    [root@ebs-60027 lib64]# libreoffice6.3 -h
    Usage: soffice [argument...]
           argument - switches, switch parameters and document URIs (filenames).

    Using without special arguments:
    Opens the start center, if it is used without any arguments.
       {file}              Tries to open the file (files) in the components
                           suitable for them.
       {file} {macro:///Library.Module.MacroName}
                           Opens the file and runs specified macros from
                           the file.

    Getting help and information:
       --help | -h | -?    Shows this help and quits.
       --helpwriter        Opens built-in or online Help on Writer.
       --helpcalc          Opens built-in or online Help on Calc.
       --helpdraw          Opens built-in or online Help on Draw.
       --helpimpress       Opens built-in or online Help on Impress.
       --helpbase          Opens built-in or online Help on Base.
       --helpbasic         Opens built-in or online Help on Basic scripting
                           language.
       --helpmath          Opens built-in or online Help on Math.
       --version           Shows the version and quits.
       --nstemporarydirectory
                           (MacOS X sandbox only) Returns path of the temporary
                           directory for the current user and exits. Overrides
                           all other arguments.

    General arguments:
       --quickstart[=no]   Activates[Deactivates] the Quickstarter service.
       --nolockcheck       Disables check for remote instances using one
                           installation.
       --infilter={filter} Force an input filter type if possible. For example:
                           --infilter="Calc Office Open XML"
                           --infilter="Text (encoded):UTF8,LF,,,"
       --pidfile={file}    Store soffice.bin pid to {file}.
       --display {display} Sets the DISPLAY environment variable on UNIX-like
                           platforms to the value {display} (only supported by a
                           start script).

    User/programmatic interface control:
       --nologo            Disables the splash screen at program start.
       --minimized         Starts minimized. The splash screen is not displayed.
       --nodefault         Starts without displaying anything except the splash
                           screen (do not display initial window).
       --invisible         Starts in invisible mode. Neither the start-up logo nor
                           the initial program window will be visible.
                           Application can be controlled, and documents and
                           dialogs can be controlled and opened via the API.
                           Using the parameter, the process can only be ended
                           using the taskmanager (Windows) or the kill command
                           (UNIX-like systems). It cannot be used in conjunction
                           with --quickstart.
       --headless          Starts in "headless mode" which allows using the
                           application without GUI. This special mode can be
                           used when the application is controlled by external
                           clients via the API.
       --norestore         Disables restart and file recovery after a system
                           crash.
       --safe-mode         Starts in a safe mode, i.e. starts temporarily with a
                           fresh user profile and helps to restore a broken
                           configuration.
       --accept={connect-string}
                           Specifies a UNO connect-string to create a UNO
                           acceptor through which other programs can connect to
                           access the API. Note that API access allows execution
                           of arbitrary commands.
                           The syntax of the {connect-string} is:
                             connection-type,params;protocol-name,params
                             e.g. pipe,name={some name};urp
                               or socket,host=localhost,port=54321;urp
       --unaccept={connect-string}
                           Closes an acceptor that was created with --accept.
                           Use --unaccept=all to close all acceptors.
       --language={lang}   Uses specified language, if language is not selected
                           yet for UI. The lang is a tag of the language in IETF
                           language tag.

    Developer arguments:
       --terminate_after_init
                           Exit after initialization complete (no documents
                           loaded)
       --eventtesting      Exit after loading documents.

    New document creation arguments:
    The arguments create an empty document of specified kind. Only one of them
    may be used in one command line. If filenames are specified after an
    argument, then it tries to open those files in the specified component.
       --writer            Creates an empty Writer document.
       --calc              Creates an empty Calc document.
       --draw              Creates an empty Draw document.
       --impress           Creates an empty Impress document.
       --base              Creates a new database.
       --global            Creates an empty Writer master (global) document.
       --math              Creates an empty Math document (formula).
       --web               Creates an empty HTML document.

    File open arguments:
    The arguments define how following filenames are treated. New treatment
    begins after the argument and ends at the next argument. The default
    treatment is to open documents for editing, and create new documents from
    document templates.
       -n                  Treats following files as templates for creation of
                           new documents.
       -o                  Opens following files for editing, regardless whether
                           they are templates or not.
       --pt {Printername}  Prints following files to the printer {Printername},
                           after which those files are closed. The splash screen
                           does not appear. If used multiple times, only last
                           {Printername} is effective for all documents of all
                           --pt runs. Also, --printer-name argument of
                           --print-to-file switch interferes with {Printername}.
       -p                  Prints following files to the default printer, after
                           which those files are closed. The splash screen does
                           not appear. If the file name contains spaces, then it
                           must be enclosed in quotation marks.
       --view              Opens following files in viewer mode (read-only).
       --show              Opens and starts the following presentation documents
                           of each immediately. Files are closed after the
                           showing. Files other than Impress documents are
                           opened in default mode , regardless of previous mode.
       --convert-to OutputFileExtension[:OutputFilterName] \
         [--outdir output_dir] [--convert-images-to]
                           Batch convert files (implies --headless). If --outdir
                           isn't specified, then current working directory is
                           used as output_dir. If --convert-images-to is given,
                           its parameter is taken as the target filter format
                           for *all* images written to the output format. If
                           --convert-to is used more than once, the last value
                           of OutputFileExtension[:OutputFilterName] is
                           effective. If --outdir is used more than once, only
                           its last value is effective. For example:
                       --convert-to pdf *.odt
                       --convert-to epub *.doc
                       --convert-to pdf:writer_pdf_Export --outdir /home/user *.doc
                       --convert-to "html:XHTML Writer File:UTF8" \
                                    --convert-images-to "jpg" *.doc
                       --convert-to "txt:Text (encoded):UTF8" *.doc
       --print-to-file [--printer-name printer_name] [--outdir output_dir]
                           Batch print files to file. If --outdir is not
                           specified, then current working directory is used as
                           output_dir. If --printer-name or --outdir used
                           multiple times, only last value of each is effective.
                           Also, {Printername} of --pt switch interferes with
                           --printer-name.
       --cat               Dump text content of the following files to console
                           (implies --headless). Cannot be used with
                           --convert-to.
       --script-cat        Dump text content of any scripts embedded in the
                           files to console (implies --headless). Cannot be used
                           with --convert-to.
       -env:<VAR>[=<VALUE>]
                           Set a bootstrap variable. For example: to set a
                           non-default user profile path:
                           -env:UserInstallation=file:///tmp/test

    Ignored switches:
       -psn                Ignored (MacOS X only).
       -Embedding          Ignored (COM+ related; Windows only).
       --nofirststartwizard
                           Does nothing, accepted only for backward
                           compatibility.
       --protector {arg1} {arg2}
                           Used only in unit tests and should have two
                           arguments.