Method 1: Acrobat scripting
var doc = app.openDoc('/c/test.pdf');
doc.saveAs("p.jpg", "com.adobe.acrobat.jpeg");
Note: this method is extremely simple and fast, but the generated images can only be named in a fixed pattern of the form test_Page_01.jpg.
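If the fixed naming is a problem, the files can be renamed in a separate step after Acrobat has finished. The following is a minimal Java sketch, not part of the original workflow: the class name `PageImageRenamer`, the target prefix, and the assumption that every output file matches `<name>_Page_<digits>.jpg` are all illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: renames Acrobat-style page images to prefix + page number.
final class PageImageRenamer {
    // Assumed output pattern, e.g. "test_Page_01.jpg".
    private static final Pattern ACROBAT_NAME =
            Pattern.compile("^.+_Page_(\\d+)\\.jpg$");

    // Maps "test_Page_01.jpg" to prefix + "1.jpg"; returns null when the name does not match.
    static String targetName(String fileName, String prefix) {
        Matcher m = ACROBAT_NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        int pageNumber = Integer.parseInt(m.group(1)); // drops leading zeros
        return prefix + pageNumber + ".jpg";
    }

    // Renames every matching file in dir and returns how many files were renamed.
    // Files.move fails with FileAlreadyExistsException if the target already exists.
    static int renameAll(Path dir, String prefix) throws IOException {
        // Collect the candidates first so the directory is not modified while it is being listed.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*_Page_*.jpg")) {
            for (Path file : files) {
                candidates.add(file);
            }
        }
        int renamed = 0;
        for (Path file : candidates) {
            String target = targetName(file.getFileName().toString(), prefix);
            if (target != null) {
                Files.move(file, file.resolveSibling(target));
                renamed++;
            }
        }
        return renamed;
    }
}
```

Calling `PageImageRenamer.renameAll(Paths.get("c:/out"), "p")` (with `java.nio.file.Paths` imported) would turn test_Page_01.jpg into p1.jpg, which matches the naming used in Method 4; the directory is a placeholder.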
Method 2: generate the images with .NET. The code is as follows:
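using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Runtime.InteropServices;
using System.Windows.Forms;
// The two methods below belong inside a class; the clipboard calls require an STA thread.
[STAThread]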
public static void Main(string[] args) {
if (args.Length != 1) {
Console.WriteLine("Usage: Pdf2Image <path to PDF file>");
return;
}
string pdfFilePath = args[0];
if(!System.IO.File.Exists(pdfFilePath))
{
Console.WriteLine("File \"{0}\" does not exist", pdfFilePath);
return;
}
FileInfo pdfFi = new FileInfo(pdfFilePath);
pdfFilePath = pdfFi.FullName;
string imageDirectoryPath = System.IO.Path.Combine(
pdfFi.DirectoryName,
System.IO.Path.GetFileNameWithoutExtension(pdfFi.Name));
try {
ConvertPdf2Image(pdfFilePath, imageDirectoryPath, 0, 0, null, 1);
}
catch (Exception ex) {
Console.WriteLine(ex.ToString());
}
Console.Read();
}
public static void ConvertPdf2Image(string pdfFilePath, string imageDirectoryPath,
int beginPageNum, int endPageNum, ImageFormat format, double zoom = 1) {
Acrobat.CAcroPDDoc pdfDoc = null;
Acrobat.CAcroPDPage pdfPage = null;
Acrobat.CAcroRect pdfRect = null;
Acrobat.CAcroPoint pdfPoint = null;
// Create the COM object used to work with the PDF file
pdfDoc = (Acrobat.CAcroPDDoc)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.PDDoc", "");
// Validate the input parameters
if (!pdfDoc.Open(pdfFilePath)) {
throw new FileNotFoundException(string.Format("Source file {0} does not exist or cannot be opened!", pdfFilePath));
}
if (!Directory.Exists(imageDirectoryPath)) {
Directory.CreateDirectory(imageDirectoryPath);
}
if (beginPageNum <= 0) {
beginPageNum = 1;
}
if (endPageNum > pdfDoc.GetNumPages() || endPageNum <= 0) {
endPageNum = pdfDoc.GetNumPages();
}
if (beginPageNum > endPageNum) {
throw new ArgumentException("Parameter \"beginPageNum\" must not be greater than \"endPageNum\"!");
}
if (format == null) {
format = ImageFormat.Png;
}
if (zoom <= 0) {
zoom = 1;
}
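// Convert page by page
for (int i = beginPageNum; i <= endPageNum; i++) {
// Acquire the current page (AcquirePage is zero-based)
pdfPage = (Acrobat.CAcroPDPage)pdfDoc.AcquirePage(i - 1);
// Get the size of the current page
pdfPoint = (Acrobat.CAcroPoint)pdfPage.GetSize();
// Create the rectangle object for the page's clipping region
pdfRect = (Acrobat.CAcroRect)Microsoft.VisualBasic.Interaction.CreateObject("AcroExch.Rect", "");
// Compute the page's width and height after scaling; zoom == 1 keeps the original size
int imgWidth = (int)((double)pdfPoint.x * zoom);
int imgHeight = (int)((double)pdfPoint.y * zoom);
// Set the clipping rectangle to the size of the current page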
pdfRect.Left = 0;
pdfRect.right = (short)imgWidth;
pdfRect.Top = 0;
pdfRect.bottom = (short)imgHeight;
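// Render the clipped region of the current page as an image and copy it to the clipboard
pdfPage.CopyToClipboard(pdfRect, 0, 0, (short)(100 * zoom));
IDataObject clipboardData = Clipboard.GetDataObject();
// If the clipboard holds a bitmap, save it as an image file in the requested format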
if (clipboardData.GetDataPresent(DataFormats.Bitmap)) {
Bitmap pdfBitmap = (Bitmap)clipboardData.GetData(DataFormats.Bitmap);
pdfBitmap.Save(
Path.Combine(imageDirectoryPath, i.ToString("0000") + "." + format.ToString()), format);
pdfBitmap.Dispose();
}
}
// Close the document and release the COM objects
pdfDoc.Close();
Marshal.ReleaseComObject(pdfRect);
Marshal.ReleaseComObject(pdfPoint);
Marshal.ReleaseComObject(pdfPage);
Marshal.ReleaseComObject(pdfDoc);
}
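Note: this method also depends on Acrobat. You need to add the Acrobat COM reference to the C# project and also reference Microsoft.VisualBasic.dll (look for it under Microsoft.NET/Framework/V... on the local machine). In addition, when you move to a different server you need to adjust the DCOM configuration (dcomcnfg) to change the ActiveX permissions.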
Method 3: use Apache PDFBox. A Java snippet follows:
PDDocument document = PDDocument.load("C:/test.pdf");
PDFImageWriter imageWriter = new PDFImageWriter();
// Arguments: image format, password, first page, last page, output file prefix
imageWriter.writeImage(document, "jpg", "", 1, 1, "p1");
document.close();
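Note: this is the most widely used method, but also the most criticized. It fails when generating images from large PDFs with hundreds or thousands of pages, and fixing that requires modifying the PDFBox source.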
Method 4: use PDFRenderer. A Java snippet follows:
First download the PDFRenderer jar and add it to your Java project.
Download: https://java.net/projects/pdf-renderer/downloads
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;
import java.awt.Graphics;
import java.awt.Image;
import java.awt.Rectangle;
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import javax.imageio.*;
import java.awt.image.*;
public void convert() throws Exception {
// Load the PDF
File file = new File("c:/test.pdf");
RandomAccessFile raf = new RandomAccessFile(file, "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY,0, channel.size());
PDFFile pdffile = new PDFFile(buf);
// Get the number of pages in the PDF
int jumlahhalaman = pdffile.getNumPages();
// Render each page to its own image
for (int i = 1; i <= jumlahhalaman; i++) {
PDFPage page = pdffile.getPage(i);
// Create the image
Rectangle rect = new Rectangle(0, 0,(int) page.getWidth(),(int) page.getHeight());
Image img = page.getImage(
rect.width, rect.height, //width & height
rect, // clip rect
null, // null for the ImageObserver
true, // fill background with white
true// block until drawing is done
);
BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);
Graphics g = bufferedImage.createGraphics();
g.drawImage(img, 0, 0, null);
g.dispose();
File asd = new File("c:/p" + i + ".jpg");
if (asd.exists()) {
asd.delete();
}
ImageIO.write(bufferedImage, "jpg", asd);
}
}
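The snippet above renders every page at its native size, one pixel per PDF point (72 points per inch), and names the files p1.jpg, p2.jpg and so on. Method 2 additionally offers a zoom factor and zero-padded file names. A rough Java sketch of the same two ideas follows; the class `RenderSize` and its methods are hypothetical helpers, not part of PDFRenderer.

```java
import java.awt.Dimension;

// Hypothetical helper: computes the output image size and file name for one page.
final class RenderSize {
    // PDF page sizes are expressed in points; 72 points make one inch.
    static final double POINTS_PER_INCH = 72.0;

    // Scales the page size by zoom; a non-positive zoom falls back to 1, as in Method 2.
    static Dimension forZoom(double pageWidth, double pageHeight, double zoom) {
        if (zoom <= 0) {
            zoom = 1;
        }
        int width = Math.max(1, (int) Math.round(pageWidth * zoom));
        int height = Math.max(1, (int) Math.round(pageHeight * zoom));
        return new Dimension(width, height);
    }

    // Same as forZoom, but expressed as a target resolution in dots per inch.
    static Dimension forDpi(double pageWidth, double pageHeight, double dpi) {
        return forZoom(pageWidth, pageHeight, dpi / POINTS_PER_INCH);
    }

    // Zero-padded name such as "0007.jpg", mirroring i.ToString("0000") in Method 2.
    static String pageFileName(int pageNumber, String extension) {
        return String.format("%04d.%s", pageNumber, extension);
    }
}
```

Inside the loop of Method 4 this could be used as `Dimension size = RenderSize.forDpi(page.getWidth(), page.getHeight(), 150);`, then passing `size.width` and `size.height` both to `page.getImage(...)` and to the `BufferedImage` constructor while leaving `rect` (the clip in page coordinates) unchanged. This assumes that PDFRenderer scales the clipped region to the requested image size, which is worth verifying against the version of the library you use.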
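There are also quite a few PDF-processing libraries from abroad that readers can look into. One example is the JPedal PDF Library, which can not only generate images but also convert PDFs to HTML faithfully and extract text coordinates, making it possible to browse PDFs quickly in a web page. PDFBox can also read the size and coordinates of PDF text and convert a PDF to HTML, but the results are much worse than JPedal's.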
I used Method 4 for batch processing and chose Method 1 for handling individual large PDFs. All of the code above has been debugged and runs.