摘要:[習題]ASP.NET 讀取 PDF檔案、轉成 TXT文字檔
[習題]ASP.NET 讀取 PDF檔案、轉成 TXT文字檔
找了一下 Google,發現滿多人推薦 PDFBox(請由此下載 http://sourceforge.net/projects/pdfbox/files/)
它原本是 Java PDF Library,但也提供了 .NET可參考的 DLL檔,對於中文的支援也加入了。
======================================================
本範例 資料來源: http://blog.csdn.net/yezi2413/archive/2008/10/23/3132074.aspx
特此感謝。
======================================================
首先,下載 PDFBox。
解壓縮之後,裡面有一個 \bin子目錄,就有 .NET可用的 DLL檔。
並且在 VS 2008裡面「加入參考」
只要加入這兩個,其他的都會自動添加。
IKVM.GNU.Classpath.dll
PDFBox-0.7.3.dll
別忘了,在程式裡面,自己加入這兩個 NameSpace喔!
using org.pdfbox.pdmodel;
using org.pdfbox.util;
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedBlockStart.gif)
02
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
03
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
04
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
05
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
06
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
07
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
08
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
09
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
10
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
11
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
12
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
13
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
14
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
15
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
16
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
17
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
18
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
19
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
20
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
21
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
22
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
23
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
24
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
25
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
26
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
27
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
28
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
29
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
30
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
31
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
32
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
33
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
34
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
35
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
36
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
37
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedBlockEnd.gif)
38
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
39
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
40
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
41
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
42
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
43
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
44
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
45
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedBlockStart.gif)
46
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
47
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
48
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockStart.gif)
49
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
50
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
51
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
52
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
53
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
54
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
55
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
56
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
57
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
58
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
59
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
60
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
61
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockStart.gif)
62
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
63
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
64
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockEnd.gif)
65
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
66
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockStart.gif)
67
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
68
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockEnd.gif)
69
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
70
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockEnd.gif)
71
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
72
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
73
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
74
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockStart.gif)
75
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
76
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
77
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
78
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
79
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
80
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
81
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
82
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
83
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
84
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
85
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockEnd.gif)
86
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
87
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedBlockEnd.gif)
Line 63 ---- 原作的範例,在pdf2txt(file, pdffile); 這個地方稍有問題,我修改為 pdf2txt(pdffile, file);
Line 81 ---- 程式中的編碼65001就是 utf-8,您也可以寫成
StreamWriter swPdfChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("utf-8"));
============================================================================
原本第63行 pdf2txt(pdffile, file);,若改成以下的程式碼,
讀取出來會變成亂碼。
有人建議「還是乖乖轉成 txt文字檔,可以剃除掉 PDF檔案裡面包含的圖片」
上面的程式 第63行 pdf2txt(pdffile, file);,就是將 PDF轉成 txt文字檔
就一切ok了。
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
02
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
03
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
04
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedBlockStart.gif)
05
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
06
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
07
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedBlockEnd.gif)
08
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedBlockStart.gif)
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/None.gif)
09
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
10
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
11
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
12
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockStart.gif)
13
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
14
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedSubBlockEnd.gif)
15
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/InBlock.gif)
16
![](http://www.dotblogs.com.tw/Providers/BlogEntryEditor/FCKeditor/editor/dialog/InsertCode/codeimages/ExpandedBlockEnd.gif)
======================================================
本範例 資料來源: http://blog.csdn.net/yezi2413/archive/2008/10/23/3132074.aspx
特此感謝。
======================================================
另外一個範例,請參閱
[微軟範例] iTextSharp.dll 將 GridView匯出 doc/access/csv/Excel/pdf/xml/html/text/print
我將思想傳授他人, 他人之所得,亦無損於我之所有;
猶如一人以我的燭火點燭,光亮與他同在,我卻不因此身處黑暗。----Thomas Jefferson
線上課程教學,遠距教學 (Web Form 約 51hr) https://dotblogs.com.tw/mis2000lab/2016/02/01/aspnet_online_learning_distance_education_VS2015
線上課程教學,遠距教學 (ASP.NET MVC 約 135hr) https://dotblogs.com.tw/mis2000lab/2018/08/14/ASPnet_MVC_Online_Learning_MIS2000Lab
寫信給我,不要私訊 -- mis2000lab (at) yahoo.com.tw 或 school (at) mis2000lab.net
(1) 第一天 ASP.NET MVC5 完整影片(5.5小時 / .NET 4.x版)免費試聽。影片 https://youtu.be/9spaHik87-A
(2) 第一天 ASP.NET Core MVC 完整影片(3小時 / .NET Core 6.0~8.0)免費試聽。影片 https://youtu.be/TSmwpT-Bx4I
[學員感言] mis2000lab課程評價 - ASP.NET MVC , WebForm 。 https://mis2000lab.medium.com/%E5%AD%B8%E5%93%A1%E6%84%9F%E8%A8%80-mis2000lab%E8%AA%B2%E7%A8%8B%E8%A9%95%E5%83%B9-asp-net-mvc-webform-77903ce9680b
ASP.NET遠距教學、線上課程(Web Form + MVC)。 第一天課程, "完整" 試聽。
......... facebook社團 https://www.facebook.com/mis2000lab ......................
......... YouTube (ASP.NET) 線上教學影片 https://www.youtube.com/channel/UC6IPPf6tvsNG8zX3u1LddvA/
Blog文章 "附的範例" 無法下載,請看 https://dotblogs.com.tw/mis2000lab/2016/03/14/2008_2015_mis2000lab_sample_download
請看我們的「售後服務」範圍(嚴格認定)。
......................................................................................................................................................
ASP.NET MVC => .NET Core MVC 線上教學 ...... 第一天課程 完整內容 "免費"讓您評估 / 試聽
![](https://dotblogsfile.blob.core.windows.net/user/mis2000lab/f1483b52-7b86-4c3e-9a7c-fc2bf0aa9efd/1536574174_96277.jpg)
[遠距教學、教學影片] ASP.NET (Web Form) 課程 上線了!MIS2000Lab.主講 事先錄好的影片,並非上課側錄! 觀看時,有如「一對一」面對面講課。
![](https://dotblogsfile.blob.core.windows.net/user/mis2000lab/ff8ecc53-3648-4066-9826-4cc3fd053a79/1454295761_0869.jpg)