Reading PDF Files in C#


using IronPdf;
using IronSoftware.Drawing;
using System.Collections.Generic;

// Extract image and text content from PDF documents

// Open a 128-bit encrypted PDF by supplying its password
var pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text to put in a search index
string text = pdf.ExtractAllText();

// Get all Images
var allImages = pdf.ExtractAllImages();

// Or even find the precise text and images for each page in the document
for (var index = 0 ; index < pdf.PageCount ; index++)
{
    int pageNumber = index + 1;
    text = pdf.ExtractTextFromPage(index);
    List<AnyBitmap> images = pdf.ExtractBitmapsFromPage(index);
    //...
}

Imports IronPdf
Imports IronSoftware.Drawing
Imports System.Collections.Generic

' Extract image and text content from PDF documents

' Open a 128-bit encrypted PDF by supplying its password
Dim pdf = PdfDocument.FromFile("encrypted.pdf", "password")

' Get all text to put in a search index
Dim text As String = pdf.ExtractAllText()

' Get all images
Dim allImages = pdf.ExtractAllImages()

' Or find the exact text and images for each page in the document
For index = 0 To pdf.PageCount - 1
	Dim pageNumber As Integer = index + 1
	text = pdf.ExtractTextFromPage(index)
	Dim images As List(Of AnyBitmap) = pdf.ExtractBitmapsFromPage(index)
	'...
Next index

Install-Package IronPdf


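The PdfDocument.ExtractAllText method in the IronPDF C# PDF library is well suited to basic PDF text-reading tasks. It returns the text of the whole document as a single string and copes with whitespace and encoding differences in the source PDF.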

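PdfDocument.ExtractTextFromPage reads the text of one specific page of a PDF, identified by its zero-based page index. In the example above, it is called inside a loop to retrieve the text content of each page in turn.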

IronPDF can also extract the original images from a PDF. To do so, use any of the following methods of the PdfDocument class:

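ExtractAllImages: returns all images embedded in the PDF as IronSoftware.Drawing.AnyBitmap objects.
ExtractAllRawImages: retrieves all embedded images as a list of raw byte arrays (byte[]).
ExtractImagesFromPage: extracts the images contained in the page at the given index.
ExtractImagesFromPages: same as ExtractImagesFromPage, but extracts from a page range or from several specified pages.
ExtractRawImagesFromPage and ExtractRawImagesFromPages: same as the previous two methods, but return the extracted images as byte arrays instead of IronSoftware.Drawing.AnyBitmap objects.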
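
As a rough illustration of the image methods listed above, the following sketch saves the images of each page to disk. The helper names ImageFileName and SavePageImages, the "images" output folder, and the file-naming scheme are hypothetical and not part of IronPDF. The sketch also assumes that AnyBitmap.SaveAs chooses the image format from the file extension, which is worth checking against the IronSoftware.Drawing documentation for the version in use.

```csharp
using System;
using System.IO;
using IronPdf;
using IronSoftware.Drawing;

// Hypothetical helper: builds a file name such as "page1_image0.png"
// from a zero-based page index and a zero-based image index.
static string ImageFileName(int pageIndex, int imageIndex) =>
    $"page{pageIndex + 1}_image{imageIndex}.png";

// Hypothetical helper: saves the images of every page into outputFolder.
static void SavePageImages(string pdfPath, string outputFolder)
{
    var pdf = PdfDocument.FromFile(pdfPath);
    Directory.CreateDirectory(outputFolder);

    for (var pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
    {
        var imageIndex = 0;
        foreach (AnyBitmap image in pdf.ExtractImagesFromPage(pageIndex))
        {
            // Assumption: SaveAs infers the format (PNG here) from the extension.
            image.SaveAs(Path.Combine(outputFolder, ImageFileName(pageIndex, imageIndex)));
            imageIndex++;
        }
    }
}

// Usage: pass the PDF path as the first command-line argument.
if (args.Length > 0)
{
    SavePageImages(args[0], "images");
}
```

Naming files by page number keeps the output traceable to the source page. If only the undecoded image data is needed, the ExtractRawImagesFromPage variant could be used in the same loop instead.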
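
How to Read PDF Files in C#
1. Download the IronPDF library for C#
2. Extract images or text from the PDF
3. Read and find specific words in the file
4. View the PDF output from the original document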
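
Step 3 can be sketched with the ExtractTextFromPage method shown earlier plus ordinary string searching. The helpers CountOccurrences and FindWord are hypothetical names introduced for this example, not IronPDF APIs, and the matching is a plain case-insensitive substring search rather than whole-word matching.

```csharp
using System;
using System.Collections.Generic;
using IronPdf;

// Hypothetical helper: counts case-insensitive, non-overlapping
// occurrences of word inside text (substring match, not whole-word).
static int CountOccurrences(string text, string word)
{
    if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
    {
        return 0;
    }

    var count = 0;
    var position = 0;
    while ((position = text.IndexOf(word, position, StringComparison.OrdinalIgnoreCase)) >= 0)
    {
        count++;
        position += word.Length;
    }
    return count;
}

// Hypothetical helper: maps 1-based page numbers to the number of matches,
// listing only pages on which the word appears.
static Dictionary<int, int> FindWord(string pdfPath, string word)
{
    var hits = new Dictionary<int, int>();
    var pdf = PdfDocument.FromFile(pdfPath);

    for (var index = 0; index < pdf.PageCount; index++)
    {
        var count = CountOccurrences(pdf.ExtractTextFromPage(index), word);
        if (count > 0)
        {
            hits[index + 1] = count;
        }
    }
    return hits;
}

// Usage: first argument is the PDF path, second is the word to look for.
if (args.Length >= 2)
{
    foreach (var hit in FindWord(args[0], args[1]))
    {
        Console.WriteLine($"Page {hit.Key}: {hit.Value} match(es)");
    }
}
```

Searching page by page rather than over the result of ExtractAllText makes it possible to report where in the document each match occurs.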