ios - Parsing an entire PDF page into an NSString on iPhone - Q&A - OStack Programmer Community

Posted by 菜鸟教程小白 on 2022-12-13 05:53:45


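For a while now I've been trying to parse the text of a PDF page into an NSString, and the only thing I've been able to find is how to search for a specific string value.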

What I want to do is parse a single-page PDF without using any external libraries (such as PDFKitten, PDFKit, etc.).

If possible, I'd like to store the data in an NSArray, NSString, or NSDictionary.

Thanks :D!

Here is part of what I've tried so far.

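<pre><code>@import UIKit;  // Objective-C module import; brings in Foundation and CoreGraphics

// Opens a PDF by file-system path. Returns NULL on failure; the caller
// owns the result and must call CGPDFDocumentRelease() on it.
CGPDFDocumentRef MyGetPDFDocumentRef (const char *filename) {
    CFStringRef path = CFStringCreateWithCString (NULL, filename, kCFStringEncodingUTF8);
    CFURLRef url = CFURLCreateWithFileSystemPath (NULL, path, kCFURLPOSIXPathStyle, false);
    CFRelease (path);
    CGPDFDocumentRef document = CGPDFDocumentCreateWithURL (url);
    CFRelease (url);
    if (document == NULL || CGPDFDocumentGetNumberOfPages (document) == 0) {
        printf("'%s' could not be opened or has no pages!\n", filename);
        CGPDFDocumentRelease (document);  // no-op when document is NULL
        return NULL;
    }
    return document;
}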

// Operator-table callbacks for the marked-content operators. The scanner
// pushes operands left to right, so they are popped in reverse order.
static void op_MP (CGPDFScannerRef s, void *info) {   // /tag MP
    const char *name;
    if (!CGPDFScannerPopName(s, &name))
        return;
    printf("MP /%s\n", name);
}

static void op_DP (CGPDFScannerRef s, void *info) {   // /tag properties DP
    CGPDFObjectRef properties;
    const char *name;
    if (!CGPDFScannerPopObject(s, &properties) || !CGPDFScannerPopName(s, &name))
        return;
    printf("DP /%s\n", name);
}

static void op_BMC (CGPDFScannerRef s, void *info) {  // /tag BMC
    const char *name;
    if (!CGPDFScannerPopName(s, &name))
        return;
    printf("BMC /%s\n", name);
}

static void op_BDC (CGPDFScannerRef s, void *info) {  // /tag properties BDC
    CGPDFObjectRef properties;
    const char *name;
    if (!CGPDFScannerPopObject(s, &properties) || !CGPDFScannerPopName(s, &name))
        return;
    printf("BDC /%s\n", name);
}

static void op_EMC (CGPDFScannerRef s, void *info) {  // EMC takes no operands
    printf("EMC\n");
}

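// Scans one page's content stream and reports the marked-content operators.
// No text-showing operators (Tj, TJ, ', ") are registered here, so the
// page's visible text is never extracted by this function.
void MyDisplayPDFPage (CGContextRef myContext, size_t pageNumber, const char *filename) {
    CGPDFDocumentRef document = MyGetPDFDocumentRef (filename);
    if (document == NULL)
        return;
    CGPDFPageRef page = CGPDFDocumentGetPage (document, pageNumber);  // 1-based; NULL if out of range
    if (page == NULL) {
        CGPDFDocumentRelease (document);
        return;
    }

    CGPDFOperatorTableRef myTable = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback (myTable, "MP", &op_MP);
    CGPDFOperatorTableSetCallback (myTable, "DP", &op_DP);
    CGPDFOperatorTableSetCallback (myTable, "BMC", &op_BMC);
    CGPDFOperatorTableSetCallback (myTable, "BDC", &op_BDC);
    CGPDFOperatorTableSetCallback (myTable, "EMC", &op_EMC);

    CGPDFContentStreamRef myContentStream = CGPDFContentStreamCreateWithPage (page);
    CGPDFScannerRef myScanner = CGPDFScannerCreate (myContentStream, myTable, NULL);
    CGPDFScannerScan (myScanner);  // calls the callbacks above in stream order

    CGPDFScannerRelease (myScanner);
    CGPDFContentStreamRelease (myContentStream);
    CGPDFOperatorTableRelease (myTable);

    // This looks up a *key* named /Lorem in the page dictionary; it does not
    // search the text drawn on the page, which lives in the content stream.
    CGPDFDictionaryRef d = CGPDFPageGetDictionary (page);
    CGPDFStringRef str;
    if (CGPDFDictionaryGetString (d, "Lorem", &str)) {
        CFStringRef s = CGPDFStringCopyTextString (str);
        if (s != NULL) {
            NSLog(@"%@ testing it", (__bridge NSString *)s);
            CFRelease (s);  // release only when a string was actually created
        }
    }

    CGPDFDocumentRelease (document);  // the page is owned by the document
}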

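- (void)viewDidLoad {
    [super viewDidLoad];

    // MyDisplayPDFPage only reads the PDF, so no drawing context is passed
    // (UIGraphicsGetCurrentContext() would be NULL in viewDidLoad anyway).
    NSString *pdfPath = [[NSBundle mainBundle] pathForResource:@"TestPage" ofType:@"pdf"];
    if (pdfPath != nil)
        MyDisplayPDFPage(NULL, 1, [pdfPath fileSystemRepresentation]);
}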
</code></pre>
 <hr><h1>Best Answer</h1> 
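 Quartz provides functions that let you inspect both the structure of a PDF document and its content streams. By inspecting the document structure, you can read the entries in the document catalog and the content associated with each entry. By recursively walking the catalog, you can inspect the entire document.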

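A PDF content stream is just what its name suggests: a sequential stream of data, such as 'BT /F71 12 Tf (draw this text) Tj . . .', in which PDF operators and their operands are intermixed with the actual PDF content. Inspecting a content stream therefore requires you to read it sequentially.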
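To make the content-stream idea concrete: the visible text of a page is carried by the string operands of the text-showing operators `Tj`, `TJ`, `'` and `"`. Below is a minimal plain-C sketch (also valid inside an Objective-C file) that walks a content stream, assumed to be already decompressed, and collects those strings. `extract_text` and `read_literal` are hypothetical helper names, not part of any Apple API. The sketch only handles literal `(...)` strings, does not support hex strings `<...>` or inline dictionaries, and copies bytes without applying the font's encoding, so it is only meaningful for simple single-byte text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reads a PDF literal string starting at s[*i] == '('
   and appends its decoded bytes to out (capacity cap, kept NUL-terminated).
   Balanced nested parentheses are kept; \n \r \t \( \) \\ are decoded.
   Other escapes (\b, \f, octal \ddd, line continuation) are NOT handled:
   the backslash is dropped and the next character is kept. */
static void read_literal(const char *s, size_t *i, char *out, size_t cap) {
    size_t len = strlen(out);
    int depth = 0;
    for (; s[*i] != '\0'; (*i)++) {
        char c = s[*i];
        if (c == '\\' && s[*i + 1] != '\0') {
            c = s[++(*i)];                          /* escaped character */
            if (c == 'n') c = '\n';
            else if (c == 'r') c = '\r';
            else if (c == 't') c = '\t';
        } else if (c == '(') {
            if (depth++ == 0) continue;             /* opening delimiter */
        } else if (c == ')') {
            if (--depth == 0) { (*i)++; return; }   /* closing delimiter */
        }
        if (len + 1 < cap) { out[len++] = c; out[len] = '\0'; }
    }
}

/* Hypothetical helper: collects the text shown by Tj, TJ, ' and ".
   Operands (strings, numbers, /Names) accumulate until an operator token
   arrives; text-showing operators append the pending strings to result,
   every other operator discards them. */
static void extract_text(const char *stream, char *result, size_t cap) {
    char pending[1024] = "";                        /* strings since last operator */
    size_t i = 0;
    assert(cap > 0);
    result[0] = '\0';
    while (stream[i] != '\0') {
        unsigned char c = (unsigned char)stream[i];
        if (c == '(') {
            read_literal(stream, &i, pending, sizeof pending);
        } else if (isspace(c) || c == '[' || c == ']') {
            i++;                                    /* brackets only group TJ operands */
        } else {
            size_t start = i++;                     /* number, /Name or operator */
            while (stream[i] != '\0' && !isspace((unsigned char)stream[i]) &&
                   strchr("()[]/<>", stream[i]) == NULL)
                i++;
            const char *tok = stream + start;
            size_t n = i - start;
            if (c == '/' || c == '-' || c == '+' || c == '.' || isdigit(c))
                continue;                           /* non-string operand: ignore */
            if ((n == 2 && (strncmp(tok, "Tj", 2) == 0 || strncmp(tok, "TJ", 2) == 0)) ||
                (n == 1 && (c == '\'' || c == '"')))
                strncat(result, pending, cap - strlen(result) - 1);
            pending[0] = '\0';                      /* operator consumed its operands */
        }
    }
}
```

For example, `extract_text("BT /F71 12 Tf 72 712 Td (draw this text) Tj ET", buf, sizeof buf)` leaves `draw this text` in `buf`, and `[(Hel) -120 (lo)] TJ` yields `Hello` because the kerning numbers are discarded. On iOS you would not tokenize by hand: CGPDFScanner already decodes the stream and calls back per operator, so the analogous approach is to register callbacks for "Tj" and "TJ" with CGPDFOperatorTableSetCallback, pop the operands with CGPDFScannerPopString or CGPDFScannerPopArray, convert each string with CGPDFStringCopyTextString, and append the results to a string passed through the scanner's info pointer. Text set in fonts with custom encodings may still not convert cleanly.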

 <a href="https://developer.apple.com/library/ios/documentation/graphicsimaging/conceptual/drawingwithquartz2d/dq_pdf_scan/dq_pdf_scan.html" rel="noreferrer noopener nofollow">This Apple developer documentation</a> shows how to inspect the structure of a PDF document and parse its content.

A similar question about ios - Parsing an entire PDF page into an NSString on iPhone can be found on Stack Overflow:
<a href="https://stackoverflow.com/questions/20930282/" rel="noreferrer noopener nofollow" style="color: red;">
https://stackoverflow.com/questions/20930282/
</a>

