python - Retrieve page numbers from document with pyPDF

Question

Welcome To Ask or Share your Answers For Others

python - Retrieve page numbers from document with pyPDF

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Retrieve page numbers from document with pyPDF

At the moment I'm looking into doing some PDF merging with pyPdf, but sometimes the inputs are not in the right order, so I'm looking into scraping each page for its page number to determine the order it should go in (e.g. if someone split up a book into 20 10-page PDFs and I want to put them back together).

I have two questions - 1.) I know that sometimes the page number is stored in the document data somewhere, as I've seen PDFs that render on Adobe as something like [1243] (10 of 150), but I've read documents of this sort into pyPDF and I can't find any information indicating the page number - where is this stored?

2.) If avenue #1 isn't available, I think I could iterate through the objects on a given page to try to find a page number - likely it would be its own object that has a single number in it. However, I can't seem to find any clear way to determine the contents of objects. If I run:

pdf.getPage(0).getContents()

This usually either returns:

{'/Filter': '/FlateDecode'}

or it returns a list of IndirectObject(num, num) objects. I don't really know what to do with either of these and there's no real documentation on it as far as I can tell. Is anyone familiar with this kind of thing that could point me in the right direction?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:43:56+0000

The following worked for me:

from PyPDF2 import PdfFileReader
pdf = PdfFileReader(open('path/to/file.pdf','rb'))
pdf.getNumPages()

Categories

python - Retrieve page numbers from document with pyPDF

python - Retrieve page numbers from document with pyPDF

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags