Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Search specific word and move pdf

I am creating a search tool that looks for a specific word in PDF files and, if it finds the word, moves the file to a different folder.

This is what I have done so far!

import PyPDF2
import re
import os    
import shutil  

pattern = input("Enter string pattern to search: ")

src = 'Folder 1'
dest = 'Folder 2'

for file_name in os.listdir(src):
    object = PyPDF2.PdfFileReader(file_name, 'rb')
    numPages = object.getNumPages()

    for i in range(0, numPages):
        pageObj = object.getPage(i)
        text = pageObj.extractText()
   
        for match in re.finditer(pattern, text):
            print(f'Page no: {i} | Match: {match}')
            destination = shutil.copytree(src, dest, copy_function = shutil.copy)

When I run it I get the error:

FileNotFoundError: [Errno 2] No such file or directory:

I think the error is in the following line, because it receives only the name of the file inside the folder rather than its full path.

object = PyPDF2.PdfFileReader(file_name, 'rb')

How do I put the folder path in front of file_name?

question from:https://stackoverflow.com/questions/65918804/search-specific-word-and-move-pdf


1 Answer

from pathlib import Path
import re
import shutil

import PyPDF2  # legacy API (PyPDF2 < 3.0), same calls as in the question

pattern = input("Enter string pattern to search: ")

basepath = Path('')  # absolute path to the parent of 'Folder 1' and 'Folder 2'

src = basepath / 'Folder 1'
dest = basepath / 'Folder 2'

# list() takes a snapshot, so moving files does not disturb the iteration
for pdf_path in list(src.glob('*.pdf')):
    found = False
    # os.listdir() returned bare names; pdf_path is already src / name.
    # Open the file ourselves so it is closed again before we move it.
    with open(pdf_path, 'rb') as fh:
        reader = PyPDF2.PdfFileReader(fh)
        for i in range(reader.getNumPages()):
            text = reader.getPage(i).extractText()
            for match in re.finditer(pattern, text):
                print(f'Page no: {i} | Match: {match}')
                found = True
    if found:
        # Move each matching PDF only once. shutil.copytree() expects a
        # directory, so use shutil.move() for a single file.
        shutil.move(str(pdf_path), str(dest / pdf_path.name))
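
To see the path issue in isolation, here is a minimal sketch that needs no PDF library. The helper name full_paths and the temporary folders are my own, made up for illustration, and a.pdf is an empty placeholder rather than a real PDF. It shows that os.listdir() gives back bare names, that joining each name with the folder gives a usable path, and that shutil.move() is enough to move a single file.

```python
import os
import shutil
import tempfile
from pathlib import Path


def full_paths(folder):
    """Return the full path of every entry in `folder`, sorted by name.

    Hypothetical helper for illustration only.
    """
    folder = Path(folder)
    # os.listdir() yields bare names such as 'a.pdf', so join each one
    # with the folder before trying to open or move it.
    return [folder / name for name in sorted(os.listdir(folder))]


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Folder 1"
    dest = Path(tmp) / "Folder 2"
    src.mkdir()
    dest.mkdir()
    (src / "a.pdf").write_bytes(b"")  # empty placeholder, not a real PDF

    bare_names = os.listdir(src)      # names only, no folder part
    paths = full_paths(src)           # folder joined with each name

    # Move the single file into the destination folder.
    shutil.move(str(paths[0]), str(dest / paths[0].name))

    moved = sorted(os.listdir(dest))
    remaining = os.listdir(src)

print(bare_names, moved, remaining)
```

In the real script you would only call shutil.move() for files where the pattern was found, and only after the PDF file handle has been closed.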
