Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I cannot find a way to deal with new pages in docx using Python

I have a docx file with 40 pages of text, and I want to separate the pages and store the contents of each one in a list. Is this possible? The only approach I have found is to look for the empty strings in my list of paragraphs, but an empty string does not always mean a page break. My code collects the text that follows a paragraph starting with "Subject" and stops at the first empty paragraph. I need a way to recognise a page break in my code, because at the moment a page break is treated the same as an empty paragraph (""). Thanks in advance.

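import os
import docx


def read(name):
    """Return the text of every paragraph in the .docx file `name`."""
    doc = docx.Document(name)
    return [par.text for par in doc.paragraphs]


def subject_blocks(paragraphs):
    """Collect the text after each 'Subject' paragraph, up to the next empty one."""
    blocks = []
    # enumerate() gives the real position; list.index() returns only the
    # first match and picks the wrong paragraph when a text is repeated.
    for start, text in enumerate(paragraphs):
        # Same test as `not text.find('Subject')`: the paragraph starts with it.
        if not text.startswith('Subject'):
            continue
        lines = []
        # Look at most 20 paragraphs ahead, as in the original loop.
        for line in paragraphs[start + 1:start + 21]:
            if line == '':
                break  # an empty paragraph ends the block
            lines.append(line)
        blocks.append(''.join(lines))
    return blocks


# Process every file instead of keeping only the last one read.
for basename in os.listdir('files'):
    path = os.path.join('files', basename)
    print(subject_blocks(read(path)))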
Question from: https://stackoverflow.com/questions/65602630/i-can-not-find-a-way-to-deal-with-new-pages-in-docx-using-python
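
One possible way to approach the page-break detection asked about above is sketched here. It is not taken from the original post, and it reads the document XML with the standard library instead of going through python-docx. A .docx file stores no fixed pages; the sketch assumes that page boundaries can be recovered from two markers in word/document.xml: explicit breaks (`<w:br w:type="page"/>`) and the `<w:lastRenderedPageBreak/>` hints that Word writes when it saves a file. Files produced by other tools may lack the second marker, and breaks caused by section changes or "page break before" paragraph settings are not handled. The function name `split_pages` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's "{uri}" notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def split_pages(docx_file):
    """Return a list of pages, each a list of paragraph strings.

    `docx_file` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    pages = [[]]
    # Note: this also visits paragraphs inside tables and text boxes.
    for par in root.iter(W + "p"):
        chunk, broke = [], False
        for node in par.iter():
            if node.tag == W + "t":
                chunk.append(node.text or "")
            elif node.tag == W + "lastRenderedPageBreak" or (
                node.tag == W + "br" and node.get(W + "type") == "page"
            ):
                # Text collected so far belongs to the current page.
                if chunk:
                    pages[-1].append("".join(chunk))
                    chunk = []
                # Skip a marker that directly follows another one, since Word
                # may record both for the same break. The cost is that truly
                # blank pages are merged away.
                if pages[-1]:
                    pages.append([])
                broke = True
        # Keep genuine empty paragraphs, but not the empty remainder of a
        # paragraph that only carried a break.
        if chunk or not broke:
            pages[-1].append("".join(chunk))
    return pages
```

Each inner list can then be searched for "Subject" on its own, so an empty paragraph no longer has to double as a page marker. For example, `split_pages(os.path.join('files', basename))` could replace the call to `read` in the loop above, under the assumptions stated.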


