Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

pandas - Calculating source frequency in Python

I'm new to Python and want to calculate source frequency. I have a set of files (the sources, already tokenized), and for every word I want to find how many sources it appears in. For example, for the word 'beautiful' the result would be that it appears in 5 sources. I already have Python code that checks for a single word, but I need to do this for all the words in the files. How should I change the code? Any ideas?

from os import listdir

with open("C:/Users/elle/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/elle/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/elle/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()

            if ('beautiful' in text):
                f.write('The word excist in the file ' + filename[:-4] + '\n')
            else:
                f.write('The word doen't excist in the file' + filename[:-4] + '\n')

I will appreciate any help from you, thank you!

question from:https://stackoverflow.com/questions/65887411/calculating-source-frequency-in-python

1 Answer


As mentioned, you need to escape the ' character when it appears inside a single-quoted string. To escape it, put a backslash in front of it (\'), as in doesn\'t.
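To illustrate the escaping point, here is a minimal sketch of two equivalent ways to write a string that contains an apostrophe. The variable names are made up for this example.

```python
# Two equivalent ways to put an apostrophe inside a Python string literal.
# (Variable names are hypothetical, for illustration only.)
escaped = 'The word doesn\'t exist in the file'       # backslash-escape the quote
double_quoted = "The word doesn't exist in the file"  # or use double quotes outside

# Both literals produce exactly the same string.
print(escaped == double_quoted)
```

Either form works; using double quotes around the string is often the easier one to read.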

from os import listdir

with open("C:/Users/elle/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/elle/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/elle/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()
            text = text.strip().lower()
            # Remove basic punctuation: . , " '
            text = text.replace(".", "").replace(",", "").replace('"', "").replace("'", "")
            words = text.split()  # split on any whitespace (spaces, tabs, newlines)
            # Count how many times each word occurs in this file
            count_dict = {}
            for each_word in words:
                if each_word in count_dict:
                    count_dict[each_word] += 1
                else:
                    count_dict[each_word] = 1
            for k in count_dict:
                f.write('The word ' + k + ' exists in the file ' + filename[:-4] + ' ' + str(count_dict[k]) + ' time(s)\n')

#             if ('beautiful' in text):
#                 f.write('The word exists in the file ' + filename[:-4] + '\n')
#             else:
#                 f.write('The word doesn\'t exist in the file ' + filename[:-4] + '\n')
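Note that the loop above counts how often each word occurs within a single file. If what you are after is the number of sources each word appears in (as in "'beautiful' is in 5 sources"), one possible approach is to add each file's set of unique words to a shared counter. The following is only a sketch under some assumptions: the helper names (tokenize, source_frequency, read_sources) and the sample texts are made up for illustration, the files are assumed to be UTF-8 .txt files, and punctuation handling is kept deliberately simple.

```python
from collections import Counter
from pathlib import Path
import string


def tokenize(text):
    """Lowercase the text, split on whitespace, strip surrounding punctuation."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


def source_frequency(texts):
    """Map each word to the number of sources (not occurrences) containing it.

    `texts` is a dict of {source_name: text}.
    """
    freq = Counter()
    for text in texts.values():
        # set() makes sure each source contributes at most 1 per word
        freq.update(set(tokenize(text)))
    return freq


def read_sources(folder):
    """Read every .txt file in `folder` into a {name: text} dict (UTF-8 assumed)."""
    return {p.stem: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")}


# Tiny in-memory example with made-up data
sample = {
    "book1": "A beautiful day. Beautiful!",
    "book2": "Nothing to see here.",
    "book3": "What a beautiful, quiet place.",
}
freq = source_frequency(sample)
print(freq["beautiful"])  # 2 (book1 and book3; the repeat in book1 counts once)
```

To run it on real files, pass the folder that holds your token files to read_sources and feed the result to source_frequency; freq.most_common() then lists the words found in the most sources first.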
