Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


file io - Python: Traceback codecs.charmap_decode(input,self.errors,decoding_table)[0]

The following sample code merges the text files from a given folder and its subfolders. I occasionally get the traceback shown below and am not sure where to look. I would also like help improving the code so that blank lines are not merged and so that it displays the number of lines in the merged master file. It is probably a good idea to do some cleanup before merging, or simply to ignore blank lines during the merge.

Each text file in the folder has no more than 1,000 lines, but the aggregate master file could easily exceed 10,000 lines.

import os
root = 'C:\\Dropbox\\ans7i\\'
files = [(path,f) for path,_,file_list in os.walk(root) for f in file_list]
out_file = open('C:\\Dropbox\\Python\\master.txt','w')
for path,f_name in files:
    in_file = open('%s/%s'%(path,f_name), 'r')

    # write out root/path/to/file (space) file_contents
    for line in in_file:
        out_file.write('%s/%s %s'%(path,f_name,line))
    in_file.close()

    # enter new line after each file
    out_file.write('\n')

with open('master.txt', 'r') as f:
  lines = f.readlines()
with open('master.txt', 'w') as f:
  f.write("".join(L for L in lines if L.strip())) 



Traceback (most recent call last):
  File "C:\Dropbox\Python\master.py", line 9, in <module>
    for line in in_file:
  File "C:\PYTHON32\LIB\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 972: character maps to <undefined>

1 Answer


The error is raised because Python 3 opens your files with a default encoding (cp1252 here, according to the traceback) that doesn't match the contents: byte 0x81 has no mapping in that codec.
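As a minimal sketch of the failure, assuming the default encoding is cp1252 as the traceback suggests: decoding a byte that cp1252 cannot map raises the same UnicodeDecodeError. The sample bytes below are made up for illustration and are not taken from the asker's files.

```python
# Hypothetical sample data: 0x81 is one of the bytes cp1252 cannot map.
data = b"abc\x81def"

try:
    data.decode("cp1252")
    outcome = "decoded"
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte within data.
    outcome = "failed at byte offset %d" % exc.start

print(outcome)

# With errors="replace" decoding never raises; the bad byte becomes U+FFFD.
# ascii() keeps the output printable on any console.
print(ascii(data.decode("cp1252", errors="replace")))
```

Passing errors="replace" hides the problem rather than fixing it, so it is only a fallback for when the real encoding cannot be determined.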

If all you are doing is copying file contents, you'd be better off using the shutil.copyfileobj() function together with opening the files in binary mode. That way you avoid encoding issues altogether (as long as all your source files are the same encoding of course, so you don't end up with a target file with mixed encodings):

import shutil
import os.path

# `files` is the list of (path, filename) tuples built with os.walk() in the question
with open('C:\\Dropbox\\Python\\master.txt','wb') as output:
    for path, f_name in files:
        with open(os.path.join(path, f_name), 'rb') as input:
            shutil.copyfileobj(input, output)
        output.write(b'\n') # insert extra newline between files

I've cleaned up the code a little to use context managers (so your files get closed automatically when done) and to use os.path to create the full path for your files.

If you do need to process your input line by line, you'll need to tell Python what encoding to expect, so it can decode the file contents to Python string objects:

open(path, mode, encoding='UTF8')

Note that this requires you to know up front what encoding the files use.
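For the line-by-line case, here is a rough sketch that also covers the two extras asked for in the question: skipping blank lines and reporting how many lines end up in the master file. The function name merge_text_files and its parameters are hypothetical, and the default encoding of UTF-8 is only an assumption; pass whatever encoding your files really use.

```python
import os


def merge_text_files(root, master_path, encoding="utf-8", errors="strict"):
    """Merge every file under root into master_path, skipping blank lines.

    Returns the number of lines written to the master file.
    """
    written = 0
    master_abs = os.path.abspath(master_path)
    with open(master_path, "w", encoding=encoding) as out_file:
        for path, _, file_list in os.walk(root):
            for f_name in sorted(file_list):
                full_path = os.path.join(path, f_name)
                if os.path.abspath(full_path) == master_abs:
                    continue  # never merge the master file into itself
                with open(full_path, "r", encoding=encoding,
                          errors=errors) as in_file:
                    for line in in_file:
                        if not line.strip():
                            continue  # drop empty and whitespace-only lines
                        # prefix each line with its source path, as in the question
                        out_file.write("%s %s\n" % (full_path, line.rstrip("\r\n")))
                        written += 1
    return written
```

A call such as merge_text_files('C:\\Dropbox\\ans7i', 'C:\\Dropbox\\Python\\master.txt', encoding='cp1252') would return the line count, which you can then print. Filtering blank lines while writing removes the need to reread and rewrite master.txt afterwards.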

Read up on the Python Unicode HOWTO if you have further questions about Python 3, files and encodings.

