python - What's wrong with iShares SP 500 ETF's Excel file?

Question

Welcome To Ask or Share your Answers For Others

python - What's wrong with iShares SP 500 ETF's Excel file?

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - What's wrong with iShares SP 500 ETF's Excel file?

I am trying to parse iShares SP 500 ETF's Excel file, which looks like this:

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Styles>
<ss:Style ss:ID="Default">
<ss:Alignment ss:Horizontal="Left"/>
</ss:Style>
...

It seems to be an old XLS Excel type file, but it is an XML file, yet xml.etree.ElementTree is complaining a lot.

I have tried:

import xml.etree.ElementTree as ET
tree = ET.parse(file_name)

and with encoding:

import xml.etree.ElementTree as ET
tree = ET.parse(file_name, parser=ET.XMLParser(recover=True))

import xml.etree.ElementTree as ET
tree = ET.parse(file_name, parser=ET.XMLParser(encoding='utf-8'))

error:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1

rb = xlrd.open_workbook(file_name, encoding_override='utf-8')
print(rb)

error:

xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'xefxbbxbfxefxbbxbf<?'

but none seems to work at all... could anyone guide me in the right direction?

question from:https://stackoverflow.com/questions/65895726/whats-wrong-with-ishares-sp-500-etfs-excel-file

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:16:46+0000

Your "XML" file is not well-formed:

It has two BOM characters before the XML declaration.

To remove (at least) one, see Convert UTF-8 with BOM to UTF-8 with no BOM in Python

Remove the & or change it to & on line 43.

<ss:Data ss:Type="String">iShares Core S&P 500 ETF</ss:Data>
                                        ^

Ditto for the & on line 117,188.

Without the above repairs, your file cannot be parsed as XML (because technically it isn't).

After the above repairs, you'll be able to parse it as XML, and Excel will be able to open it:

Categories

python - What's wrong with iShares SP 500 ETF's Excel file?

python - What's wrong with iShares SP 500 ETF's Excel file?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

See also

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags