I am trying to parse the iShares S&P 500 ETF's Excel file, which looks like this:
<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Styles>
<ss:Style ss:ID="Default">
<ss:Alignment ss:Horizontal="Left"/>
</ss:Style>
...
It appears to be an old XLS-type Excel file, but the content is actually XML, and yet
xml.etree.ElementTree refuses to parse it.
I have tried:
import xml.etree.ElementTree as ET
tree = ET.parse(file_name)
and with a recover option:
# note: recover is an lxml.etree.XMLParser option; the standard-library XMLParser does not accept it
tree = ET.parse(file_name, parser=ET.XMLParser(recover=True))
and with an explicit encoding:
tree = ET.parse(file_name, parser=ET.XMLParser(encoding='utf-8'))
error:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1
I also tried xlrd:
import xlrd
rb = xlrd.open_workbook(file_name, encoding_override='utf-8')
print(rb)
error:
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf\xef\xbb\xbf<?'
None of these approaches works. Could anyone point me in the right direction?
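One observation from the xlrd error: the bytes it reports look like two UTF-8 byte order marks (EF BB BF) in a row before the `<?xml` declaration, which might be what ElementTree rejects at line 1. Below is a minimal sketch, assuming the doubled BOM is the only problem with the file; the helper names `strip_leading_boms` and `parse_spreadsheet_xml` and the in-memory `sample` are made up for illustration, and the real file may contain other issues this does not address.

```python
import codecs
import xml.etree.ElementTree as ET

# Namespace taken from the ss:Workbook element shown in the question.
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def strip_leading_boms(raw: bytes) -> bytes:
    """Remove every UTF-8 BOM found at the very start of the data."""
    while raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw


def parse_spreadsheet_xml(raw: bytes) -> ET.Element:
    """Parse SpreadsheetML bytes after dropping any leading BOMs (hypothetical helper)."""
    return ET.fromstring(strip_leading_boms(raw))


# Stand-in for the downloaded file: two BOMs followed by a tiny workbook.
sample = codecs.BOM_UTF8 * 2 + (
    b'<?xml version="1.0"?>'
    b'<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
    b'<ss:Styles><ss:Style ss:ID="Default">'
    b'<ss:Alignment ss:Horizontal="Left"/>'
    b'</ss:Style></ss:Styles></ss:Workbook>'
)

root = parse_spreadsheet_xml(sample)
print(root.tag)
```

For the actual download, the bytes would come from `open(file_name, "rb").read()` rather than from `sample`.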
question from:
https://stackoverflow.com/questions/65895726/whats-wrong-with-ishares-sp-500-etfs-excel-file