There's no particular difference beyond the syntax. Technically, ExcelFile is a class and read_excel is a function. In either case, the actual parsing is handled by the _parse_excel method defined within ExcelFile.
In earlier versions of pandas, read_excel consisted entirely of a single statement (other than comments):
    return ExcelFile(path_or_buf,kind=kind).parse(sheetname=sheetname,
                                                  kind=kind, **kwds)
And ExcelFile.parse didn't do much more than call ExcelFile._parse_excel.
In recent versions of pandas, read_excel ensures that it has an ExcelFile object (and creates one if it doesn't), and then calls the _parse_excel method directly:
    if not isinstance(io, ExcelFile):
        io = ExcelFile(io, engine=engine)
    return io._parse_excel(...)
and with the updated (and unified) parameter handling, ExcelFile.parse really is just the single statement:
    return self._parse_excel(...)
That is why the docs for ExcelFile.parse now say:
"Equivalent to read_excel(ExcelFile, ...) See the read_excel docstring for more info on accepted parameters"
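To make the delegation concrete, here is a minimal toy model of the structure described above. It is not pandas' source: ToyExcelFile, toy_read_excel, and the dict standing in for a workbook are all hypothetical names invented for illustration. It only shows that the method form and the function form end up in the same parsing routine.

```python
# Toy model of the delegation pattern; these are NOT pandas' real classes.

class ToyExcelFile:
    """Stands in for ExcelFile: created once, then parsed per sheet."""

    def __init__(self, sheets):
        # `sheets` is a hypothetical dict of sheet name -> rows,
        # standing in for a workbook on disk.
        self._sheets = sheets

    def _parse_excel(self, sheet_name):
        # The single place where "parsing" actually happens.
        return list(self._sheets[sheet_name])

    def parse(self, sheet_name):
        # Method form: a thin wrapper around _parse_excel.
        return self._parse_excel(sheet_name)


def toy_read_excel(io, sheet_name):
    # Function form: make sure we have a ToyExcelFile, then delegate.
    if not isinstance(io, ToyExcelFile):
        io = ToyExcelFile(io)
    return io._parse_excel(sheet_name)


workbook = {"Sheet1": [(1, "a"), (2, "b")], "Sheet2": [(3, "c")]}
xl = ToyExcelFile(workbook)

via_method = xl.parse("Sheet1")
via_function_with_object = toy_read_excel(xl, "Sheet1")
via_function_with_raw = toy_read_excel(workbook, "Sheet1")
```

All three calls return the same rows, because each one reaches _parse_excel on a ToyExcelFile; the only difference is who constructs that object.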
As for another answer which claims that ExcelFile.parse is faster in a loop, that really just comes down to whether you are creating the ExcelFile object from scratch every time. You could certainly create your ExcelFile once, outside the loop, and pass that to read_excel inside your loop:
    xl = pd.ExcelFile(path)
    for name in xl.sheet_names:
        df = pd.read_excel(xl, name)
This would be equivalent to
    xl = pd.ExcelFile(path)
    for name in xl.sheet_names:
        df = xl.parse(name)
If your loop involves different paths (in other words, you are reading many different workbooks, not just multiple sheets within a single workbook), then you have to create a brand-new ExcelFile instance for each path anyway, so once again ExcelFile.parse and read_excel will be equivalent (and equally slow).
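The point about the loop can be sketched with a toy stand-in that simply counts how often a workbook gets opened. CountingWorkbook and read_sheet are hypothetical names, not part of pandas, and the counter is only a proxy for the real cost of opening a file.

```python
# Toy stand-in for an expensive-to-open workbook; not pandas' real API.

class CountingWorkbook:
    opened = 0  # class-wide count of how many times a workbook was opened

    def __init__(self, sheets):
        CountingWorkbook.opened += 1  # proxy for the costly open step
        self.sheets = sheets
        self.sheet_names = list(sheets)

    def parse(self, name):
        return self.sheets[name]


def read_sheet(io, name):
    # Mirrors the function form: reuse an open workbook, else open one.
    if not isinstance(io, CountingWorkbook):
        io = CountingWorkbook(io)
    return io.parse(name)


data = {"a": [1], "b": [2], "c": [3]}

# Pattern 1: pass the raw source each time, so every call reopens it.
CountingWorkbook.opened = 0
for name in data:
    read_sheet(data, name)
opens_when_reopening = CountingWorkbook.opened

# Pattern 2: open once outside the loop and reuse the object.
CountingWorkbook.opened = 0
wb = CountingWorkbook(data)
for name in wb.sheet_names:
    read_sheet(wb, name)
opens_when_reusing = CountingWorkbook.opened
```

With three sheets, the first pattern opens the workbook once per sheet and the second opens it once in total, whether the sheets are then read through the function or the method.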