Most programming languages that work with Office products go through some middle layer, and that layer is usually the bottleneck. Good examples are the PIAs/Interop and the Open XML SDK.
One way to get at the data at a lower level (bypassing the middle layer) is to use a driver, in this case the OLE DB provider.
You mentioned a 150MB one-sheet Excel file that takes about 7 minutes to load.
The best I could do was a 130MB file in 135 seconds, roughly 3 times faster:
    using System.Data;
    using System.Data.OleDb;
    using System.Diagnostics;

    Stopwatch sw = new Stopwatch();
    sw.Start();
    DataSet excelDataSet = new DataSet();
    string filePath = @"c:\temp\BigBook.xlsx";

    // For .xlsx files use Provider=Microsoft.ACE.OLEDB.12.0; for .xls files use
    // Provider=Microsoft.Jet.OLEDB.4.0 with Extended Properties="Excel 8.0;HDR=YES;".
    string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties=\"Excel 12.0;HDR=YES;\"";

    using (OleDbConnection conn = new OleDbConnection(connectionString))
    {
        conn.Open();
        // Read the whole worksheet named Sheet1 into the in-memory DataSet.
        OleDbDataAdapter objDA = new OleDbDataAdapter("select * from [Sheet1$]", conn);
        objDA.Fill(excelDataSet);
        //dataGridView1.DataSource = excelDataSet.Tables[0];
    }
    sw.Stop();
    Debug.Print("Load XLSX tool: " + sw.ElapsedMilliseconds + " millisecs. Records = " + excelDataSet.Tables[0].Rows.Count);
Test machine: Windows 7 x64, Intel i5 at 2.3 GHz, 8 GB RAM, 250 GB SSD.
If I may suggest a hardware fix as well: if you're on standard HDDs, try moving the file to an SSD.
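The provider choice described in the code comment above could be wrapped in a small helper. This is only a sketch: `ExcelConnectionString` and `Build` are hypothetical names (not part of any library), and the provider and `Extended Properties` values are simply the ones from the snippet above, so check them against the drivers installed on your machine.

```csharp
using System;
using System.IO;

// Hypothetical helper: picks the OLE DB provider from the file extension.
public static class ExcelConnectionString
{
    public static string Build(string filePath, bool hasHeaderRow = true)
    {
        string hdr = hasHeaderRow ? "YES" : "NO";
        string extension = Path.GetExtension(filePath).ToLowerInvariant();

        switch (extension)
        {
            case ".xlsx":
                // ACE provider, same settings as the timing test above.
                return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath
                    + "';Extended Properties=\"Excel 12.0;HDR=" + hdr + ";\"";
            case ".xls":
                // Older Jet provider for the legacy binary format.
                return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" + filePath
                    + "';Extended Properties=\"Excel 8.0;HDR=" + hdr + ";\"";
            default:
                throw new NotSupportedException("Unsupported extension: " + extension);
        }
    }
}
```

With this, the timing test would only need `new OleDbConnection(ExcelConnectionString.Build(filePath))`, and switching between .xls and .xlsx inputs would not require editing the string by hand.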
Note: I can't download your Excel spreadsheet example as I'm behind a corporate firewall.
PS. See the MSDN thread "Fastest Way to import xlsx files with 200 MB of Data"; the consensus there is that OleDB is the fastest.
PS 2. Here's how you can do it with python:
http://code.activestate.com/recipes/440661-read-tabular-data-from-excel-spreadsheets-the-fast/