Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
518 views
in Technique[技术] by (71.8m points)

java - Get excel sheet names for large excel files using Apache POI

I have the following code that I use to get the sheet names of an excel file(.xlsx)

    XSSFWorkbook workBookXlsx = new XSSFWorkbook(new FileInputStream(pathToFile));
    ArrayList<String> sheetNames = new ArrayList<>();

    int numberOfSheets = workBookXlsx.getNumberOfSheets();
    for (int i = 0; i < numberOfSheets; i++) {
        sheetNames.add(workBookXlsx.getSheetAt(i).getSheetName());
    }

    workBookXlsx = null;

The issue I have with the above code is that it takes a lot of memory(~700MB) & a long time(5-6s) to create the XSSFWorkbook for a file of size 9MB. Even setting the workBookXlsx to null doesn't release the memory taken by the javaw(I know gc may or maynot be called & JVM wont release memory just because I have set a variable to null)

I did go through the documentation of Workbook, XSSFWorkbook & from what I understood, there is no method that will help me get the sheet names with low memory imprint.

The one solution I have found is to manually unzip the .xlsx file and read the contents of the .xlwoorkbook.xml to get the sheet names and the r:id

Is there an API for getting the sheet names in an .xlsx file without large memory imprint?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

To show what @Gagravarr probably meant with his comment:

The XSSFReader contains a method XSSFReader.getSheetsData which "Returns an Iterator which will let you get at all the different Sheets in turn. Each sheet's InputStream is only opened when fetched from the Iterator. It's up to you to close the InputStreams when done with each one.". But as often this is not the whole truth. In truth it returns a XSSFReader.SheetIterator which has a method XSSFReader.SheetIterator.getSheetName to get the sheet names.

Example:

import java.io.InputStream;
import java.io.FileInputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;

import java.util.Iterator;

public class ExcelXSSFGetSheetNamesXSSFReader {

 public static void main(String[] args) throws Exception {

  OPCPackage pkg = OPCPackage.open(new FileInputStream("Example.xlsx"));
  XSSFReader r = new XSSFReader( pkg );
  Iterator<InputStream> sheets = r.getSheetsData();

  if (sheets instanceof XSSFReader.SheetIterator) {
   XSSFReader.SheetIterator sheetiterator = (XSSFReader.SheetIterator)sheets;

   while (sheetiterator.hasNext()) {
    InputStream dummy = sheetiterator.next();

    System.out.println(sheetiterator.getSheetName());

    dummy.close();
   }
  }

  pkg.close();
 }
}

Conclusion: Currently you cannot work with apache poi only by trusting the API documentation. Instead you must always have a look at the source code.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...