I understand that on a conventional spinning-disk system, reading a file with multiple threads is inefficient.
My case is different: I have a high-throughput file system that provides read speeds of up to 3 GB/s, along with 196 CPU cores and 2 TB of RAM.
A single-threaded Java program reads the file at 85-100 MB/s at most, so there is room to do much better than a single thread. I have to read files as large as 1 TB, and I have enough RAM to load them.
Currently I use one of the following approaches, or something similar, but I need a multi-threaded version to get better throughput:
Java 7 Files: 50 MB/s
List<String> lines = Files.readAllLines(Paths.get(path), encoding);
Java commons-io: 48 MB/s
List<String> lines = FileUtils.readLines(new File("/path/to/file.txt"), "utf-8");
Guava: 45 MB/s
List<String> lines = Files.readLines(new File("/path/to/file.txt"), Charset.forName("utf-8"));
Java Scanner class: very slow
// Reads whole lines; next() would split on whitespace and return tokens instead.
ArrayList<String> list = new ArrayList<String>();
try (Scanner s = new Scanner(new File("filepath"), "utf-8")) {
    while (s.hasNextLine()) {
        list.add(s.nextLine());
    }
}
I want to be able to load the file and build the same ArrayList, with the lines in their original file order, as fast as possible.
There is another question that reads similarly, but it is actually different, for two reasons:
That question discusses systems where multi-threaded file I/O cannot physically be efficient. Thanks to technological advances, we now have systems designed for high-throughput I/O, so the limiting factor is the CPU and software, which can be overcome by multi-threading the I/O.
The other question does not explain how to write code that multi-threads the I/O.
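To make the goal concrete, here is a minimal sketch of the kind of code I have in mind, not a tuned or benchmarked solution. It splits the file into one byte range per thread, moves each range boundary forward to the start of the next line, lets each thread decode its own range, and then concatenates the partial lists in range order so the original line order is preserved. The class name ParallelLineReader, the method names, and the 1 MiB block size are all hypothetical choices for illustration. It assumes an encoding such as UTF-8 in which the byte 0x0A only ever means a newline (this would not hold for UTF-16), and that the total number of lines fits in a single ArrayList.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLineReader {
    private static final int BLOCK = 1 << 20; // per-thread read buffer (assumed size)

    public static List<String> readLines(Path path, Charset cs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            // bounds[i] .. bounds[i+1] is the byte range handled by task i.
            long[] bounds = new long[threads + 1];
            bounds[threads] = size;
            for (int i = 1; i < threads; i++) {
                long nominal = Math.max(bounds[i - 1], size / threads * i);
                bounds[i] = nextLineStart(ch, nominal, size);
            }
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final long start = bounds[i], end = bounds[i + 1];
                parts.add(pool.submit(() -> readChunk(ch, start, end, cs)));
            }
            // Concatenate in submission order, which is also file order.
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : parts) all.addAll(f.get());
            return all;
        } finally {
            pool.shutdown();
        }
    }

    // Returns the offset just after the first '\n' at or beyond pos, or size if none.
    private static long nextLineStart(FileChannel ch, long pos, long size) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        while (pos < size) {
            buf.clear();
            int n = ch.read(buf, pos); // positional read, does not move the channel position
            if (n <= 0) break;
            for (int i = 0; i < n; i++) {
                if (buf.get(i) == '\n') return pos + i + 1;
            }
            pos += n;
        }
        return size;
    }

    // Reads bytes in [start, end) block by block and splits them into lines.
    private static List<String> readChunk(FileChannel ch, long start, long end, Charset cs)
            throws IOException {
        List<String> lines = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.allocate(BLOCK);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        long pos = start;
        while (pos < end) {
            buf.clear();
            buf.limit((int) Math.min(buf.capacity(), end - pos));
            int n = ch.read(buf, pos);
            if (n <= 0) break;
            pos += n;
            byte[] a = buf.array();
            int from = 0;
            for (int i = 0; i < n; i++) {
                if (a[i] == '\n') {
                    line.write(a, from, i - from);
                    lines.add(toLine(line, cs));
                    line.reset();
                    from = i + 1;
                }
            }
            line.write(a, from, n - from); // carry a partial line into the next block
        }
        if (line.size() > 0) lines.add(toLine(line, cs)); // last line without a trailing newline
        return lines;
    }

    // Decodes one line, dropping a trailing '\r' so CRLF files behave like LF files.
    private static String toLine(ByteArrayOutputStream line, Charset cs) {
        byte[] b = line.toByteArray();
        int len = b.length;
        if (len > 0 && b[len - 1] == '\r') len--;
        return new String(b, 0, len, cs);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) return; // expects a file path as the first argument
        int threads = Runtime.getRuntime().availableProcessors();
        List<String> lines = readLines(Paths.get(args[0]), StandardCharsets.UTF_8, threads);
        System.out.println(lines.size() + " lines");
    }
}
```

Reading in fixed-size blocks rather than mapping or buffering a whole range keeps each worker within the 2 GB limit of a single ByteBuffer, which matters when a 1 TB file is split across a few hundred threads. Whether this actually approaches the 3 GB/s the file system can deliver is something I would still have to measure.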