I've done some testing with Streams in special with DirectoryStreams of the nio-package. I simply try to get a list of all files in a directory sorted by last modified date and size.
The JavaDoc of old File.listFiles() stated a Note to the method in Files:
Note that the Files class defines the newDirectoryStream method to
open a directory and iterate over the names of the files in the
directory. This may use less resources when working with very large
directories.
I run the code down below a lot of times (first three times below):
First-run:
Run time of Arrays.sort: 1516
Run time of Stream.sorted as Array: 2912
Run time of Stream.sorted as List: 2875
Second-run:
Run time of Arrays.sort: 1557
Run time of Stream.sorted as Array: 2978
Run time of Stream.sorted as List: 2937
Third-run:
Run time of Arrays.sort: 1563
Run time of Stream.sorted as Array: 2919
Run time of Stream.sorted as List: 2896
My question is: Why do the streams perform so bad?
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class FileSorter {
// This sorts from old to young and from big to small
Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
int sorter = 0;
try {
FileTime lm1 = Files.getLastModifiedTime(o1);
FileTime lm2 = Files.getLastModifiedTime(o2);
if (lm2.compareTo(lm1) == 0) {
Long s1 = Files.size(o1);
Long s2 = Files.size(o2);
sorter = s2.compareTo(s1);
} else {
sorter = lm1.compareTo(lm2);
}
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
return sorter;
};
public String[] getSortedFileListAsArray(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).toArray(String[]::new);
}
public List<String> getSortedFileListAsList(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).collect(Collectors.
toList());
}
public String[] sortByDateAndSize(File[] fileList) {
Arrays.sort(fileList, (File o1, File o2) -> {
int r = Long.compare(o1.lastModified(), o2.lastModified());
if (r != 0) {
return r;
}
return Long.compare(o1.length(), o2.length());
});
String[] fileNames = new String[fileList.length];
for (int i = 0; i < fileNames.length; i++) {
fileNames[i] = fileList[i].getName();
}
return fileNames;
}
public static void main(String[] args) throws IOException {
// File (io package)
File f = new File("C:\Windows\system32");
// Path (nio package)
Path dir = Paths.get("C:\Windows\system32");
FileSorter fs = new FileSorter();
long before = System.currentTimeMillis();
String[] names = fs.sortByDateAndSize(f.listFiles());
long after = System.currentTimeMillis();
System.out.println("Run time of Arrays.sort: " + ((after - before)));
long before2 = System.currentTimeMillis();
String[] names2 = fs.getSortedFileListAsArray(dir);
long after2 = System.currentTimeMillis();
System.out.
println("Run time of Stream.sorted as Array: " + ((after2 - before2)));
long before3 = System.currentTimeMillis();
List<String> names3 = fs.getSortedFileListAsList(dir);
long after3 = System.currentTimeMillis();
System.out.
println("Run time of Stream.sorted as List: " + ((after3 - before3)));
}
}
Update
After applying the code from Peter I have this results:
Run time of Arrays.sort: 1615
Run time of Stream.sorted as Array: 3116
Run time of Stream.sorted as List: 3059
Run time of Stream.sorted as List with caching: 378
Update 2
After doing some research on the solution of Peter, I can say, that reading file attributes with for ex. Files.getLastModified must be a heavy crunch. Changing only that part in Comparator to:
Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
File f1 = o1.toFile();
File f2 = o2.toFile();
long lm1 = f1.lastModified();
long lm2 = f2.lastModified();
int cmp = Long.compare(lm2, lm1);
if (cmp == 0) {
cmp = Long.compare(f2.length(), f1.length());
}
return cmp;
};
Gets the even better result on my computer:
Run time of Arrays.sort: 1968
Run time of Stream.sorted as Array: 1999
Run time of Stream.sorted as List: 1975
Run time of Stream.sorted as List with caching: 488
But as you can see, caching the object is the much best way. And as jtahlborn mentioned, it is a kind of stable sort.
Update 3 (best solution I've found)
After a bit more research, I've seen, that the methods Files.lastModified and Files.size, both do a huge job on a same thing: Attributes. So I made three versions of the PathInfo class to test:
- Peters version as described down below
- An old style File version, where I do a Path.toFile() once in the constructor and get all values from that file with f.lastModified and f.length
- An version of Peters solution, but now I read an Attribute object with Files.readAttributes(path,BasicFileAttributes.class) and done things on this.
Putting it all in a loop for doing it 100 times each, I came up with these results:
After doing all hundred times
Mean performance of Peters solution: 432.26
Mean performance of old File solution: 343.11
Mean performance of read attribute object once solution: 255.66
Code in constructor of PathInfo for the best solution:
public PathInfo(Path path) {
try {
// read the whole attributes once
BasicFileAttributes bfa = Files.readAttributes(path, BasicFileAttributes.class);
fileName = path.getFileName().toString();
modified = bfa.lastModifiedTime().toMillis();
size = bfa.size();
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
}
My result: Never read attributes twice and caching in an Object is bursting performance.
See Question&Answers more detail:
os