Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
348 views
in Technique[技术] by (71.8m points)

java.util.scanner - Java Scanner(File) misbehaving, but Scanner(FIleInputStream) always works with the same file

I am having weird behavior with Scanner. It will work with a particular set of files I am using when I use the Scanner(FileInputStream) constructor, but it won't with the Scanner(File) constructor.

Case 1: Scanner(File)

Scanner s = new Scanner(new File("file"));
while(s.hasNextLine()) {
    System.out.println(s.nextLine());
}

Result: no output

Case 2: Scanner(FileInputStream)

Scanner s = new Scanner(new FileInputStream(new File("file")));
while(s.hasNextLine()) {
    System.out.println(s.nextLine());
}

Result: the file content outputs to the console.

The input file is a java file containing a single class.

I double checked programmatically (in Java) that:

  • the file exists,
  • is readable,
  • and has a non-zero filesize.

Typically Scanner(File) works for me in this case, I am not sure why it doesn't now.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

hasNextLine() calls findWithinHorizon() which in turns calls findPatternInBuffer(), searching a match for a line terminator character pattern defined as .*( |[ u2028u2029u0085])|.+$

Strange thing is that with both ways to construct a Scanner (with FileInputStream or via File), findPatternInBuffer returns a positive match if the file contains (independently from file size) for instance the 0x0A line terminator; but in the case the file contains a character out of ascii (ie >= 7f), using FileInputStream returns true while using File returns false.

Very simple test case:

create a file which contains just char "a"

# hexedit file     
00000000   61 0A                                                a.

# java Test.java
using File: true
using FileInputStream: true

now edit the file with hexedit to:

# hexedit file
00000000   61 0A 80                                             a..

# java Test.java
using File: false
using FileInputStream: true

in the test java code there is nothing else than what already in the question:

import java.io.*;
import java.lang.*;
import java.util.*;
public class Test {
    public static void main(String[] args) {
        try {
                File file1 = new File("file");
                Scanner s1 = new Scanner(file1);
                System.out.println("using File: "+s1.hasNextLine());
                File file2 = new File("file");
                Scanner s2 = new Scanner(new FileInputStream(file2));
                System.out.println("using FileInputStream: "+s2.hasNextLine());
        } catch (IOException e) {
                e.printStackTrace();
        }
    }
}

SO, it turns out this is a charset issue. In facts, changing the test to:

 Scanner s1 = new Scanner(file1, "latin1");

we get:

# java Test 
using File: true
using FileInputStream: true

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...