Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

multithreading to read a file in Java

I am creating threads to read a file in Java. When I create two threads, each thread reads the whole file, whereas I want them to read different parts of it. I tried adding sleep(), join(), and yield(), but they only slow the read down.

public class MyClass implements Runnable {

    Thread thread;
    public MyClass(int numOfThreads) {
        for(int i=0;i < numOfThreads; i++) {
            thread = new Thread(this);
            thread.start();
        }
    }

    public void run() {
        readFile();
    }
}

In readFile(), inside the while loop that reads line by line, I call sleep()/yield(). How can I make the threads read different parts of the file?

Updated with method used to read files...

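// Needs: java.io.BufferedReader, java.io.FileReader, java.io.IOException
public synchronized void readFile() {
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader buf = new BufferedReader(new FileReader("read.txt"))) {
        String line;
        while ((line = buf.readLine()) != null) {
            String[] info = line.split(" ");
            String firstName = info[0];
            String secondName = info[1];
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                return;
            }
        }
    } catch (IOException e) {
        System.out.println("Error : could not read file");
        e.printStackTrace();
    }
}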

1 Answer

I suppose you're thinking that reading a file with multiple threads like this will be faster than reading it with one. This is almost certainly false. Threads improve performance on CPU-bound tasks by using multiple cores or processors, but reading a file is not a CPU-bound task.

The OS uses the disk controller to read bytes at the full bandwidth of the disk interface. For nearly any hardware combination, the speed is bounded by the disk (read and/or seek times), its controller, and its DMA interface or bus, not by the CPU. It's easy for a CPU to keep the disk controller 100% busy, or even several controllers for different disks. If you need proof of this, start a big file copy and watch CPU utilization. It won't be very high.

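Therefore, of your multiple threads, effectively only one will make progress at a time while the others wait on the same disk, so the extra threads merely add overhead to what is in effect a single-threaded computation.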

What does slow file transfers down is buffering. To gain flexibility, I/O libraries can end up buffering each character two or even three times.

The Java NIO library is meant to do away with as much of this overhead as possible. See for example this article. There are many similar ones. My experience is that a carefully written NIO reader will use most of the available performance of the hardware.
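As a rough illustration of what such a reader can look like, here is a minimal single-threaded sketch that pulls the file through a FileChannel into one reusable direct ByteBuffer. The class name NioLineCounter, the 64 KiB buffer size, and the choice to count newline bytes as the "work" are assumptions made for the example, not something taken from the article mentioned above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example class: sequential NIO read with a single reusable buffer.
class NioLineCounter {

    /** Counts '\n' bytes by reading the file front to back through one direct buffer. */
    static long countNewlines(Path file) throws IOException {
        long newlines = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1 << 16); // 64 KiB, an arbitrary size
            while (ch.read(buf) != -1) {
                buf.flip(); // switch the buffer from filling to draining
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        newlines++;
                    }
                }
                buf.clear(); // make the whole buffer available for the next read
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "read.txt");
        System.out.println("newlines: " + countNewlines(file));
    }
}
```

Whether this beats a plain BufferedReader on a given machine is something to measure rather than assume. The sketch only shows the shape of the loop: one channel, one buffer, no intermediate copies made by the program itself.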

There is one caveat: if you have a heavy-duty virus checker set to scan the kind of file you are reading, it might make reading CPU-bound. In this unusual case, you might get a boost from multi-threading, depending on the checker's architecture. You would find the total file size S and let thread k = 0, 1, ..., n-1 read from offset kS/n to (k+1)S/n - 1 (by seeking to the right offset and tracking the number of bytes read in each thread). However, I still strongly suspect that the additional head seek time and other effects of random access will cancel out any advantage of running the virus checker in multiple threads.
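For completeness, the offset arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions: the names ChunkedReader, chunkStart, readRange, and readInParallel are made up for the example, the ranges are byte ranges (so a line can straddle two chunks), and the workers merely count the bytes they read instead of parsing them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example class: worker k of n reads bytes [k*S/n, (k+1)*S/n).
class ChunkedReader {

    /** First byte offset owned by worker k of n for a file of the given size. */
    static long chunkStart(long size, int k, int n) {
        return size * k / n;
    }

    /** Reads bytes [start, end) with positional reads; returns how many bytes were read. */
    static long readRange(FileChannel ch, long start, long end) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
        long pos = start;
        while (pos < end) {
            buf.clear();
            buf.limit((int) Math.min(buf.capacity(), end - pos));
            int got = ch.read(buf, pos); // positional read: leaves the channel's own position alone
            if (got < 0) {
                break; // reached end of file earlier than expected
            }
            pos += got;
        }
        return pos - start;
    }

    /** Splits the file into n byte ranges, reads each on its own thread, returns total bytes read. */
    static long readInParallel(Path file, int n) throws IOException, InterruptedException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long[] counts = new long[n];
            Thread[] workers = new Thread[n];
            for (int k = 0; k < n; k++) {
                final int id = k;
                workers[k] = new Thread(() -> {
                    try {
                        counts[id] = readRange(ch, chunkStart(size, id, n), chunkStart(size, id + 1, n));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                workers[k].start();
            }
            long total = 0;
            for (int k = 0; k < n; k++) {
                workers[k].join(); // join also makes counts[k] visible to this thread
                total += counts[k];
            }
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args.length > 0 ? args[0] : "read.txt");
        System.out.println("bytes read: " + readInParallel(file, 2));
    }
}
```

Using end-exclusive ranges makes the chunks tile the file exactly, which is the same partition as kS/n to (k+1)S/n - 1 inclusive. A line-oriented reader built on this would also have to decide which worker owns a line that crosses a chunk boundary, which the sketch does not attempt.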

