I'd like to search for a pattern in a very large file (e.g. above 1 GB) that consists of a single line.
It is not possible to load it into memory. Currently, I use a BufferedReader
to read it into buffers (1024 chars each).
The main steps:
- Read data into two buffers
- Search for the pattern in both buffers
- Increment a counter if the pattern was found
- Copy the second buffer into the first
- Load new data into the second buffer
- Search for the pattern in both buffers
- Increment the counter if the pattern was found
- Repeat the above steps (starting from step 4) until EOF
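A minimal sketch of the steps above, under some assumptions: the class name `TwoBufferSearch` and the helpers `fill` and `contains` are hypothetical names chosen for illustration, the buffer size is a parameter so the behavior is easy to try on short strings, and the method returns a boolean instead of incrementing a counter (counting on overlapping windows could report the same match twice).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical illustration of the two-buffer approach described above.
class TwoBufferSearch {

    // Fills buf as far as possible; returns the number of chars read (0 at EOF).
    static int fill(Reader in, char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    // Returns true if the pattern matches inside any window of two adjacent buffers.
    static boolean contains(Reader in, Pattern pattern, int bufSize) throws IOException {
        char[] first = new char[bufSize];
        char[] second = new char[bufSize];
        int firstLen = fill(in, first);
        int secondLen = fill(in, second);
        while (true) {
            // Search the concatenation of both buffers.
            String window = new String(first, 0, firstLen) + new String(second, 0, secondLen);
            if (pattern.matcher(window).find()) {
                return true;
            }
            if (secondLen == 0) {
                return false; // nothing more to load
            }
            // Slide: the second buffer becomes the first, then load new data.
            char[] tmp = first;
            first = second;
            second = tmp;
            firstLen = secondLen;
            secondLen = fill(in, second);
        }
    }

    public static void main(String[] args) throws IOException {
        Pattern p = Pattern.compile("ba*b");
        // Match spans a chunk boundary but fits in two buffers of 4 chars.
        System.out.println(contains(new StringReader("xxxxbaaabxxxx"), p, 4));
        // Match is 12 chars long, longer than two buffers of 4 chars.
        System.out.println(contains(new StringReader("baaaaaaaaaab"), p, 4));
    }
}
```

With a buffer size of 4, the first call finds the match because it fits inside one two-buffer window, while the second call misses it because the match is longer than two buffers.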
This two-buffer algorithm lets me avoid the situation where the searched piece of text is split across chunks. It works like a charm as long as the match is shorter than the combined length of the two buffers. But I can't handle the case where the match is longer, say as long as 3 buffers (I only have data in two buffers, so the match will fail!). What's more, I can imagine a case like this:
- Prepare a 1 GB single-line file that consists of "baaaaaaa(....)aaaaab"
- Search for the pattern ba*b.
- The whole file matches the pattern!
- I don't have to print the result; I only have to be able to say: "Yes, I was able to find the pattern" or "No, I wasn't able to find it".
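To make the yes/no requirement concrete for this one example, here is a rough sketch of a scanner that reads the file character by character and keeps only a single boolean of state, so memory use does not depend on the match length. It is hard-coded for the pattern ba*b and is not a general regex solution; the class name `StreamingBaStarB` and the method `containsMatch` are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical hand-written scanner for the single pattern ba*b.
class StreamingBaStarB {

    // Returns true if the stream contains 'b', then zero or more 'a', then 'b'.
    static boolean containsMatch(Reader in) throws IOException {
        // True while the most recent chars were a 'b' followed only by 'a's.
        boolean afterB = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == 'b') {
                if (afterB) {
                    return true; // closing 'b' of a match
                }
                afterB = true;   // possible start of a match
            } else if (c != 'a') {
                afterB = false;  // any other char breaks the run
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // A BufferedReader keeps the per-char read() calls cheap.
        try (Reader in = new BufferedReader(new StringReader("xxbaaaaaaaaaabxx"))) {
            System.out.println(containsMatch(in));
        }
    }
}
```

For a real file, the `StringReader` would be replaced by a file reader; the scanner itself never holds more than one character at a time.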
Is this possible in Java? I mean:
- The ability to determine whether a pattern is present in the file (without loading the whole line into memory, see the case above)
- A way to handle the case where the match is longer than a chunk.
I hope my explanation is pretty clear.