The best approach here is probably to use memory-mapped files.
First you need a file handle; use the CreateFile Windows API function for that. Then pass that handle to CreateFileMapping to get a file mapping handle. Finally, use MapViewOfFile to map the file into memory.
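The three calls above could be strung together roughly like this. It is a sketch in C rather than Delphi, since the functions are plain Windows API; the helper names split_offset and map_range are made up for this example, and the read-only, shared-read flags are an assumption about what a search tool would want.

```c
#include <stddef.h>
#include <stdint.h>

/* Split a 64-bit file offset into the two 32-bit halves that
   MapViewOfFile takes as dwFileOffsetHigh / dwFileOffsetLow. */
void split_offset(uint64_t offset, uint32_t *high, uint32_t *low) {
    *high = (uint32_t)(offset >> 32);
    *low  = (uint32_t)(offset & 0xFFFFFFFFu);
}

#ifdef _WIN32  /* Windows-only part; compiles to nothing elsewhere */
#include <windows.h>

/* Hypothetical helper: maps `size` bytes of `path`, starting at `offset`,
   for reading. `size` must not reach past the end of the file; passing 0
   maps from `offset` to the end. On success returns the view and stores
   both handles for later cleanup; on failure returns NULL and leaves
   nothing open. */
const void *map_range(const char *path, uint64_t offset, size_t size,
                      HANDLE *file_out, HANDLE *mapping_out) {
    uint32_t hi, lo;
    HANDLE file, mapping;
    const void *view;

    file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return NULL;

    /* 0/0 as maximum size means "use the current size of the file". */
    mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping == NULL) { CloseHandle(file); return NULL; }

    split_offset(offset, &hi, &lo);
    view = MapViewOfFile(mapping, FILE_MAP_READ, hi, lo, size);
    if (view == NULL) { CloseHandle(mapping); CloseHandle(file); return NULL; }

    *file_out = file;
    *mapping_out = mapping;
    return view;
}
#endif
```

Note that CreateFile signals failure with INVALID_HANDLE_VALUE while CreateFileMapping and MapViewOfFile return NULL, which is why the checks differ.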
To handle large files, MapViewOfFile can map just a range of the file into memory, so you can e.g. map the first 32 MB, call UnmapViewOfFile to unmap it, then call MapViewOfFile for the next 32 MB, and so on. (EDIT: as was pointed out below, make sure that the blocks you map this way overlap. The overlap must be at least as long as the text you are searching for, so that you do not miss a match that is split across a block boundary. Every view offset must also be a multiple of the system allocation granularity, which GetSystemInfo reports and which is typically 64 KB rather than the 4 KB page size, so round the overlap up accordingly.)
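The block arithmetic can be sketched as follows. This is an illustration in C rather than Delphi; round_up and next_view_offset are names invented for this example, and the granularity is passed in as a parameter because on a real system it would come from GetSystemInfo.

```c
#include <stdint.h>

/* Round n up to the next multiple of gran (gran must be > 0). */
uint64_t round_up(uint64_t n, uint64_t gran) {
    return (n + gran - 1) / gran * gran;
}

/*
 * Offset of the view that follows the one starting at `offset`.
 * Consecutive views overlap by at least pattern_len bytes, rounded up
 * to the allocation granularity so the new offset stays aligned.
 * Assumes offset and view_size are multiples of gran and that
 * view_size is larger than the rounded overlap.
 */
uint64_t next_view_offset(uint64_t offset, uint64_t view_size,
                          uint64_t pattern_len, uint64_t gran) {
    uint64_t overlap = round_up(pattern_len, gran);
    return offset + view_size - overlap;
}
```

With 32 MB views, a 10-byte search text and a granularity of 64 KB, the second view would start at 32 MB minus 64 KB, i.e. at offset 33488896.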
To do the actual searching once (part of) the file is mapped into memory, you can make a copy of the source for StrPosLen from SysUtils.pas (it's unfortunately defined in the implementation section only and not exposed in the interface). Leave one copy as is and make another copy, replacing Wide with Ansi every time. Also, if you want to be able to search in binary files which might contain embedded #0's, you can remove the (Str1[I] <> #0) and part.
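To show why dropping the #0 check matters, here is a length-bounded search sketched in C. It is not the SysUtils source, just a naive scan with the same idea: the buffer length, not a terminating zero, decides where the search stops. The name find_bytes is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of pat (pat_len bytes)
   inside buf (buf_len bytes), or NULL if there is none. Zero bytes
   in buf are treated like any other byte. */
const unsigned char *find_bytes(const unsigned char *buf, size_t buf_len,
                                const unsigned char *pat, size_t pat_len) {
    size_t i;
    if (pat_len == 0 || pat_len > buf_len) return NULL;
    for (i = 0; i + pat_len <= buf_len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0) return buf + i;
    }
    return NULL;
}
```

Called on a mapped view, buf would be the pointer returned by MapViewOfFile and buf_len the number of bytes mapped.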
Either find a way to identify whether a file is ANSI or Unicode, or simply call both the Ansi and Unicode versions on each mapped part of the file.
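One possible way to identify the encoding, offered as an assumption rather than something the approach above requires, is to look for a byte order mark at the start of the file. The sketch below is in C; the Encoding enum and detect_bom are names made up for this example.

```c
#include <stddef.h>

typedef enum { ENC_UNKNOWN, ENC_UTF8_BOM, ENC_UTF16_LE, ENC_UTF16_BE } Encoding;

/* Inspects the first bytes of a file for a byte order mark.
   Returns ENC_UNKNOWN when no known mark is present. */
Encoding detect_bom(const unsigned char *buf, size_t len) {
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;   /* what Delphi calls Unicode / WideChar */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN;
}
```

Many files carry no mark at all, so for an ENC_UNKNOWN result running both searches remains the fallback.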
Once you are done with each file, first call UnmapViewOfFile on the mapped view, then call CloseHandle on the file mapping handle, and finally call CloseHandle on the file handle.
EDIT:
A big advantage of using memory mapped files instead of using e.g. a TFileStream to read the file into memory in blocks is that the bytes will only end up in memory once.
Normally, on file access, Windows first reads the bytes into the OS file cache and then copies them from there into the application's memory.
If you use memory-mapped files, the OS can map the physical pages from the OS file cache directly into the address space of the application without making another copy (saving the time needed for the copy and halving memory usage).
Bonus Answer: By calling StrLIComp instead of StrLComp you can do a case-insensitive search.
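The same idea, sketched in C for a raw byte buffer: compare through tolower instead of comparing bytes directly. The name find_bytes_nocase is hypothetical, and tolower only folds single-byte characters according to the current C locale, so this is an approximation of what StrLIComp does, not a replacement for it.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive variant of a length-bounded search: returns a
   pointer to the first match of pat in buf, or NULL if none. */
const unsigned char *find_bytes_nocase(const unsigned char *buf, size_t buf_len,
                                       const unsigned char *pat, size_t pat_len) {
    size_t i, j;
    if (pat_len == 0 || pat_len > buf_len) return NULL;
    for (i = 0; i + pat_len <= buf_len; i++) {
        for (j = 0; j < pat_len; j++) {
            if (tolower(buf[i + j]) != tolower(pat[j])) break;
        }
        if (j == pat_len) return buf + i;   /* every byte matched */
    }
    return NULL;
}
```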