When I seek to some position in a file and write a small amount of data (20 bytes), what goes on behind the scenes?
My understanding
To my knowledge, the smallest unit of data that can be written to or read from a disk is one sector (traditionally 512 bytes, though 4096-byte sectors are now common). That means to write 20 bytes, the device needs to read a whole sector, modify part of it in memory, and write it back to disk.
This is what I expect to be happening with unbuffered I/O. I also expect buffered I/O to do roughly the same thing, but be clever about its cache. So I would have thought that if I throw locality out the window by doing random seeks and writes, both buffered and unbuffered I/O ought to have similar performance... maybe with unbuffered coming out slightly ahead.
Then again, I know it's crazy for buffered I/O to only buffer one sector, so I might also expect it to perform terribly.
My application
I am storing values gathered by a SCADA device driver that receives remote telemetry for upwards of a hundred thousand points. There is extra data in the file such that each record is 40 bytes, but only 20 bytes of that needs to be written during an update.
Pre-implementation benchmark
To check that I don't need to dream up some brilliantly over-engineered solution, I have run a test using a few million random records written to a file that could contain a total of 200,000 records. Each test seeds the random number generator with the same value to be fair. First I erase the file and pad it to the total length (about 7.6 meg), then loop a few million times, passing a random file offset and some data to one of two test functions:
#include <stdio.h>      /* FILE, fseek, fwrite, fflush */
#include <unistd.h>     /* lseek, write */

void WriteOldSchool( void *context, long offset, Data *data )
{
    int fd = (int)context;
    lseek( fd, offset, SEEK_SET );
    write( fd, (void*)data, sizeof(Data) );
}

void WriteStandard( void *context, long offset, Data *data )
{
    FILE *fp = (FILE*)context;
    fseek( fp, offset, SEEK_SET );
    fwrite( (void*)data, sizeof(Data), 1, fp );
    fflush( fp );
}
Maybe no surprises?
The OldSchool method came out on top - by a lot. It was over 6 times faster (1.48 million versus 232,000 records per second). To make sure I hadn't run into hardware caching, I expanded my database to 20 million records (a file size of 763 MB) and got the same results.
Before you point out the obvious call to fflush, let me say that removing it had no effect. I imagine this is because the buffer must be committed whenever I seek sufficiently far away, which is what I'm doing most of the time.
So, what's going on?
It seems to me that the buffered I/O must be reading (and possibly writing back) a large chunk of the file whenever I write. Because I am hardly ever taking advantage of its cache, this is extremely wasteful.
In addition (and I don't know the details of hardware caching on disk), if the buffered I/O is trying to write a bunch of sectors when I change only one, that would reduce the effectiveness of the hardware cache.
Are there any disk experts out there who can comment and explain this better than my experimental findings? =)