Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
309 views
in Technique[技术] by (71.8m points)

c - What goes on behind the curtains during disk I/O?

When I seek to some position in a file and write a small amount of data (20 bytes), what goes on behind the scenes?

My understanding

To my knowledge, the smallest unit of data that can be written or read from a disk is one sector (traditionally 512 bytes, but that standard is now changing). That means to write 20 bytes I need to read a whole sector, modify some of it in memory and write it back to disk.

This is what I expect to be happening in unbuffered I/O. I also expect buffered I/O to do roughly the same thing, but be clever about its cache. So I would have thought that if I blow locality out the window by doing random seeks and writes, both buffered and unbuffered I/O ought to have similar performance... maybe with unbuffered coming out slightly better.

Then again, I know it's crazy for buffered I/O to only buffer one sector, so I might also expect it to perform terribly.

My application

I am storing values gathered by a SCADA device driver that receives remote telemetry for upwards of a hundred thousand points. There is extra data in the file such that each record is 40 bytes, but only 20 bytes of that needs to be written during an update.

Pre-implementation benchmark

To check that I don't need to dream up some brilliantly over-engineered solution, I have run a test using a few million random records written to a file that could contain a total of 200,000 records. Each test seeds the random number generator with the same value to be fair. First I erase the file and pad it to the total length (about 7.6 meg), then loop a few million times, passing a random file offset and some data to one of two test functions:

void WriteOldSchool( void *context, long offset, Data *data )
{
    int fd = (int)context;
    lseek( fd, offset, SEEK_SET );
    write( fd, (void*)data, sizeof(Data) );
}

void WriteStandard( void *context, long offset, Data *data )
{
    FILE *fp = (FILE*)context;
    fseek( fp, offset, SEEK_SET );
    fwrite( (void*)data, sizeof(Data), 1, fp );
    fflush(fp);
}

Maybe no surprises?

The OldSchool method came out on top - by a lot. It was over 6 times faster (1.48 million versus 232000 records per second). To make sure I hadn't run into hardware caching, I expanded my database size to 20 million records (file size of 763 meg) and got the same results.

Before you point out the obvious call to fflush, let me say that removing it had no effect. I imagine this is because the cache must be committed when I seek sufficiently far away, which is what I'm doing most of the time.

So, what's going on?

It seems to me that the buffered I/O must be reading (and possibly writing all of) a large chunk of the file whenever I try to write. Because I am hardly ever taking advantage of its cache, this is extremely wasteful.

In addition (and I don't know the details of hardware caching on disk), if the buffered I/O is trying to write a bunch of sectors when I change only one, that would reduce the effectiveness of the hardware cache.

Are there any disk experts out there who can comment and explain this better than my experimental findings? =)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Indeed, at least on my system with GNU libc, it looks like stdio is reading 4kB blocks before writing back the changed portion. Seems bogus to me, but I imagine somebody thought it was a good idea at the time.

I checked by writing a trivial C program to open a file, write a small of data once, and exit; then ran it under strace, to see which syscalls it actually triggered. Writing at an offset of 10000, I saw these syscalls:

lseek(3, 8192, SEEK_SET)                = 8192
read(3, ""..., 1808) = 1808
write(3, "hello", 5)                    = 5

Seems that you'll want to stick with the low-level Unix-style I/O for this project, eh?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...