> at the very least, every file has to be read
On the contrary, that's the very most that could happen.
Running `git commit` to commit your staged changes is generally fast, because staging the changes already did most of the work. Creating a commit simply turns the index (a.k.a. the "staging area") into a very lightweight commit object, which contains the metadata about your commit, plus a few tree objects, which record the structure of the repository.
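You can see how little a commit actually contains by inspecting the objects directly. A rough illustration (the hashes below are made up and abbreviated; yours will differ):

```sh
# Pretty-print the latest commit object: just metadata plus a pointer to a tree
$ git cat-file -p HEAD
tree 9c3b8f2...
parent 1a4d7e0...
author A. Developer <a@example.com> 1700000000 +0000
committer A. Developer <a@example.com> 1700000000 +0000

    describe the change

# Pretty-print that tree: one entry per file or directory at the repository root
$ git cat-file -p HEAD^{tree}
100644 blob e69de29...	README.md
040000 tree 4b825dc...	src
```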
All the data in the files, though, gets added to Git's object database when you run `git add` on a particular file. The information about that file is then stored in the staging area, so that when you run `git commit`, everything about that file is already in the index. So the costliest part is amortized over each `git add`.
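A minimal sketch of what that means in practice: immediately after `git add`, the blob holding the file's contents already exists in the object database, and the index records its ID, before any commit has been made (this particular hash happens to be the real blob ID for the contents "hello" plus a newline):

```sh
$ echo 'hello' > greeting.txt
$ git add greeting.txt

# The index already points at a blob object containing the file's data
$ git ls-files --stage greeting.txt
100644 ce013625030ba8dba906f756967f9e9ca394464a 0	greeting.txt

# That blob is already stored in .git/objects, so the commit has nothing left to copy
$ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello
```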
The other subtle thing is that the index contains information about all the files in your repository, and it keeps metadata about the working directory, like the timestamp at which it last examined each file and the file's size. So even if you run something like `git add .` to stage all the changed files, Git only needs to `stat` each file to find out whether it has changed, and it can ignore it if it hasn't.
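You can peek at that cached metadata with `git ls-files --debug`, which dumps the stat fields the index keeps for each entry (the exact fields and values will vary with your system and Git version):

```sh
$ git ls-files --debug greeting.txt
greeting.txt
  ctime: 1700000000:0
  mtime: 1700000000:0
  dev: 2049	ino: 1234567
  uid: 1000	gid: 1000
  size: 6	flags: 0
```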
Obviously looking at all the files in your working directory is a little bit expensive, but much less costly than adding a full snapshot of even the unchanged files.
So even though Git stores a snapshot of the repository at each commit, it really only needs to store new data for the files that changed; for everything else it can store pointers to the old, unchanged file contents.
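One way to convince yourself of this is to compare the root trees of two consecutive commits in which only one file changed (again, the hashes are made up and abbreviated):

```sh
$ git cat-file -p HEAD~1^{tree}
100644 blob ce01362...	greeting.txt
100644 blob 7c4a013...	notes.txt

$ git cat-file -p HEAD^{tree}
100644 blob ce01362...	greeting.txt
100644 blob 9ae9a00...	notes.txt
```

The greeting.txt entry names the same blob in both trees, so the newer commit simply reuses the existing object; only notes.txt, which actually changed, gets a new blob.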