TL;DR: it's the fork point code
You are getting the effect of git rebase --fork-point, which deliberately drops Dan's commit from your repository too. See also Git rebase - commit select in fork-point mode (although in my answer there I don't mention something I will here).
If you run the git rebase yourself, you choose whether --fork-point is used. The --fork-point option is used when:
- you run git rebase with no <upstream> argument (the --fork-point is implied), or
- you run git rebase --fork-point [<arguments>] <upstream>.
This means that to rebase on your upstream without having --fork-point applied, you should use:
git rebase @{u}
or:
git rebase --no-fork-point
Some details are Git-version-dependent, as --fork-point became an option only in Git version 2.0 (but was secretly done by git pull ever since 1.6.4.1, with methods growing more complex until the whole --fork-point thing was invented).
Discussion
As you already know, git push --force rudely overwrites the branch pointer, dropping some existing commit(s). You expected, though, that your git pull --rebase would restore the dropped commit, since you already had it yourself. For naming convenience, let's use your naming, where Dan's commit gets dropped when Brian force-pushes. (As a mnemonic, let's say "Dan got Dropped".)
Sometimes it will! Sometimes, as long as your Git has Dan's commit in your repository, and you have Dan's commit in your history, Dan's commit will get restored when you rebase your commits. This includes the case where you are Dan. And yet, sometimes it won't, and this also includes the case where you are Dan. In other words, it's not based on who you are at all.
The complete answer is a bit complicated, and it's worth noting that this behavior is something you can control.
About git pull (don't use it)
First, let's make a brief note: git pull is, in essence, just git fetch followed by either git merge or git rebase.[1] You choose in advance which command to run, by supplying --rebase or setting a configuration entry, branch.branch-name.rebase. However, you can run git fetch yourself, and then run git merge or git rebase yourself, and if you do it this way, you gain access to additional options.[2]
The most important of these is the ability to inspect the result of the fetch before choosing your primary option (merge vs rebase). In other words, this gives you a chance to see that there was a commit dropped. If you had done a git fetch earlier and gotten Dan's commit, then (with or without any intervening work where you may or may not have incorporated Dan's commit) done a second git fetch, you would see something like this:
+ 5122532...6f1308f pu -> origin/pu (forced update)
Note the "(forced update)" annotation: this is what tells you that Dan got Dropped. (The branch name used here is pu, which is one in the Git repo for Git that regularly gets force-updated; I just cut-and-pasted an actual git fetch output here.)
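If you want a script to notice this for you, here is a minimal sketch that scans captured git fetch output for the "(forced update)" annotation. The function name forced_updates and the second sample line are made up for illustration, and this is the human-readable output rather than a stable interface, so treat the parsing as an assumption.

```python
from typing import List


def forced_updates(fetch_output: str) -> List[str]:
    """Return the remote-tracking names that git fetch reported as force-updated."""
    refs = []
    for raw_line in fetch_output.splitlines():
        line = raw_line.strip()
        # A forced update is flagged with a leading "+" and a trailing annotation.
        if line.startswith("+") and line.endswith("(forced update)"):
            fields = line.split()
            arrow = fields.index("->")
            refs.append(fields[arrow + 1])  # the name right after "->"
    return refs


# The first line is the one shown above; the second is a hypothetical
# ordinary fast-forward update, included only for contrast.
sample = (
    " + 5122532...6f1308f pu -> origin/pu (forced update)\n"
    "   1a2b3c4..5d6e7f8 master -> origin/master\n"
)
dropped = forced_updates(sample)
print(dropped)
```

Any name this returns is a branch where some commit may have been dropped, which is your cue to look before you rebase.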
[1] There are several niggling technical differences, especially in very old versions of Git (before 1.8.4). There is also, as I was recently reminded, one other special case, for a git pull in a repository that has no commits on the current branch (typically, into a new empty repository): here git pull invokes neither git merge nor git rebase, but rather runs git read-tree -m and, if that succeeds, sets the branch name itself.
[2] I think you can supply all the necessary arguments on the command line, but that's not what I mean. In particular, the ability to run other Git commands between the fetch and the second step is what we want.
Basics of git rebase
The main and most fundamental thing to know about git rebase is that it copies commits. The why is itself fundamental to Git: nothing (no one, and not even Git itself) can change anything in a commit (or any other Git object), as the "true name" of a Git object is a cryptographic hash of its contents.[3] Hence if you take a commit out of the database, modify anything, even a single bit, and go to put the object back in, you get a new, different hash: a new and different commit. It can be extremely similar to the original, but if any bit of it is different in any way, it's a new, different commit.
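To make the "any change gives a new hash" point concrete, here is a small sketch that computes an object ID the way Git does with its default SHA-1 format: hash the object type, the content length, a NUL byte, and then the content. The commit text itself (author, date, message) is made up for illustration, and the helper name git_object_id is mine, not Git's.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given type and content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# A hypothetical commit body; the tree ID is Git's well-known empty tree.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Dan <dan@example.com> 1500000000 +0000\n"
    b"committer Dan <dan@example.com> 1500000000 +0000\n"
    b"\n"
    b"Add feature\n"
)
# Change one tiny thing (here, the message) and hash again.
altered_body = commit_body.replace(b"Add feature", b"Add feature!")

original_id = git_object_id("commit", commit_body)
altered_id = git_object_id("commit", altered_body)
print(original_id)
print(altered_id)
```

The two printed IDs differ, and since a child commit stores its parent's ID as part of its own content, a changed parent forces new IDs for every descendant too. That is why rebase has to copy.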
To see how these copies work, draw at least part of the commit graph. The graph is just a series of commits, starting from the newest—or tip—commit, whose true-name hash ID is stored in the branch's name. We say that the name points to the commit:
D <-- master
The commit, which I've called D here, contains (as part of its hashed commit data) the hash ID of its parent commit, i.e., the commit that was the tip of the branch before we made D. So it "points to" its parent, and its parent points further back:
... <- C <- D <-- master
The fact that the internal arrows are all backwards like this is usually not very important, so I tend to omit them here. When the one-letter names are not very important I just draw a round dot for each commit:
...--o--o <-- branch
For branch to "branch off from" master, we should draw both branches:
A--B--C--D   <-- master
    \
     E--F--G   <-- branch
Note that commit E points back to commit B.
Now, if we want to re-base branch, so that it comes after commit D (which is now the tip of master), we need to copy commit E to a new commit E' that is "just as good as" E, except that it has D as its parent (and of course has a different snapshot as its source base as well):
           E'   <-- (temporary)
          /
A--B--C--D   <-- master
    \
     E--F--G   <-- branch
We must now repeat this with F and G, and when we are all done, make the name branch point to the last copy, G', abandoning the original chain in favor of the new one:
           E'-F'-G'   <-- branch
          /
A--B--C--D   <-- master
    \
     E--F--G   [abandoned]
This is what git rebase is all about: we pick out some set of commits to copy; we copy them to some new position, one at a time, in parent-first order (vs the more typical child-first backwards Git order); and then we re-point the branch label to the last-copied commit.
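Here is a toy model of that copy-and-re-point process, using the same A through G graph. It's only a sketch: the Commit class and the helper names are made up, the commits to copy are picked by walking back to an explicitly given old base (real Git picks them differently), and a "copy" here just re-links a name to a new parent, where real Git re-applies each commit's changes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    name: str
    parent: Optional["Commit"]


def chain(*names: str, base: Optional[Commit] = None) -> Commit:
    """Build a linear run of commits on top of base and return its tip."""
    tip = base
    for name in names:
        tip = Commit(name, tip)
    return tip


def history(tip: Optional[Commit]) -> List[str]:
    """List commit names from tip back to the root (child-first order)."""
    names = []
    while tip is not None:
        names.append(tip.name)
        tip = tip.parent
    return names


def rebase(tip: Commit, old_base: Commit, new_base: Commit) -> Commit:
    """Copy the commits after old_base onto new_base; return the new tip."""
    to_copy = []
    commit = tip
    while commit is not None and commit is not old_base:
        to_copy.append(commit)
        commit = commit.parent
    new_tip = new_base
    for original in reversed(to_copy):  # parent-first order
        new_tip = Commit(original.name + "'", new_tip)
    return new_tip


B = chain("A", "B")
master = chain("C", "D", base=B)
old_branch = chain("E", "F", "G", base=B)

branch = rebase(old_branch, B, master)  # re-point the label to the last copy
print(history(branch))
print(history(old_branch))
```

Note that the originals are untouched: old_branch still leads back through G, F, E to B. Nothing points to it by name any more, which is all that "abandoned" means.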
Note that this works even for the null case. If the name branch points directly to B and we rebase it on master, we copy all zero commits that come after B, copying them to come after D. Then re-point the label branch to the last-copied commit, which is none, which means we re-point branch to commit D. It's perfectly normal, in Git, to have several branch names all pointing to the same commit. Git knows which branch you are on by reading .git/HEAD, which contains the name of the branch. The branch itself, some portion of the commit graph, is determined by the graph. This means the word "branch" is ambiguous: see What exactly do we mean by "branch"?
Note also that commit A has no parents at all. It's the first commit in the repository: there was no previous commit. Commit A is therefore a root commit, which is just a fancy way to say "a commit with no parents". We can also have commits with two or more parents; these are merge commits. (I did not draw any here, though. It's often unwise to rebase branch chains that contain merges, since it's literally impossible to rebase a merge and git rebase has to re-perform the merge to approximate it. Normally git rebase just omits merges entirely, which causes other problems.)
[3] Obviously, by the Pigeonhole Principle, any hash that reduces a longer bit-string to a fixed-length k-bit key must necessarily have collisions on some inputs. A key requirement for a Git hash function is that it avoid accidental collisions. The "cryptographic" part is not really crucial to Git; it just makes it hard (but of course not impossible) for someone to deliberately cause a collision. Collisions cause Git to be unable to add new objects, so they are bad, but (aside from bugs in the implementation) they don't actually break Git itself, just the further usage of Git for your own data.
Determining what to copy
One problem with rebasing lies in identifying which commits to copy.
Most of the time, it seems easy enough: you want Git to copy your commits, and not someone else's. But that's not always true—in large, distributed environments, with administrators and managers and so on, sometimes it's appropriate for someone to rebase someone else's commits. In any case, this is not how Git does it in the first place. Instead, Git uses the graph.
Naming a commit (e.g., writing branch) tends to select not just that commit, but also that commit's parent commit, the parent's parent, and so on, all the way back to the root commit. (If there is a merge commit, we usually select all of its parent commits, and follow all of them back towards the root simultaneously. A graph can have more than one root, so this lets us select multiple strands going back to multiple roots, as well as branch-and-merge strands going back to a single root.) We call the set of all commits that we find, when starting from one commit and doing these parent traversals, the set of reachable commits.
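The parent traversal is easy to sketch in code. This is a toy model, not Git's implementation: the graph is a plain dictionary from commit name to parent names, using the A through G drawing from earlier, plus a hypothetical merge commit M (with parents D and G) that I've added to show the multi-parent case.

```python
from typing import Dict, List, Set


def reachable(parents: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every commit found by following parent links from start."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])  # a merge pushes all of its parents
    return seen


parents = {
    "A": [],            # root commit: no parents
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],         # tip of master
    "E": ["B"],
    "F": ["E"],
    "G": ["F"],         # tip of branch
    "M": ["D", "G"],    # hypothetical merge, not in the drawings above
}

from_master = reachable(parents, "D")
from_branch = reachable(parents, "G")
from_merge = reachable(parents, "M")
print(sorted(from_master))
print(sorted(from_branch))
print(sorted(from_merge))
```

Starting from the merge finds every commit in the graph, since both strands lead back to the same root.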
For many purposes, including git rebase, we need to make this en-masse selection stop, and we use Git's fancy set operations to do that. If we write master..branch as a revision selector, this means: "All commits reachable from the tip of branch, except for any commits reachable from the tip of master." Look at this graph again:
A--B--C--D   <-- master
    \
     E--F--G   <-- branch
The commits reachable from branch
are G
, F
, E
, B
, and A</