Now the same 'huge' file resides in both the developer's forks on the server. It does not create a hard-link automatically.
Actually, with Git 2.20, that issue might disappear because of delta islands: a new way of computing deltas so that an object that exists in one fork is not made into a delta against another object that does not appear in the same forked repository.
See commit fe0ac2f, commit 108f530, commit f64ba53 (16 Aug 2018) by Christian Couder (chriscool).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
See commit 9eb0986, commit 16d75fa, commit 28b8a73, commit c8d521f (16 Aug 2018) by Jeff King (peff).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit f3504ea, 17 Sep 2018)
Add delta-islands.{c,h}
Hosting providers that allow users to "fork" existing repositories want those forks to share as much disk space as possible.
Alternates are an existing solution to keep all the objects from all the forks into a unique central repository, but this can have some drawbacks.
Especially when packing the central repository, deltas will be created between objects from different forks.
This can make cloning or fetching a fork much slower and much more CPU intensive as Git might have to compute new deltas for many objects to avoid sending objects from a different fork.
Because the inefficiency primarily arises when an object is deltified against another object that does not exist in the same fork, we partition objects into sets that appear in the same fork, and define "delta islands".
When finding delta base, we do not allow an object outside the same island to be considered as its base.
So "delta islands" is a way to store objects from different forks in the same repository and packfile without having deltas between objects from different forks.
This patch implements the delta islands mechanism in "delta-islands.{c,h}", but does not yet make use of it.
A few new fields are added in 'struct object_entry' in "pack-objects.h" though.
See Documentation/git-pack-objects.txt: Delta Islands:
DELTA ISLANDS
When possible, pack-objects tries to reuse existing on-disk deltas to avoid having to search for new ones on the fly. This is an important optimization for serving fetches, because it means the server can avoid inflating most objects at all and just send the bytes directly from disk.
This optimization can't work when an object is stored as a delta against a base which the receiver does not have (and which we are not already sending). In that case the server "breaks" the delta and has to find a new one, which has a high CPU cost. Therefore it's important for performance that the set of objects in on-disk delta relationships match what a client would fetch.
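The reuse condition in the paragraph above can be modelled as a small predicate. This is only an illustrative Python sketch, not Git's actual C code: the function name `can_reuse_on_disk_delta`, the object labels, and the set-based representation are all assumptions made for the example.

```python
def can_reuse_on_disk_delta(base, receiver_has, being_sent):
    """Hypothetical model: a stored delta can be sent as-is only if its
    base is already on the receiver's side or is part of this transfer."""
    return base in receiver_has or base in being_sent


# Made-up object names, purely for illustration.
receiver_has = {"base-1"}
being_sent = {"base-2", "obj-a"}

# Base already known to the receiver: the delta bytes can be sent from disk.
reuse_known = can_reuse_on_disk_delta("base-1", receiver_has, being_sent)

# Base included in the same pack: still fine.
reuse_sent = can_reuse_on_disk_delta("base-2", receiver_has, being_sent)

# Base only exists elsewhere (e.g. another fork): the delta must be "broken"
# and a new one searched for, which is the expensive case.
reuse_foreign = can_reuse_on_disk_delta("base-3", receiver_has, being_sent)
```

In this model, only the third case forces the server to look for a new delta.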
In a normal repository, this tends to work automatically.
The objects are mostly reachable from the branches and tags, and that's what clients fetch. Any deltas we find on the server are likely to be between objects the client has or will have.
But in some repository setups, you may have several related but separate groups of ref tips, with clients tending to fetch those groups independently.
For example, imagine that you are hosting several "forks" of a repository in a single shared object store, and letting clients view them as separate repositories through GIT_NAMESPACE or separate repositories using the alternates mechanism.
A naive repack may find that the optimal delta for an object is against a base that is only found in another fork.
But when a client fetches, they will not have the base object, and we'll have to find a new delta on the fly.
A similar situation may exist if you have many refs outside of refs/heads/ and refs/tags/ that point to related objects (e.g., refs/pull or refs/changes used by some hosting providers). By default, clients fetch only heads and tags, and deltas against objects found only in those other groups cannot be sent as-is.
Delta islands solve this problem by allowing you to group your refs into distinct "islands".
Pack-objects computes which objects are reachable from which islands, and refuses to make a delta from an object A against a base which is not present in all of A's islands. This results in slightly larger packs (because we miss some delta opportunities), but guarantees that a fetch of one island will not have to recompute deltas on the fly due to crossing island boundaries.
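The rule quoted above ("a base must be present in all of A's islands") amounts to a subset test on island sets. Here is a minimal Python sketch under that reading. It is not Git's implementation: the function name `allowed_as_base`, the island names, and the object labels are hypothetical.

```python
def allowed_as_base(target_islands, base_islands):
    """Hypothetical model of the island rule: the base must be present
    in every island that the target object belongs to."""
    return target_islands <= base_islands  # subset test on sets


# Made-up objects mapped to the islands (forks) they are reachable from.
islands = {
    "obj_fork1": frozenset({"fork1"}),
    "obj_shared": frozenset({"fork1", "fork2"}),
    "obj_fork2": frozenset({"fork2"}),
}

# A fork1-only object may delta against a base that both forks have.
ok_shared_base = allowed_as_base(islands["obj_fork1"], islands["obj_shared"])

# It may not delta against a base that only exists in fork2.
ok_foreign_base = allowed_as_base(islands["obj_fork1"], islands["obj_fork2"])

# A shared object may not use a fork1-only base either, because a client
# fetching fork2 would not have that base.
ok_narrow_base = allowed_as_base(islands["obj_shared"], islands["obj_fork1"])
```

The last case is the missed delta opportunity the documentation mentions: the pack gets slightly larger, but no fetch of a single island has to break a delta.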
A side effect though: some commands were more verbose. Git 2.23 (Q3 2019) fixes this.
See commit bdbdf42 (20 Jun 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a4c8352, 09 Jul 2019)
delta-islands: respect progress flag
The delta island code always prints "Marked %d islands", even if progress has been suppressed with --no-progress or by sending stderr to a non-tty.
Let's pass a progress boolean to load_delta_islands().
We already do the same thing for the progress meter in resolve_tree_islands().