Mailing List Archive

sync-type: rsync vs git
A while back I switched the sync-type for the gentoo repo on one of my
machines from rsync to git using https://anongit.gentoo.org/git/repo/sync/gentoo.git
because that machine is behind a firewall that stopped allowing rsync
connections.
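For reference, a switch like this is made in the repository's section of repos.conf. The sketch below embeds an assumed /etc/portage/repos.conf/gentoo.conf fragment (the location path and the auto-sync line are assumptions; the sync-uri is the one above) and reads it back with Python's configparser, just to show which keys are involved.

```python
import configparser

# Assumed contents of /etc/portage/repos.conf/gentoo.conf after the switch.
# sync-type and sync-uri are the keys that change when moving off rsync;
# the location and auto-sync values are illustrative assumptions.
GENTOO_CONF = """
[gentoo]
location = /var/db/repos/gentoo
sync-type = git
sync-uri = https://anongit.gentoo.org/git/repo/sync/gentoo.git
auto-sync = yes
"""

def read_repo_conf(text: str, repo: str = "gentoo") -> dict:
    """Parse a repos.conf fragment and return one repository's settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[repo])

settings = read_repo_conf(GENTOO_CONF)
print(settings["sync-type"], settings["sync-uri"])
```

Going back to rsync would mean changing those two keys again (sync-type = rsync plus an rsync:// sync-uri).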

Is there any advantage (either to me or the Gentoo community) to
continue to use rsync and the rsync pool instead of switching the
rest of my machines to git?

I've been very impressed with the reliability and speed of sync
operations using git: they never take more than a few seconds. When
using rsync, it seems like I regularly used to have to spend time
trying different mirrors and hard-wiring one in my config file because
the one I (or the pool) had chosen had fallen back to using a Bell-212
modem for its internet connection. Sync operations often used to take
many minutes and would sometimes just hang.

--
Grant
Re: sync-type: rsync vs git
On Wed, Apr 27, 2022 at 10:22 AM Grant Edwards
<grant.b.edwards@gmail.com> wrote:
>
> Is there any advantage (either to me or the Gentoo community) to
> continue to use rsync and the rsync pool instead of switching the
> rest of my machines to git?
>
> I've been very impressed with the reliability and speed of sync
> operations using git they never take more than a few seconds.

With git you might need to occasionally wipe your repository to delete
history if you don't want it to accumulate (I don't think there is a
way to do that automatically but if you can tell git to drop history
let me know).

Of course that history can come in handy if you need to revert something/etc.

If you sync infrequently (say once a month or less often), then
I'd expect rsync to be faster. This is because git has to fetch every
single set of changes since the last sync, while rsync just compares
everything at the file level. Over a long period that means that if a
package was revised 4 times and old versions were pruned 4 times, you
end up fetching and ignoring 2-3 intermediate versions of the package
that rsync would never fetch at all. That can add up if it has been a
long time.
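The arithmetic in that example can be made concrete with a toy model. This is a sketch under loud simplifying assumptions: a snapshot is just a dict of file path to contents, tree/commit objects and delta compression are ignored, and the package name is hypothetical.

```python
from typing import Dict, List, Set

# Toy model: a repository snapshot maps file path -> file contents.
# Tree/commit objects and delta compression are deliberately ignored.
Snapshot = Dict[str, str]

def full_history_fetch(have: Snapshot, commits: List[Snapshot]) -> Set[str]:
    """File versions a full-history git fetch downloads: every version
    present in any commit since the last sync that we don't have yet."""
    already = set(have.values())
    return {c for snap in commits for c in snap.values() if c not in already}

def rsync_fetch(have: Snapshot, latest: Snapshot) -> Set[str]:
    """File versions rsync downloads: only files whose final state differs."""
    return {c for path, c in latest.items() if have.get(path) != c}

# A hypothetical package revised 4 times since the last sync, with the
# previous ebuild pruned each time, so each commit holds only one ebuild.
have = {"app-misc/foo/foo-1.0.ebuild": "foo-1.0"}
commits = [{f"app-misc/foo/foo-1.0-r{n}.ebuild": f"foo-1.0-r{n}"} for n in range(1, 5)]

git_versions = full_history_fetch(have, commits)
rsync_versions = rsync_fetch(have, commits[-1])
print(len(git_versions), len(rsync_versions))  # 4 1
```

In this model the three versions only git downloads (r1 to r3) are exactly the intermediate revisions that were pruned before the next sync.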

On the other hand, if you sync frequently (especially daily or more
often), then git is FAR less expensive in both IO and CPU on both your
side and on the server side. Your git client and the server just
communicate what revision they're at, the server can see all the
versions you're missing, and send the history in-between. Then your
client can see what objects it is missing that it wants and fetch
them. Since everything is de-duplicated by design, anything that hasn't
changed, or that the repo has already seen, will not need to be
transferred. With rsync you need to scan at least the entire filesystem
metadata on both ends to figure out what has changed, and if your
metadata isn't trustworthy you need to hash all the file contents
(rsync's --checksum option, which isn't on by default). Since git is
content-hashed you
basically get more data integrity than the default level for rsync and
the only thing that needs to be read is the git metadata, which is
packed efficiently.
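The de-duplication follows from how git names file contents. The hashing rule below is git's own for a default (SHA-1) repository; the objects_to_send() helper is a hypothetical simplification, since the real fetch negotiation works on commits and trees as well as file blobs.

```python
import hashlib
from typing import Dict, Set

def git_blob_id(data: bytes) -> str:
    # Git hashes the header b"blob <size>" plus a NUL byte, followed by the
    # raw contents (SHA-1 in a default repository).
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def objects_to_send(client_has: Set[str], server_files: Dict[str, bytes]) -> Set[str]:
    """Hypothetical helper: blob IDs the client lacks. Identical contents
    hash to the same ID, so they are transferred (and stored) only once."""
    return {git_blob_id(d) for d in server_files.values()} - client_has

files = {
    "cat-a/pkg1/metadata.xml": b"same\n",
    "cat-b/pkg2/metadata.xml": b"same\n",  # duplicate contents, same object
    "cat-c/pkg3/pkg3-1.0.ebuild": b"new\n",
}
client_has = {git_blob_id(b"same\n")}
print(objects_to_send(client_has, files))  # only the ID of b"new\n"
```

Because the ID is derived from the contents, a corrupted file would no longer match its ID, which is the integrity property mentioned above.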

Bottom line is that I think git just makes more sense these days for
the typical gentoo user, who is far more likely to be interested in
things like changelogs and commit histories than users of other
distros. I'm not saying it is always the best choice for everybody,
but you should consider it and improve your git-fu if you need to.
Oh, and if you want the equivalent of an old changelog, just go into a
directory and run "git whatchanged ."

--
Rich
Re: sync-type: rsync vs git
On 2022-04-27, Rich Freeman <rich0@gentoo.org> wrote:
> On Wed, Apr 27, 2022 at 10:22 AM Grant Edwards
><grant.b.edwards@gmail.com> wrote:
>>
>> Is there any advantage (either to me or the Gentoo community) to
>> continue to use rsync and the rsync pool instead of switching the
>> rest of my machines to git?
>>
>> I've been very impressed with the reliability and speed of sync
>> operations using git they never take more than a few seconds.
>
> With git you might need to occasionally wipe your repository to
> delete history if you don't want it to accumulate (I don't think
> there is a way to do that automatically but if you can tell git to
> drop history let me know).

I don't think I have any history. I use sync-depth=1 and clone-depth=1.

Both git log and git whatchanged only show one commit.
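Both settings go in the same repos.conf section as sync-type. A minimal sketch, assuming the portage(5) semantics in which a depth of 0 means unlimited history: the hypothetical helper depth_args() maps such a value onto git's real --depth option. It only illustrates the idea and is not Portage's actual code.

```python
import configparser
from typing import List

# Assumed gentoo.conf fragment with the depth settings from the message.
CONF = """
[gentoo]
sync-type = git
clone-depth = 1
sync-depth = 1
"""

def depth_args(depth: int) -> List[str]:
    """Hypothetical helper: translate a depth setting into git arguments.
    Assumes 0 means "unlimited history", i.e. no --depth option at all."""
    if depth < 0:
        raise ValueError("depth must be >= 0")
    return [] if depth == 0 else ["--depth", str(depth)]

parser = configparser.ConfigParser()
parser.read_string(CONF)
repo = parser["gentoo"]
print(depth_args(repo.getint("clone-depth")))  # ['--depth', '1']
print(depth_args(repo.getint("sync-depth")))   # ['--depth', '1']
```

With a depth of 1 only the newest commit is kept, which is why git log shows a single commit.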


> Of course that history can come in handy if you need to revert
> something/etc.

Perhaps I should keep a few levels of history...

> If you sync infrequently - say once a month or less frequently, then
> I'd expect rsync to be faster.

I generally sync several times a week, and git is often very much
faster than rsync. Git is always done in a few seconds. The time
required for rsync varies widely, from a handful of seconds to tens of
minutes.

> This is because git has to fetch every single set of changes since
> the last sync, while rsync just compares everything at a file level.
> [...]
> That can add up if it has been a long time.

AFAICT, the emerge repo git "depth" settings of 1 prevent that: the
intermediate versions are discarded on the server side as is previous
local history. The end result is similar to rsync: you fetch only the
current version of what's changed since the last "sync", and there's
no local history.
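That reading can be checked against a toy model, a simplification of git's shallow fetch that ignores tree and commit objects: with depth N only the newest N commits' file versions are transferred. Depth 1 then matches rsync, depth 0 (unlimited) fetches every intermediate version, and values in between show what keeping "a few levels of history" would cost. The package names are hypothetical.

```python
from typing import Dict, List, Set

Snapshot = Dict[str, str]  # file path -> file contents

def shallow_fetch(have: Snapshot, commits: List[Snapshot], depth: int) -> Set[str]:
    """File versions downloaded when only the newest `depth` commits are
    fetched (depth 0 = whole history), skipping versions already present."""
    if depth < 0:
        raise ValueError("depth must be >= 0")
    window = commits if depth == 0 else commits[-depth:]
    already = set(have.values())
    return {c for snap in window for c in snap.values() if c not in already}

have = {"foo-1.0.ebuild": "foo-1.0"}
# Hypothetical: four upstream revisions since the last sync, old ones pruned.
commits = [{f"foo-1.0-r{n}.ebuild": f"foo-1.0-r{n}"} for n in range(1, 5)]

for depth in (1, 2, 0):
    print(depth, sorted(shallow_fetch(have, commits, depth)))
```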

> Bottom line is that I think git just makes more sense these days for
> the typical gentoo user, who is far more likely to be interested in
> things like changelogs and commit histories than users of other
> distros. I'm not saying it is always the best choice for everybody,
> but you should consider it and improve your git-fu if you need to.
> Oh, and if you want the equivalent of an old changelog, just go into a
> directory and run "git whatchanged ."

Right now with a depth of 1, git log/whatchanged don't provide any
information (they think all files were new as of the last "sync").
What I should figure out is what settings will preserve a few levels
of changes that have been made to my local repo, without preserving
intermediate changes to the master repo that never got used locally.

IOW, I want all the changes made during a single "sync" to go into my
local repo as a single commit regardless of how many commits have been
made to the master repo since my previous "sync". I think git can do
that -- whether the emerge sync settings in /etc/portage/repos.conf/gentoo.conf
allow me to tell emerge to tell git to do that is the question.
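What Grant describes amounts to recording one local entry per sync and keeping only the last few. The toy model below sketches that behaviour; it is not an emerge or git feature, the class and method names are hypothetical, and a bounded deque stands in for "a few levels" of history.

```python
from collections import deque
from datetime import date
from typing import Deque, Dict, List, Tuple

Snapshot = Dict[str, str]  # file path -> file contents

class LocalSyncHistory:
    """Hypothetical model: one local entry per sync, however many upstream
    commits that sync covered; only the newest `levels` entries are kept."""

    def __init__(self, levels: int) -> None:
        self.entries: Deque[Tuple[date, Snapshot]] = deque(maxlen=levels)

    def sync(self, when: date, upstream_commits: List[Snapshot]) -> None:
        if not upstream_commits:
            return  # nothing new upstream, so no new local entry
        # Intermediate upstream commits are dropped and only the tip is
        # recorded, as squashing them into one local commit would do.
        self.entries.append((when, dict(upstream_commits[-1])))

    def changed_since_previous(self) -> Dict[str, str]:
        """Files added or modified between the two newest syncs
        (deletions are not reported in this sketch)."""
        if len(self.entries) < 2:
            return {}
        (_, old), (_, new) = self.entries[-2], self.entries[-1]
        return {p: c for p, c in new.items() if old.get(p) != c}

h = LocalSyncHistory(levels=3)
h.sync(date(2022, 4, 20), [{"foo.ebuild": "1.0"}])
h.sync(date(2022, 4, 24), [{"foo.ebuild": "1.0-r1"}, {"foo.ebuild": "1.0-r2"}])
h.sync(date(2022, 4, 27), [{"foo.ebuild": "1.0-r2", "bar.ebuild": "2.0"}])
h.sync(date(2022, 4, 30), [{"foo.ebuild": "1.0-r3", "bar.ebuild": "2.0"}])
print(h.changed_since_previous())
```

The deque's maxlen silently drops the oldest sync, so history never grows past the chosen number of levels.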

--
Grant
Re: sync-type: rsync vs git
On 27/04/2022 16:18, Rich Freeman wrote:
> On Wed, Apr 27, 2022 at 10:22 AM Grant Edwards
> <grant.b.edwards@gmail.com> wrote:
>> Is there any advantage (either to me or the Gentoo community) to
>> continue to use rsync and the rsync pool instead of switching the
>> rest of my machines to git?
>>
>> I've been very impressed with the reliability and speed of sync
>> operations using git they never take more than a few seconds.
> With git you might need to occasionally wipe your repository to delete
> history if you don't want it to accumulate (I don't think there is a
> way to do that automatically but if you can tell git to drop history
> let me know).

Look into "git repack" (or "git gc", which runs it). It won't get rid of old versions, but I think it
compresses all the old stuff. But once the repository has been packed, I
gather it's normal for the old packed stuff to take up less space than
the current stuff.
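Old versions cost little once packed because successive versions of a file are nearly identical. Git's packfiles store most objects as deltas against similar objects and then zlib-compress them; the sketch below approximates only the compression half, using Python's zlib to compare compressing five hypothetical ebuild revisions one at a time versus as a single stream. The numbers are illustrative, not what git itself would produce.

```python
import zlib
from typing import List

def revisions(count: int) -> List[bytes]:
    """Hypothetical ebuild revisions that differ only in their last line."""
    body = "\n".join(f'RDEPEND+=" >=dev-libs/libfoo{i}-1.2.{i}"' for i in range(40))
    return [f"{body}\n# revision r{n}\n".encode() for n in range(count)]

def compressed_separately(blobs: List[bytes]) -> int:
    # Each version compressed on its own, much like loose objects.
    return sum(len(zlib.compress(b, 9)) for b in blobs)

def compressed_together(blobs: List[bytes]) -> int:
    # All versions in one stream, so repeated text becomes cheap back-references.
    return len(zlib.compress(b"".join(blobs), 9))

revs = revisions(5)
print(compressed_separately(revs), compressed_together(revs))
```

Compressed together, the four extra revisions cost only a little more than the first one, which matches the observation that packed history takes less space than the current tree.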

Cheers,
Wol
Re: Re: sync-type: rsync vs git
On 27/04/2022 17:24, Grant Edwards wrote:
> IOW, I want all the changes made during a single "sync" to go into my
> local repo as a single commit regardless of how many commits have been
> made to the master repo since my previous "sync". I think git can do
> that -- whether the emerge sync settings in /etc/portage/repos.conf/gentoo.conf
> allow me to tell emerge to tell git to do that is the question.

I'm not sure that will do you any good.

Just use git tags: every time you do a "sync; emerge", tag the
repository with the date. When you list the tags you'll see all the
dates you did an update, and by checking out a tag you'll be able to
go back to that date.
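The tagging scheme is easy to script. The sketch below only builds the git command lines without running them. The sync-YYYY-MM-DD tag naming and the /var/db/repos/gentoo path are assumptions, and the helper names are hypothetical; git -C, git tag and git checkout are standard git commands. Checking out a tag leaves the repository on a detached HEAD.

```python
from datetime import date
from typing import List

REPO = "/var/db/repos/gentoo"  # assumed repository location

def sync_tag(day: date) -> str:
    """Hypothetical tag name for the sync done on `day`."""
    return f"sync-{day.isoformat()}"

def tag_command(day: date, repo: str = REPO) -> List[str]:
    # Lightweight tag on the current HEAD, run right after "sync; emerge".
    return ["git", "-C", repo, "tag", sync_tag(day)]

def list_tags_command(repo: str = REPO) -> List[str]:
    # ISO dates sort chronologically, so git's default name order is date order.
    return ["git", "-C", repo, "tag", "--list", "sync-*"]

def restore_command(day: date, repo: str = REPO) -> List[str]:
    # Check out the tree as it was at that sync (detached HEAD).
    return ["git", "-C", repo, "checkout", sync_tag(day)]

print(tag_command(date(2022, 4, 27)))
```

Each list can be passed to subprocess.run(..., check=True) to execute it. If you sync more than once a day, the tag name would need a time component to stay unique.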

I just use "lvm snapshot" :-)

Cheers,
Wol