Re: managing git disk space usage

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: managing git disk space usage
Date:	2010-07-20 17:28:07
Message-ID:	20100720172807.GG6886@oak.highrise.ca
Lists:	pgsql-hackers

* Robert Haas <robertmhaas(at)gmail(dot)com> [100720 13:04]:

> 3. Clone the origin once. Apply patches to multiple branches by
> switching branches. Playing around with it, this is probably a
> tolerable way to work when you're only going back one or two branches
> but it's certainly a big nuisance when you're going back 5-7 branches.

This is what I do when I'm working on a project that has completely
proper dependencies, so you don't always need to re-run configure
between different branches. I use ccache heavily, so configure takes
longer than a complete build with a couple-dozen
actually-not-previously-seen changes...

But *all* dependencies need to be proper in the build system, or you end
up needing a git-clean-type-cleanup between branch switches, forcing a
new configure run too, which takes too much time...

Maybe this will cause make dependencies to be refined in PG ;-)

It has the advantage that if "back patching" (or in reality, forward
patching) all happens in 1 repository, the git conflict machinery is all
using the same cache of resolutions, meaning that if you apply the same
patch to 2 different branches with identical code and an identical
conflict, you don't need to do the whole conflict resolution by hand
from scratch in the 2nd branch.
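
A minimal sketch of that reuse, assuming the "cache of resolutions" is
git's rerere mechanism, which is stored per repository (.git/rr-cache)
and normally has to be switched on with rerere.enabled. The temporary
repository, the branch names (trunk, rel1, rel2), the file f.txt and
the demo identity are all made up for illustration:

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)                   # throwaway directory (hypothetical)
git init -q "$demo/repo"
cd "$demo/repo"
git config rerere.enabled true      # record and replay conflict resolutions

git checkout -q -b trunk
printf 'a\nb\nc\n' > f.txt
git add f.txt
git commit -q -m 'base'
git branch rel1                     # two release branches with identical code
git branch rel2

printf 'a\nb-new\nc\n' > f.txt
git commit -q -a -m 'trunk-only change'
printf 'a\nb-fixed\nc\n' > f.txt
git commit -q -a -m 'the fix'
fix=$(git rev-parse HEAD)

# First branch: the cherry-pick conflicts and is resolved by hand.
git checkout -q rel1
git cherry-pick "$fix" >/dev/null 2>&1 || true
printf 'a\nb-fixed\nc\n' > f.txt    # the manual resolution
git add f.txt
GIT_EDITOR=true git cherry-pick --continue >/dev/null

# Second branch: same conflict, so rerere rewrites f.txt itself.
git checkout -q rel2
git cherry-pick "$fix" >/dev/null 2>&1 || true
cat f.txt                           # inspect the replayed resolution
git add f.txt
GIT_EDITOR=true git cherry-pick --continue >/dev/null
```

The second cherry-pick still stops and reports a conflict, but the
working file should already carry the recorded resolution, so all that
is left is to check it, "git add" it and continue.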

> 5. Use git clone --shared or git clone --reference or
> git-new-workdir. While I once thought this was the solution, I can't
> take very seriously any solution that has a warning in the manual that
> says, essentially, git gc may corrupt your repository if you do this.

This is the type of setup I often use. I keep a "central" set of git
repos that are straight mirror-clones of the project repositories, kept
up-to-date automatically via cron. Any time I clone a work repo, I use
--reference.

Since I make sure I don't "remove" anything from the reference repo, I
don't have to worry about losing objects other repositories might be
using from the "cache" repo. In case anyone is wondering, that's:

    git clone --mirror $REPO /data/src/cache/$project.git
    git --git-dir=/data/src/cache/$project.git config gc.auto 0

And then in crontab:

    git --git-dir=/data/src/cache/$project.git fetch --quiet --all

With gc.auto disabled, and the only commands ever run being "git fetch",
no objects are removed, even if a remote rewinds and throws away
commits.
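
For completeness, a sketch of the whole round trip, including the
--reference clone of a work repo. It runs against a throwaway upstream
created in a temporary directory; the paths (upstream, cache/project.git,
work) and the demo identity are placeholders for illustration, not the
/data/src/cache layout above:

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)                   # throwaway directory (hypothetical)

# Stand-in for the real project repository.
git init -q "$demo/upstream"
cd "$demo/upstream"
echo hello > README
git add README
git commit -q -m 'initial'

# The "cache" repo: a mirror clone with automatic gc switched off.
cd "$demo"
git clone -q --mirror "file://$demo/upstream" cache/project.git
git --git-dir=cache/project.git config gc.auto 0

# What the cron job would run to keep the cache current.
git --git-dir=cache/project.git fetch --quiet --all

# A work repo that borrows objects from the cache.
git clone -q --reference "$demo/cache/project.git" "file://$demo/upstream" work

# The borrowing is recorded as an "alternate" object directory.
cat work/.git/objects/info/alternates
```

The alternates file is the only link between the two: the work repo
reads objects from the cache, which is why nothing may ever be pruned
from the cache while work repos still point at it.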

But this way means that the separate repos only share the "past, from
central repository" history, which means that you have to jump through
hoops if you want to be able to use git's handy
merging/cherry-picking/conflict tools when trying to rebase/port
patches between branches. You're pretty much limited to exporting a
patch, changing to the new branch-repository, and applying the patch.
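
A sketch of that export-and-apply step, using git format-patch and
git am between two separate work repos. The repository names (origin,
work-trunk, work-rel1), branch names and file are invented for the
example, and the patch here happens to apply cleanly:

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)                   # throwaway directory (hypothetical)

# Stand-in for the central repository, with two branches.
git init -q "$demo/origin"
cd "$demo/origin"
git checkout -q -b trunk
printf 'one\ntwo\nthree\n' > f.txt
git add f.txt
git commit -q -m 'initial'
git branch rel1

# One work repo per branch.
cd "$demo"
git clone -q -b trunk origin work-trunk
git clone -q -b rel1 origin work-rel1

# Commit the change in the trunk repo.
cd "$demo/work-trunk"
printf 'one\ntwo\nthree\nfour\n' > f.txt
git commit -q -a -m 'add line four'

# Export the last commit as a patch and apply it in the other repo.
git format-patch -1 --stdout | git -C "$demo/work-rel1" am -q -3

git -C "$demo/work-rel1" log --oneline -1
```

If the patch does not apply cleanly, "git am -3" falls back to a
three-way merge, but any resolution recorded there stays in that one
repo and is not shared with the others.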

--
Aidan Van Dyk                                   Create like a god,
aidan(at)highrise(dot)ca                        command like a king,
http://www.highrise.ca/                         work like a slave.

In response to

managing git disk space usage at 2010-07-20 17:04:12 from Robert Haas

Responses

Re: managing git disk space usage at 2010-07-20 18:24:42 from Peter Eisentraut
Re: managing git disk space usage at 2010-07-21 19:00:48 from Dimitri Fontaine
