Re: Reducing buildfarm disk usage: remove temp installs when done

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Reducing buildfarm disk usage: remove temp installs when done
Date: 2015-01-19 05:28:54
Message-ID: 28310.1421645334@sss.pgh.pa.us
Lists: buildfarm-members pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 01/18/2015 09:20 PM, Tom Lane wrote:
>> What I see on dromedary, which has been around a bit less than a year,
>> is that the at-rest space consumption for all 6 active branches is
>> 2.4G even though a single copy of the git repo is just over 400MB:
>> $ du -hsc pgmirror.git HEAD REL*
>> 416M pgmirror.git
>> 363M HEAD
>> 345M REL9_0_STABLE
>> 351M REL9_1_STABLE
>> 354M REL9_2_STABLE
>> 358M REL9_3_STABLE
>> 274M REL9_4_STABLE
>> 2.4G total

> This isn't happening for me. Here's crake:
> [andrew(at)emma root]$ du -shc pgmirror.git/ [RH]*/pgsql
> 218M pgmirror.git/
> 149M HEAD/pgsql
> 134M REL9_0_STABLE/pgsql
> 138M REL9_1_STABLE/pgsql
> 140M REL9_2_STABLE/pgsql
> 143M REL9_3_STABLE/pgsql
> 146M REL9_4_STABLE/pgsql
> 1.1G total

> Maybe you need some git garbage collection?

Weird ... for me, dromedary and prairiedog are both showing very similar
numbers. Shouldn't GC be automatic? These machines are not running
latest and greatest git (looks like 1.7.3.1 and 1.7.9.6 respectively),
maybe that has something to do with it?

A fresh clone from git://git.postgresql.org/git/postgresql.git right
now is 167MB (using dromedary's git version), so we're both showing
some bloat over the minimum possible repo size, but it's curious that
mine is so much worse.
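
For a rough sense of scale, the mirror sizes quoted in this thread can be compared against the 167MB fresh clone. This minimal Python sketch uses only those du figures. du rounds up, and the two mirrors were made with different git versions, so treat the ratios as approximate.

```python
# Sizes in MB as reported by `du -h` in this thread; du rounds up, so these are approximate.
FRESH_CLONE_MB = 167  # fresh clone of git.postgresql.org, measured with dromedary's git
MIRRORS_MB = {"dromedary": 416, "crake": 218}  # size of pgmirror.git on each animal


def bloat(size_mb: int, baseline_mb: int = FRESH_CLONE_MB) -> tuple[int, float]:
    """Return (MB above the baseline, size as a multiple of the baseline)."""
    return size_mb - baseline_mb, size_mb / baseline_mb


if __name__ == "__main__":
    for animal, size in MIRRORS_MB.items():
        excess, ratio = bloat(size)
        print(f"{animal}: {excess}M over a fresh clone ({ratio:.2f}x)")
```

Run as a script, this prints "dromedary: 249M over a fresh clone (2.49x)" and "crake: 51M over a fresh clone (1.31x)".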

But the larger point is that git fetch does not, AFAICT, have the same
kind of optimization that git clone does to do hard-linking when copying
an object from a local source repo. With or without GC, the resulting
duplicative storage is going to be the dominant effect after a while on a
machine tracking a full set of branches.
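
To see why hard-linking matters for these numbers: by default du counts a hard-linked file only once per invocation. Objects that a local git clone hard-links from the mirror therefore cost nothing extra, while objects each branch's git fetch writes on its own are counted in full. The Python sketch below models that accounting. The function and file names are hypothetical, and it sums byte lengths (st_size) rather than allocated blocks, so its totals will not match du output exactly.

```python
import os
import stat
import tempfile


def tree_bytes(root: str, count_links_once: bool = True) -> int:
    """Sum the sizes of regular files under root.

    With count_links_once=True, each (device, inode) pair is counted once,
    mirroring how du avoids double-counting hard links within a single run.
    """
    seen: set[tuple[int, int]] = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # don't follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            if count_links_once:
                if key in seen:
                    continue
                seen.add(key)
            total += st.st_size
    return total


def demo() -> tuple[int, int]:
    """Model one object shared by hard link plus one separately written object."""
    with tempfile.TemporaryDirectory() as d:
        shared = os.path.join(d, "shared_object")
        with open(shared, "wb") as f:
            f.write(b"x" * 1000)
        # Like an object a local `git clone` hard-links from the mirror.
        os.link(shared, os.path.join(d, "shared_object_link"))
        # Like an object `git fetch` writes as an independent copy.
        with open(os.path.join(d, "fetched_object"), "wb") as f:
            f.write(b"y" * 500)
        return tree_bytes(d, count_links_once=False), tree_bytes(d)


if __name__ == "__main__":
    naive, deduped = demo()
    print(f"naive={naive} bytes, hard links counted once={deduped} bytes")
```

In the demo the naive total is 2500 bytes but only 1500 bytes are actually stored, because the hard-linked object is shared while the fetched one is not.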

> An alternative would be to remove the pgsql directory at the end of the
> run and thus do a complete fresh checkout each run. As you say it would
> cost some time but save some space. At least it would be doable as an
> option, not sure I'd want to make it non-optional.

What I was thinking is that a complete-fresh-checkout approach would
remove the need for the copy_source step that happens now, thus buying
back at least most of the I/O cost. But that's only considering the
working tree. The real issue here seems to be about having duplicative
git repos ... seems like we ought to be able to avoid that.
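
One possible way to avoid fully duplicated repos, offered here only as an assumption to try and not as what the buildfarm client does, is git's alternates mechanism. `git clone --reference <repo>` makes the new clone borrow objects already present in a local repository, through `.git/objects/info/alternates`, instead of copying them. This minimal Python sketch only builds such a command line. It reuses the mirror name, URL, branch, and directory layout from this thread; the helper name is hypothetical.

```python
import shlex


def reference_clone_argv(mirror: str, url: str, branch: str, dest: str) -> list[str]:
    """Build a `git clone` command that borrows objects from a local mirror.

    --reference records the mirror's object store in the new clone's
    .git/objects/info/alternates; --branch checks out the given branch.
    """
    return ["git", "clone", "--reference", mirror, "--branch", branch, url, dest]


if __name__ == "__main__":
    argv = reference_clone_argv(
        mirror="pgmirror.git",
        url="git://git.postgresql.org/git/postgresql.git",
        branch="REL9_4_STABLE",
        dest="REL9_4_STABLE/pgsql",
    )
    # To actually run it: subprocess.run(argv, check=True)
    print(shlex.join(argv))
```

The git documentation warns that a borrowing clone can break if the referenced repository later drops objects it relies on (for example, after pruning). Whether later git fetch runs in the branch clones keep sharing storage with the mirror would also need to be checked before relying on this.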

regards, tom lane
