Re: repository size differences

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: repository size differences
Date: 2010-09-22 14:11:19
Message-ID: AANLkTinoW+CKaOc_A5PMGdhRAFns6VeYPxJARi8QY+cH@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 21, 2010 at 10:32 PM, Abhijit Menon-Sen <ams(at)toroid(dot)org> wrote:

> That's not it. I ran the same git gc command on my old repository, and
> it didn't make any difference to the size. (I didn't try with a larger
> window size, though.)

A lot of it probably has to do with the delta chains themselves. The
old repository was an "incremental" conversion, so each new object, as
it is added, has only (and all) the "repository-wide" objects to look at
when choosing a delta base. git has some limits and heuristics for
deciding "how far and wide" to look for the best delta base.

The cvs2* scripts are more direct: they first reference the files,
then the commit graph, etc., so all revisions of a particular file are
added before moving on to the next. This means that all previous
versions of a file are likely "hot" in the path git will search for the
best-fit delta. Changing the order in which the objects are added to
the git repository makes it easier for git to find better delta bases.
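
To make the ordering argument concrete, here is a toy model in Python. It is
not git's actual packing code: the cost constants, the file names, and the
rule "an object can only delta against the previous `window` objects" are
all simplifying assumptions made for illustration.

```python
from itertools import product

# Hypothetical costs, chosen only to make the effect visible.
FULL_COST = 100   # object stored with no useful delta base
DELTA_COST = 5    # object stored as a delta against the same file


def packed_size(objects, window):
    """Total cost when each object may only look at the previous
    `window` objects for a delta base (toy rule, not git's)."""
    total = 0
    for i, (path, _rev) in enumerate(objects):
        candidates = objects[max(0, i - window):i]
        has_base = any(p == path for p, _ in candidates)
        total += DELTA_COST if has_base else FULL_COST
    return total


files = [f"file{n}.c" for n in range(50)]
revs = range(10)

# "Incremental" order: every commit touches every file, so two
# versions of the same file sit 50 objects apart.
incremental = [(path, rev) for rev, path in product(revs, files)]

# cvs2*-style order: all revisions of one file, then the next file.
per_file = [(path, rev) for path, rev in product(files, revs)]

print(packed_size(incremental, 10))  # window too small to reach a base
print(packed_size(per_file, 10))     # bases are always close by
print(packed_size(incremental, 50))  # larger window closes the gap
```

In this model the per-file order is small even with a narrow window, while
the interleaved order only catches up once the window is wide enough to
reach the previous version of each file.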

You can adjust the "delta window" git-repack uses; see the man pages
for git-repack and git-gc. If you're willing to do a monster repack
on the old repository (using a *huge* window), you can likely get it
close in size.
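
A minimal sketch of such a repack, driven from Python. The helper names
and the default window/depth of 250 are illustrative assumptions, not
recommended values; the flags themselves (-a, -d, -f, --window, --depth)
are documented in the git-repack man page.

```python
import subprocess


def repack_command(window, depth):
    """Build a full-repack command line (hypothetical helper).

    -a -d : repack everything into one pack and drop the old packs
    -f    : do not reuse existing deltas, recompute them
    """
    return ["git", "repack", "-a", "-d", "-f",
            f"--window={window}", f"--depth={depth}"]


def repack(repo_path, window=250, depth=250):
    """Run the repack inside repo_path. With a large window this can
    take a long time and a lot of memory."""
    subprocess.run(repack_command(window, depth), cwd=repo_path, check=True)
```

Running `git count-objects -v` before and after shows whether the pack
size actually moved.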

a.
