Re: git: uh-oh

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 11:19:16
Message-ID: AANLkTimvisHdcj9amRX8YuY=0ycid++zHpp1aam_+3+s@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 24, 2010 at 11:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Aug 20, 2010 at 1:56 PM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>> My guess at this point is that there may be a (very old?) version of cvs
>>> which, when adding a file to a branch, actually misrecorded the file as
>>> having existed on the branch from the moment it was first added to trunk
>>> - this would explain this anomaly.
>
>> I think this is what is happening, except I'm unable to account for it
>> by the age of the CVS version we're running.  The machine the CVS
>> repo is running on is running 1.11.17-FreeBSD (client/server).
>
> Um, how old do you think that is?  A look at the cvs sources says 2004...

Oh, really? I didn't look that carefully; I just checked the date on
the download directory, which was 2008. But I guess the actual code
is older.

>> The odder cases are the ones involving deletion.  There are a couple
>> of branches/tags that, or so I'm guessing, are only present for a
>> subset of the files in the repository: ecpg_big_bison, creation,
>> Release-1-6-0, MANUAL_1_0, REL2_0B, and SUPPORT.  I'm wondering if we
>> shouldn't just nuke those, or at least nuke them from the copy of the
>> repository upon which we are running the conversion.
>
> Yeah, I noticed some of those in my copy of the test repository too,
> but I see a slightly different set:
>
>  remotes/origin/REL2_0B
>  remotes/origin/REL6_4
>  remotes/origin/Release_1_0_3
>  remotes/origin/WIN32_DEV
>  remotes/origin/ecpg_big_bison
>
> I doubt they're of any more than archaeological interest, but do we want
> to be deleting history?

Well, I think what those represent are partial tags. git has no
equivalent, so anything that pops out this way is going to be totally
wacko. We're not really deleting history; we're just declining to
convert things that git can't represent accurately. It is sort of an
interesting question why REL6_4 would fall into this category, but I
can't imagine we care about any of the other ones. And if we do,
well, we're not deleting the CVS tree.

> What seemed more likely to be artifacts were
> these:
>
>  remotes/origin/unlabeled-1.44.2
>  remotes/origin/unlabeled-1.51.2
>  remotes/origin/unlabeled-1.59.2
>  remotes/origin/unlabeled-1.87.2
>  remotes/origin/unlabeled-1.90.2
>
> Any idea where those came from?

No; I don't see anything like that. What command did you run?

>> This series of commits also seems pretty messed up:
>> http://archives.postgresql.org/pgsql-committers/2007-04/msg00222.php
>> http://archives.postgresql.org/pgsql-committers/2007-04/msg00223.php
>
> You can find out about the reasons for that in this *other* discussion
> of conversion to git:
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00670.php
> particularly here:
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00685.php
>
>> ... pretty crazy.  I think we should try to do something to clean this up,
>> perhaps by doctoring the file on the CVS side.
>
> On the whole I feel that you're moving the goalposts.  AFAIR the agreed
> criteria for an acceptable SCM conversion were that it reproduce the
> historical states of our tree at least at all the release tags, and that
> it provide a close approximation of the CVS commit logs.  I think that
> manufactured commits that correspond to CVS's artifacts might be a bit
> ugly, but trying to get rid of them sounds way too much like putting
> lipstick on a pig.  And if it means removing real, if ugly, history,
> I'm not sure I'm in favor of it at all.

Well, when did it become a goal to get this git conversion done as
soon as humanly possible? We *cannot* retroactively fix these issues
after the conversion is done; or at least not without rewriting the
entire repository history, which is something we do not want to do
lightly - it is a major inconvenience for anyone who has already
cloned, and particularly for, ahem, any companies that might be
merging off of the repo. I don't think we should decide that we're
unwilling to fix these issues without even discussing whether that's
feasible or what would be involved. I don't think we're talking about
removing history; I think we're talking about cleaning up corruption
in CVS that will be irretrievably baked-in by the conversion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
