Re: Report: removing the inconsistencies in our CVS->git conversion

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>
Subject: Re: Report: removing the inconsistencies in our CVS->git conversion
Date: 2010-09-13 12:14:19
Message-ID: AANLkTi=GXhv5wyM_r5WOMf=huJaHke+ZJCoYBMVkoavm@mail.gmail.com
Lists: pgsql-hackers pgsql-www

On Sun, Sep 12, 2010 at 11:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've spent much of the weekend examining the discrepancies between our CVS
> repository and the tarballs available from our FTP archives, and after
> that trying to remove infelicities in the cvs2git output.  There are a
> couple of remaining oddities that I would classify as probable cvs2git
> bugs, but an awful lot of it is inconsistencies in the CVS repository
> itself, some of which I can explain and some that I can't.  Read on for
> many boring details.

First of all, WOW, and thank you very much for putting in the time to
make this happen.

> With those changes, I am able to match all the available archival tarballs
> to various places in the CVS history.  The exact spots where they match
> are detailed in the attached "matches" file.  The file also shows the

Regrettably, all of your attachments came through as part of the
actual email, both in my GMail and in the archives. I hate
technology.

> Having completed that comparison, I then moved on to trying to get rid of
> the discrepancies in the git conversion; particularly, trying to get rid
> of the "manufactured commits".  I didn't have much success in that for the
> cases where the manufactured commit was caused by a back-branch file
> addition. [...]  We still have "manufactured" commits either
> way, but they are just cosmetic so I guess we should live with them.

I'm not really following what the history looks like here. What are
the contents (git show) of the manufactured commit?

> I also found numerous places where we'd been sloppy about placing tags.
> That explains some of the weird things cvs2git did.  In particular:
>
> * We had the already-known problem that gram.c and some other derived
> files had commits made after they should have been dead.
>
> * Bruce had transiently added those files on the WIN32_DEV branch as
> well, to general disapproval, and this seemed to also give cvs2git
> indigestion.  The attached proposed fixup script deals with this by
> deleting those revisions altogether.  This is a loss of history, but
> not one that I care about.
>
> * The HISTORY and INSTALL files have REL7_3_10 tags and should not.
> As mentioned earlier, I think this is because they were deleted after the
> original placement of that tag, and weren't correctly fixed when the
> tag was moved up to branch end a few days later.
>
> * The regression test files recently added to contrib/xml2 have REL8_0_23
> tags.  I have no idea how that happened, because they certainly didn't
> exist when 8.0.23 was released.
>
> * There are a bunch of files that should have REL7_3_5 tags and lack them.
> They are in just a few subdirectories, so probably what happened was that
> the "cvs tag" operation was issued in an incomplete checkout tree.
>
> * Similarly, gram.c should have a release-6-3 tag and lacks it.
>
> * There are a bunch of files that have REL7_1 tags when what they should
> have are REL7_1_BETA tags.  These appear to be exactly the files that were
> deleted between the initial placement of the REL7_1 tag and Marc's later
> ex-post-facto renaming of the tag to REL7_1_BETA.  I'm guessing another
> case of "cvs tag" missing files that weren't in the checkout.
>
> * There are a number of files that lack the REL2_0 tag and REL2_0B branch,
> though they should have it according to file dates.  These appear to be
> exactly the files that were in the separate documentation repository at
> the time, so that probably tells us the mechanism for missing them.
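
Most of the tag problems listed above come down to "this file should carry tag X and doesn't." A first pass at finding those can be scripted against `cvs log -h` output, whose per-file header lists every tag under `symbolic names:`. This is a minimal sketch with hypothetical helper names (`parse_symbolic_names`, `files_missing_tag`), not the fixup script Tom attached. It only reports files that lack a tag, so files added after the tag was laid down, or already dead by then, still have to be weeded out by date, as was done by hand in the analysis quoted above.

```python
def parse_symbolic_names(cvs_log_text):
    """Map each RCS file path to a {tag: revision} dict, from `cvs log -h` output."""
    result = {}
    current_file = None
    in_symbols = False
    for line in cvs_log_text.splitlines():
        if line.startswith("RCS file: "):
            # Start of a new file's header block.
            current_file = line[len("RCS file: "):].strip()
            result[current_file] = {}
            in_symbols = False
        elif line.startswith("symbolic names:"):
            in_symbols = True
        elif in_symbols and line[:1].isspace():
            # Tag lines are indented, e.g. "\tREL7_3_5: 1.2".
            name, _, rev = line.strip().partition(": ")
            result[current_file][name] = rev
        else:
            # Any unindented line (e.g. "keyword substitution:") ends the tag list.
            in_symbols = False
    return result


def files_missing_tag(tags_by_file, tag):
    """Return, sorted, the files whose header does not list the given tag."""
    return sorted(path for path, tags in tags_by_file.items() if tag not in tags)
```

For example, save `cvs log -h` output from a checkout of the relevant branch to a file, then call `files_missing_tag(parse_symbolic_names(text), "REL7_3_5")` on its contents.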

I wonder if we should consider fixing some or all of these things on
the master CVS repository. I wouldn't be too eager to inject those
fake .0 commits for fear of breakage, but moving tags to where they
ought to have been all along seems like it might be a good thing to do
independent of git.
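
If tags were fixed in the master repository, each correction reduces to either deleting a misplaced tag (like REL7_3_10 on HISTORY and INSTALL) or placing a tag on a specific revision. This is a minimal sketch that assumes the corrections have already been worked out by hand. The `TagFix` record and `cvs_commands` helper are hypothetical, and they only print `cvs tag` command lines for review rather than running anything. Whether `cvs tag` in a checkout or `cvs rtag` against the repository is the right tool, particularly for files that are already dead on the branch in question, is left open here.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagFix:
    """One hypothetical tag correction for a single file."""
    path: str
    tag: str
    revision: Optional[str]  # None means "delete this tag from the file"


def cvs_commands(fixes):
    """Build (but do not run) one `cvs tag` command line per correction."""
    cmds = []
    for fix in fixes:
        if fix.revision is None:
            # -d removes the tag from the named file.
            args = ["cvs", "tag", "-d", fix.tag, fix.path]
        else:
            # -r picks the revision to tag; -F moves the tag if the file already has it.
            args = ["cvs", "tag", "-F", "-r", fix.revision, fix.tag, fix.path]
        cmds.append(shlex.join(args))
    return cmds
```

Emitting command text instead of executing it keeps a human in the loop, which seems prudent for edits to the only copy of the project history.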

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
