Re: git: uh-oh

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 07:56:37
Message-ID: AANLkTim9Gp5CxL1GeOPkJH33eQZvfBhWDQE0rhYeUxiB@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 18, 2010 at 08:25, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
> Tom Lane wrote:
>> I lack git-fu pretty completely, but I do have the CVS logs ;-).
>> It looks like some of these commits that are being ascribed to the
>> REL8_3_STABLE branch were actually only committed on HEAD.  For
>> instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
>> only in HEAD.  It was back-patched a few hours later (1 Mar 3:41),
>> and that's also shown here, but the HEAD commit shouldn't be.
>>
>> I wonder whether the repository is completely OK and the problem
>> is that this webpage isn't filtering the commits correctly.
>
> Please don't panic :-)

We're not panicking just yet :-)

> The problem is that it is *impossible* to faithfully represent a CVS or
> Subversion history with its ancestry information in a git repository (or
> AFAIK any of the DVCS repositories).  The reason is that CVS
> fundamentally records the history of single files, and each file can
> have a branching history that is incompatible with those of other files.
>  For example, in CVS, a file can be added to a branch after the branch
> already exists, different files can be added to a branch from multiple
> parent branches, and even more perverse things are allowed.  The CVS
> history can record this mish-mash (albeit with much ambiguity).

It can. IIRC we have cleaned a couple of such things out.

<snip some good descriptions of how git works>

> Given the choice between two wrong histories, cvs2git uses the
> "inclusive" style.  The result is that the ancestors of B4 include not
> only T0, T1, B1, B2, and B3 (as might be expected), but also T2 and T3.
>  The display in the website that was quoted [2] seems to mash all of the
> ancestors together without showing the topology of the history, making
> the result quite confusing.  The true history looks more like this:
>
> $ git log --oneline --graph REL8_3_10 master
> [...]
> | * 2a91f07 tag 8.3.10
> | * eb1b49f Preliminary release notes for releases 8.4.3, 8.3
> | * dcf9673 Use SvROK(sv) rather than directly checking SvTYP
> | * 1194fb9 Update time zone data files to tzdata release 201
> | * fdfd1ec Return proper exit code (3) from psql when ON_ERR
> | * 77524a1 Backport fix from HEAD that makes ecpglib give th
> | * 55391af Add missing space in example.
> | * 982aa23 Require hostname to be set when using GSSAPI auth
> | * cb58615 Update time zone data files to tzdata release 201
> | * ebe1e29 When reading pg_hba.conf and similar files, do no
> | * 5a401e6 Fix a couple of places that would loop forever if
> | * 5537492 Make contrib/xml2 use core xml.c's error handler,
> | * c720f38 Export xml.c's libxml-error-handling support so t
> | * 42ac390 Make iconv work like other optional libraries for
> | * b03d523 pgindent run on xml.c in 8.3 branch, per request
> | * 7efcdaa Add missing library and include dir for XSLT in M
> | * 6ab1407 Do not run regression tests for contrib/xml2 on M
> | * fff18e6 Backpatch MSVC build fix for XSLT
> | * 7ae09ef Fix numericlocale psql option when used with a nu
> | * de92a3d Fix contrib/xml2 so regression test still works w
> | *   80f81c3 This commit was manufactured by cvs2svn to crea
> | |\
> | |/
> |/|
> * | a08b04f Fix contrib/xml2 so regression test still works w
> * | 0d69e0f It's clearly now pointless to do backwards compat
> * | 4ad348c Buildfarm still unhappy, so I'll bet it's EACCES
> * | 6e96e1b Remove xmlCleanupParser calls from contrib/xml2.
> * | 5b65b67 add EPERM to the list of return codes to expect f
> | * a4067b3 Remove xmlCleanupParser calls from contrib/xml2.
> | * 91b76a4 Back-patch today's memory management fixups in co
> | * 5e74f21 Back-patch changes of 2009-05-13 in xml.c's memor
> | *   043041e This commit was manufactured by cvs2svn to crea
> | |\
> | |/
> |/|
> * | 98cc16f Fix up memory management problems in contrib/xml2
> * | 17e1420 Second try at fsyncing directories in CREATE DATA
> * | a350f70 Assorted code cleanup for contrib/xml2.  No chang
> * | 3524149 Update complex locale example in the documentatio
> [...]
>
> The left branch is master, the right branch is the one leading to
> REL8_3_10.  You can see that there are multiple merges from master to
> the branch, presumably when new files from trunk were ported to the
> branch.  This is even easier to see using a graphical history browser
> like gitk.

Yeah, this is clearly the problem.

> There are good arguments for both the "inclusive" and the "exclusive"
> representation of history.  The ideal would require a lot more
> intelligence and better heuristics (and slow down the conversion
> dramatically).  But even the smartest conversion would still be wrong,
> because git is simply incapable of representing an arbitrary CVS
> history.  The main practical result of the impedance mismatch is that it
> will be more difficult to merge between branches that originated in CVS
> (but that is no surprise!)

Our requirements are simple: our CVS history is linear, so the git
history should be linear too. The commit on HEAD and the commit on the
branch are *not* the same commit. They are two different commits that
happen to have the same commit message and mostly the same content.

Bottom line: we want zero merge commits in the git repository. We
may start using merges sometime in the future (though for now we've
decided we don't want them even then), but we most *definitely* don't
want them in the past. We don't care about "representing the proper
heritage of FILE1" in git, because we never did in CVS.
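
As a rough sketch of how the "zero merge commits" requirement could be
checked on a converted repository: the helper name count_merges and the
repository path are made up for illustration, and this assumes a git
new enough to support "git -C" and "rev-list --count". It only counts
merges; it does not fix them.

```shell
# Hypothetical helper: count_merges is our own name, not part of git or cvs2git.
# Prints the number of merge commits (commits with more than one parent)
# reachable from any ref in the repository at path $1.
count_merges() {
    git -C "$1" rev-list --merges --count --all
}

# To list the merges themselves, with their abbreviated hashes and subjects:
#   git -C /path/to/converted.git log --merges --oneline --all
```

Running count_merges /path/to/converted.git should print 0 if the
converted history is as linear as we want it to be.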

Is there some way to make cvs2git work this way, and just not bother
even trying to create merge commits, or is that fundamentally
impossible and we need to look at another tool?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
