Re: git: uh-oh

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 15:56:42
Message-ID: AANLkTikJ+9rZfHjEAS0a9cVwfitsTk2xRg-w3NfDyH+2@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 18, 2010 at 11:03 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> So let's take the simplest example: a branch BRANCH1 is created from
>> trunk commit T1, then some time later another FILE1 from trunk commit T3
>> is added to BRANCH1 in commit B4.  How should this series of events be
>> represented in a git repository?
>> ...
>> The "exclusive" possibility is to ignore the fact that some of the
>> content of B4 came from trunk and to pretend that FILE1 just appeared
>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>
>> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>>        \
>>         B1 -- B2 -- B3 -- B4            BRANCH1
>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>
> Maybe not, but that *is* how things appeared in the CVS history, and
> we'd rather have a git history that looks like the CVS history than
> one that claims that boatloads of utterly unrelated commits are part
> of a branch's history.

Exactly. IMHO, the way this should work is by starting at the
beginning of time and working forward. At each step, we examine the
earliest revision of each file for which no git commit has yet been
written. From among those, we select the one with the earliest
timestamp. We then also select all other files whose most recent
unprocessed revision is nearly contemporaneous and shares the same
author and log message. From the results, we generate a commit. Then
we repeat. When we arrive at a branch point, the branch gets
processed separately from the trunk. If there is no trunk rev which
has every file at the rev where it starts on the branch, then we use
some sane algorithm to pick the best one (perhaps, the one that has
the right revs of the most files) and then insert a fixup commit on
the branch to remove the deltas and carry on as before.
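
The procedure above can be sketched roughly as follows. This is only an
illustration in Python, not what cvs2git or any other converter actually
does. The names Rev, group_commits and pick_branch_base, the 300-second
"nearly contemporaneous" window, and the shape of the input data are all
assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rev:
    """One per-file CVS revision (hypothetical structure for this sketch)."""
    path: str
    number: str
    timestamp: int  # seconds since the epoch
    author: str
    log: str


def group_commits(revs_by_file, fuzz=300):
    """Group per-file revisions into commits, oldest first.

    revs_by_file maps a path to its revisions sorted oldest first.
    fuzz is an assumed window, in seconds, for "nearly contemporaneous".
    """
    cursor = {path: 0 for path in revs_by_file}
    commits = []
    while True:
        # Earliest revision of each file that has not been written yet.
        heads = [revs[cursor[path]]
                 for path, revs in revs_by_file.items()
                 if cursor[path] < len(revs)]
        if not heads:
            break
        first = min(heads, key=lambda r: r.timestamp)
        # Pull in every other head with the same author and log message
        # that falls inside the window; 'first' itself always qualifies.
        members = [r for r in heads
                   if r.author == first.author
                   and r.log == first.log
                   and r.timestamp - first.timestamp <= fuzz]
        for r in members:
            cursor[r.path] += 1
        commits.append(members)
    return commits


def pick_branch_base(trunk_states, branch_start):
    """Pick the trunk commit matching the most branch starting revisions.

    trunk_states[i] maps path -> revision number after trunk commit i.
    branch_start maps path -> revision number where the branch begins.
    Returns (index, fixups); fixups lists the files whose revision must be
    corrected by a fixup commit on the branch. Files that exist on trunk
    but not on the branch are ignored here for brevity.
    """
    def score(state):
        return sum(1 for p, n in branch_start.items() if state.get(p) == n)

    best = max(range(len(trunk_states)), key=lambda i: score(trunk_states[i]))
    fixups = {p: n for p, n in branch_start.items()
              if trunk_states[best].get(p) != n}
    return best, fixups
```

With two files committed ten seconds apart by the same author under the same
log message, group_commits puts them in one commit; a later change by someone
else becomes a second commit. pick_branch_base breaks ties in favor of the
earliest trunk commit, which is an arbitrary choice.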

> The "inclusive" possibility might be tolerable if it restricted itself
> to mentioning commits that actually touched FILE1 in between its
> addition to TRUNK and its addition to BRANCH1.  So far as I can see,
> though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4
> ... not even between T3 and B4, but back to the branch point.  How can
> you possibly justify that as either sane or useful?

git can't restrict it that way. It's finding those commits by following
parent pointers from the merge commits.
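
To make that concrete, here is a small Python model of the example graph.
It is a sketch, not git itself: the parents mapping and the ancestors
function are made up for illustration, and it assumes the "inclusive"
representation records B4 as a merge whose second parent is T3.

```python
def ancestors(parents, commit):
    """Return every commit reachable from 'commit' via parent pointers."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# "Exclusive" history: B4 has a single parent, B3.
exclusive = {
    "T1": ["T0"], "T2": ["T1"], "T3": ["T2"], "T4": ["T3"],
    "B1": ["T1"], "B2": ["B1"], "B3": ["B2"], "B4": ["B3"],
}

# "Inclusive" history (assumed shape): B4 is a merge of B3 and T3.
inclusive = dict(exclusive, B4=["B3", "T3"])

# Commits that become part of BRANCH1's history only because of the merge.
extra = ancestors(inclusive, "B4") - ancestors(exclusive, "B4")
```

Here extra comes out as T2 and T3: once T3 is a parent of B4, everything
reachable from T3 is in the branch's history, whether or not it touched
FILE1.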

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
