Re: antisocial things you can do in git (but not CVS)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jonathan Corbet <corbet(at)lwn(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: antisocial things you can do in git (but not CVS)
Date: 2010-07-21 16:11:34
Message-ID: AANLkTinfeRvkRMV45i=mMD6QzXO+NKh5sCrYTL8SrW0m@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 21, 2010 at 10:49 AM, Jonathan Corbet <corbet(at)lwn(dot)net> wrote:
>> 1. Inability to cleanly and easily (and programmatically) identify who
>> committed what.
>
> No, git tracks committer information separately, and it's easily
> accessible.  Dig into the grungy details of git-log and you'll see that you
> can get out just about anything you need, in any format.
>
> IMHO, vandalizing the author field would be a mistake; it's your best way
> of tracking where the patch came from and for ensuring credit in your
> changelogs.  Why throw away information?

If git had a place to store all the information we care about, that
would be fine, but it doesn't. Here's a recent attribution line I
used:

David Christensen. Reviewed by Steve Singer. Some further changes by me.

There's no "reviewer" header, and there's no concept that a patch
might have come from the author (or perhaps multiple authors), but
then have been adjusted by one or more reviewers and then frobnicated
some more by the committer. I'm not sure it's possible to create a
system that can effectively store all the ways we give credit and
attribution, but a single author line is definitely not it. How much
do I have to change a patch before I would use my own name on the
author line rather than the patch author's? A single byte? A
non-whitespace change? More than 15% of the patch? And, oh by the
way, it's the committer who writes the commit message, not the patch
author, so at an *absolute minimum* that part of the commit object
isn't coming from the original author.
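
To make the limitation concrete, here is a minimal sketch in Python that
splits the text of a commit object into its header fields and its message.
The sample commit, the names, and the parse_commit helper are all made up
for illustration. The sketch ignores continuation lines, which real commit
objects can contain (for example, signatures).

```python
from typing import Dict, List, Tuple


def parse_commit(raw: str) -> Tuple[Dict[str, List[str]], str]:
    """Split the text of a git commit object into (headers, message)."""
    # Headers and message are separated by the first blank line.
    header_text, _, message = raw.partition("\n\n")
    headers: Dict[str, List[str]] = {}
    for line in header_text.split("\n"):
        # Each header line is "<key> <value>".
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message


# Hypothetical commit text; the tree id and all names are placeholders.
SAMPLE_COMMIT = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author Patch Author <author@example.com> 1279728694 -0400\n"
    "committer Some Committer <committer@example.com> 1279728694 -0400\n"
    "\n"
    "Fix a hypothetical bug.\n"
    "\n"
    "Patch Author. Reviewed by A Reviewer. Some further changes by me.\n"
)

headers, message = parse_commit(SAMPLE_COMMIT)
print(sorted(headers))   # only tree, author, and committer are present
print(message)
```

There is exactly one author field and one committer field, so the reviewer
and the "further changes" credit can only live in the free-form message.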

>> 2. Branch and tag management.  In CVS, there are branches and tags in
>> only one place: on the server.  In git, you can have local branches
>> and tags and remote branches and tags, and you can pull and push tags
>> between servers.  If I'm working on a git repository that has branches
>> master, REL9_0_STABLE .. REL7_4_STABLE, inner_join_removal,
>> numeric_2b, and temprelnames, I want to make sure that I don't
>> accidentally push the last three of those to the authoritative
>> server... but I do want to push all the others.  Similarly I want to
>> push only the correct subset of tags (though that should be less of
>> an issue, at least for me, as I don't usually create local tags).  I'm
>> not sure how to set this up, though.
>
> Branch push policy can be tweaked in your local config.  I'm less sure
> about tags.  It's worth noting that the kernel community does very little
> with push in general - things are much more often pulled.  That may not be
> a workflow that's suitable for postgresql, though.

Seems like we've got this one worked out, per discussion upthread.

>> 3. Merge commits.  I believe that we have consensus that commits
>> should always be done as a "squash", so that the history of all of our
>> branches is linear.  But it seems to me that someone could
>> accidentally push a merge commit, either because they forgot to squash
>> locally, or because of a conflict between their local git repo's
>> master branch and origin/master.  Can we forbid this?
>
> That seems like a terrible idea to me - why would you destroy history?
> Obviously I've missed a discussion here.  But, the first time somebody
> wants to use bisect to pinpoint a regression-causing patch, you'll wish you
> had that information there.

In any commit pattern, if I use bisect to pinpoint a regression-causing
patch, I will find the commit that broke it. Whoever made
that particular commit is to blame. Full stop. I don't really care
where in the development of the patch that was eventually committed
the breakage happened, and I do not want to wade through 50 revs of
somebody's private development to find the particular place where they
made a thinko. I only care that their patch *as committed* is broken.
I don't think that non-linear history is an advantage in any
situation. It may be an unavoidable necessity if you have lots of
cross-merging between different repositories, but we don't, so for us
it's just clutter.

>> 4. History rewriting.  Under what circumstances, if any, are we OK
>> with rebasing the master?  For example, if we decide not to have merge
>> commits, and somebody does a merge commit anyway, are we going to
>> rebase to get rid of it?
>
> A good general rule of thumb is to treat publicly-exposed history as
> immutable.  As soon as you start rebasing trees you create misery for
> anybody working with those trees.
>
> If you're really set on avoiding things like merges, there are ways to set
> up scripts on the server to enforce policies.

Yep, Magnus coded it up today. It works great.
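
For anyone curious what such a check might look like, here is a rough
sketch in Python. It is not Magnus's script, which isn't shown in this
thread. It assumes the hook has already captured the output of
"git rev-list --parents" for the pushed range, where each line is a commit
id followed by its parent ids. The find_merge_commits helper and the short
ids in the sample are hypothetical.

```python
from typing import List


def find_merge_commits(rev_list_output: str) -> List[str]:
    """Return ids of commits that have more than one parent.

    Expects text in the format of `git rev-list --parents`:
    one commit per line, "<commit> <parent> [<parent> ...]".
    """
    merges = []
    for line in rev_list_output.splitlines():
        fields = line.split()
        # fields[0] is the commit itself; anything beyond one parent is a merge.
        if len(fields) > 2:
            merges.append(fields[0])
    return merges


# Hypothetical history: c2 merges c1 and b1, everything else is linear.
SAMPLE = """\
c3 c2
c2 c1 b1
b1 c0
c1 c0
c0
"""

merges = find_merge_commits(SAMPLE)
if merges:
    print("rejecting push, merge commits found:", ", ".join(merges))
```

A server-side hook built on this idea would exit with a nonzero status
whenever the list is non-empty, which causes the push to be refused.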

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
