Re: git: uh-oh

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 08:18:01
Message-ID: AANLkTimy-PxHvAMEQWgvmpMe_732fYCuVX8nhTTRO1rJ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 07:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> 1. The new conversion seems to have stolen the apostrophe from "D'Arcy
>> J.M. Cain <darcy(at)druid(dot)net>", rendering him "DArcy J.M. Cain
>> <darcy(at)druid(dot)net>".
>
> Yeah, I see that too.  It's probably bad input rather than the
> converter's fault ;-)

Indeed. Wrong type of escaping. For some reason I used '' when I
should've used \'. I wonder where I got that idea :D

>> 2. Any non-ASCII characters in, for example, contributor's names show
>> up differently in the two repos.  Generally, the original repo is OK
>> and the new repo is garbled; although I found one very old example
>> that went the other way.
>
> What it looks like to me is that a Latin1->UTF8 conversion has been
> applied to the log text.  Which might be a good idea if it all *was*
> Latin1, but a fair-sized percentage isn't.  Applying this conversion to
> UTF8 entries results in garbage, of course.  Even if this could be done
> reliably, I think this counts as editorializing on the historical
> record, and should be switched off if possible.

I think the problem is that we have a mix of them :( git requires it to be utf8.

cvs2git is configured to try, in order, latin1, utf8 and ascii, and
use whichever first returns a correct result. In this case it seems it
does report success, because the result is valid utf8
- just not the utf8 we expected.

I can give it a try the other way around - trying utf8 *before*
latin1, to see if that makes it better - utf8 tends to be more strict.
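
To illustrate why the order matters, here is a minimal Python sketch of a
"first encoding that decodes cleanly wins" fallback. It is not cvs2git's
actual code; the helper name decode_with_fallback and the sample string are
made up for the example. latin1 maps every possible byte to a character, so
it never fails, and anything listed after it is never reached.

```python
def decode_with_fallback(raw, encodings):
    """Return (text, encoding) for the first encoding that decodes without error.

    Hypothetical helper for illustration only, not part of cvs2git.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the encodings could decode the input")


# A log entry whose author name was stored as UTF-8 bytes.
utf8_bytes = "José".encode("utf-8")
# The same name stored as Latin-1 bytes.
latin1_bytes = "José".encode("latin1")

# latin1 first: the UTF-8 bytes decode "successfully", but into the wrong text.
latin1_first = decode_with_fallback(utf8_bytes, ["latin1", "utf8", "ascii"])
print(latin1_first)  # ('JosÃ©', 'latin1')

# utf8 first: UTF-8 input is recognised; Latin-1 input is rejected by the
# stricter UTF-8 decoder and falls through to latin1.
utf8_first = decode_with_fallback(utf8_bytes, ["utf8", "latin1", "ascii"])
latin1_input = decode_with_fallback(latin1_bytes, ["utf8", "latin1", "ascii"])
print(utf8_first)    # ('José', 'utf8')
print(latin1_input)  # ('José', 'latin1')
```

With utf8 tried first, both inputs come out as the intended text in this
example. It can still guess wrong for Latin-1 text that happens to be a valid
UTF-8 byte sequence, but that is much rarer than the reverse.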

>> There are also a number of commits that differ in order between the
>> two repos, and an even larger number where commits are duplicated or
>> merged in one repository relative to the other.
>
> I suspect that this is an artifact of the converter trying to merge
> nearby commits into one commit, which it more or less *has* to do for
> sanity since CVS commits aren't atomic.  I don't have a problem with
> the concept, but I notice cases where the converted commit has a
> timestamp some minutes later than what the cvs2cl output claims.
> I suspect this is what the converter was using as a cutoff time.
> Would it be possible to make sure that the converted commit is always
> timestamped with the latest individual file update timestamp from the
> included CVS commits?

I can't comment on this part - Michael or Max?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
