Re: mailing list archiver chewing patches

From: Matteo Beccati <php(at)beccati(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jd <jd(at)commandprompt(dot)com>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 19:58:09
Message-ID: 4B4CD451.200@beccati.com
Lists: pgsql-hackers pgsql-www

On 12/01/2010 19:54, Magnus Hagander wrote:
> On Tue, Jan 12, 2010 at 18:34, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>> On Tue, Jan 12, 2010 at 10:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>>>> On Tue, 2010-01-12 at 10:24 +0530, Dave Page wrote:
>>>>> So just to put this into perspective and give anyone paying attention
>>>>> an idea of the pain that lies ahead should they decide to work on
>>>>> this:
>>>>>
>>>>> - We need to import the old archives (of which there are hundreds of
>>>>> thousands of messages, the first few years of which have, umm, minimal
>>>>> headers).
>>>>> - We need to generate thread indexes
>>>>> - We need to re-generate the original URLs for backwards compatibility
>>>>>
>>>>> Now there's encouragement :-)
>>>
>>>> Or, we just leave the current infrastructure in place and use a new one
>>>> for all new messages going forward. We shouldn't limit our ability to
>>>> have a decent system due to decisions of the past.
>>>
>>> -1. What's the point of having archives? IMO the mailing list archives
>>> are nearly as critical a piece of the project infrastructure as the CVS
>>> repository. We've already established that moving to a new SCM that
>>> fails to preserve the CVS history wouldn't be acceptable. I hardly
>>> think that the bar is any lower for mailing list archives.
>>>
>>> Now I think we could possibly skip the requirement suggested above for
>>> URL compatibility, if we just leave the old archives on-line so that
>>> those URLs all still resolve. But if we can't load all the old messages
>>> into the new infrastructure, it'll basically be useless for searching
>>> purposes.
>>>
>>> (Hmm, re-reading what you said, maybe we are suggesting the same thing,
>>> but it's not clear. Anyway my point is that Dave's first two
>>> requirements are real. Only the third might not be.)
>>
>> The third actually isn't that hard to do in theory. The
>> message numbers are basically the zero-based position in the mbox
>> file, and the rest of the URL is obvious.
>
> The third part is trivial. The search system already does 95% of it.
> I've already implemented exactly that kind of redirect thing on top of
> the search code once just as a poc, and it was less than 30 minutes of
> hacking. Can't seem to find the script ATM though, but you get the
> idea.
>
> Let's not focus on that part, we can easily solve that.

Agreed. That's the part that worries me least.
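
For illustration only (this is not code from the thread): the mapping Dave describes, where a message number is the zero-based position of the message in the mbox file, might be sketched roughly as below. The function names and the URL path layout are assumptions made up for this example, not the archive's actual scheme.

```python
import mailbox


def build_position_index(mbox_path):
    """Map zero-based position in an mbox file to that message's Message-ID.

    Assumption: the legacy message number equals the position of the
    message in the mbox file, as described in the quoted text above.
    """
    box = mailbox.mbox(mbox_path)
    try:
        # Iterating an mbox yields messages in file order.
        return {pos: msg.get("Message-ID") for pos, msg in enumerate(box)}
    finally:
        box.close()


def legacy_path(list_name, year, month, position):
    """Build a legacy-style path for a message.

    Hypothetical layout, chosen only for illustration:
    <list>/<yyyy>-<mm>/msg<position, zero-padded to 5 digits>.php
    """
    return f"{list_name}/{year:04d}-{month:02d}/msg{position:05d}.php"
```

A redirect layer could then look up the Message-ID for a requested legacy path and forward to wherever the new system stores that message.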

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/
