From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL www <pgsql-www(at)postgresql(dot)org> |
Subject: | Re: Email search failure |
Date: | 2008-08-01 12:39:02 |
Message-ID: | 489303E6.40508@hagander.net |
Lists: | pgsql-www |
Joshua D. Drake wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>> Why is the email below not appearing in a search?
>>
>> Probably because nothing has gotten indexed for a month or more.
>> Whoever is supposed to maintain the archive indexer has been
>> on vacation since it broke ...
>
> That would be Magnus and you are correct. He just got back. The problem
> (last I checked) is an issue with Russian emails.
Looking at it now. That clearly wasn't the only problem, because there
was a "sleep 1800" process that had been running since July 3, and the
logfiles hadn't been touched, etc. Just restarting it fixed that part,
which clearly somebody else could've done as well ;)
I found the bug with the Russian emails, btw. It seems mhonarc encoded
the invalid UTF-8 sequences inside valid HTML escape entities, and the
code applied the "fix broken UTF-8" logic *before* it unescaped the HTML
entities. Now it does it both before and after.
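To illustrate the ordering problem, here is a rough Python sketch. It is not the actual indexer code (which isn't shown in this thread, and may not be in Python at all). The function names, the sample input, and the idea that a numeric entity such as `&#208;` unescapes to the raw byte 0xD0 are all assumptions made for the example.

```python
import re

# Hypothetical helper names; the real indexer's code is not shown in the thread.
_NUMERIC_ENTITY = re.compile(rb"&#(\d+);")


def fix_broken_utf8(data: bytes) -> bytes:
    """Replace every invalid UTF-8 sequence with U+FFFD."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


def unescape_byte_entities(data: bytes) -> bytes:
    """Turn numeric entities below 256 back into raw bytes (assumed behaviour)."""
    def repl(match):
        value = int(match.group(1))
        # Leave larger code points alone; they are outside this sketch.
        return bytes([value]) if value < 256 else match.group(0)
    return _NUMERIC_ENTITY.sub(repl, data)


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def clean_old(data: bytes) -> bytes:
    # Old order: the fix runs first and sees only harmless ASCII entities,
    # then unescaping reintroduces the broken bytes.
    return unescape_byte_entities(fix_broken_utf8(data))


def clean_new(data: bytes) -> bytes:
    # New order: fix before and after unescaping.
    return fix_broken_utf8(unescape_byte_entities(fix_broken_utf8(data)))


# A lone UTF-8 lead byte (0xD0) hidden inside a valid HTML entity.
sample = b"Subject: &#208; test"

print(is_valid_utf8(clean_old(sample)))
print(is_valid_utf8(clean_new(sample)))
```

With this sample the old ordering leaves a stray 0xD0 byte in the output, while the new ordering replaces it with U+FFFD so the result decodes cleanly.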
Oh, and this should never have affected messages on -hackers for
example, because it was always processed before ru-general. It would hit
the PUG lists, -www, -patches and a few others.
//Magnus