Excerpts from Joe Abbate's message of Tue May 31 10:43:07 -0400 2011:
> I have a web crawler for a website I maintain that I could modify to
> crawl through the archives of -bugs, say from 5 Dec 2003 where the first
> bug with the new format appears, and capture the structured data
> (reference, logged by, email address, PG version, OS, description, and
> message URL) into a table, for every message whose subject starts with
> "BUG #", and capture each message URL for any message that has "BUG #"
> somewhere in the subject, in a second table.
> I presume the tables could be used even if it's decided to go with
> something like RT or BZ, but before I spend a couple of hours on this
> I'd like to see some ayes or nays. Useful or not?
I think this would be easier if you crawled the monthly mboxen instead
of the web archives. It'd be preferable to use message-ids to identify
messages rather than year-and-month based URLs.
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
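
A minimal sketch of the mbox-based approach suggested above, assuming the monthly mbox file has already been downloaded locally. The function name scan_mbox, the regular expression, and the dictionary layout are illustrative assumptions, not part of any existing tool; only Python's standard mailbox module is used.

```python
import mailbox
import re

# Assumed subject pattern; the real bug-form subjects may vary.
BUG_SUBJECT = re.compile(r"BUG #(\d+)")


def scan_mbox(path):
    """Scan one mbox file and return (reports, mentions).

    Both are dicts mapping Message-ID -> bug number.
    reports:  messages whose subject starts with "BUG #"
    mentions: messages with "BUG #" anywhere in the subject
    """
    reports, mentions = {}, {}
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            msgid = msg.get("Message-ID")
            if msgid is None:
                continue  # nothing stable to key on
            subject = str(msg.get("Subject", "")).strip()
            match = BUG_SUBJECT.search(subject)
            if match is None:
                continue
            bug_no = int(match.group(1))
            mentions[msgid.strip()] = bug_no
            if subject.startswith("BUG #"):
                reports[msgid.strip()] = bug_no
    finally:
        box.close()
    return reports, mentions
```

Keying both dictionaries on Message-ID keeps the identifiers stable if the archive URLs change; the two dictionaries correspond to the two tables described in the quoted message.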