Re: mailing list archiver chewing patches

From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mailing list archiver chewing patches
Date: 2010-01-12 20:16:47
Message-ID: 20100112201647.GC18076@oak.highrise.ca
Lists: pgsql-hackers pgsql-www


I'll note that the whole idea of an "email archive" interface might be a
very good "advocacy" project as well. AOX might not be a perfect fit,
but it could be a good learning experience... Really, all the PG mail
archives need is:

1) A nice normalized DB schema representing mail messages and their
relations to other messages and "recipients" (or "folders")

2) An "injector" that can parse an email message, decompose it into
the various parts/tables of the DB schema, and insert it

3) A nice set of SQL queries to return messages, parts, threads,
folders based on $criteria (search, id, folder, etc)

4) A web interface to view the messages/thread/parts #3 returns
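
As a rough illustration of #2, here is a minimal sketch of an injector's
parsing half, using only Python's standard `email` package. The function
name `decompose` and the row layout (one "message" row, a recipient list,
and one row per MIME part) are assumptions made up for this example, not
AOX's schema or anything that exists for the PG archives. The actual
INSERTs are left out.

```python
import email
from email import policy
from email.utils import getaddresses


def decompose(raw: bytes) -> dict:
    """Split one raw message into rows for a hypothetical schema."""
    msg = email.message_from_bytes(raw, policy=policy.default)

    def header(name):
        # Return a plain str, or None when the header is absent.
        value = msg[name]
        return None if value is None else str(value)

    # Collect bare addresses from To and Cc; display names are dropped.
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _name, addr in getaddresses(fields)]

    # One row per leaf MIME part, kept as bytes (bytea on the PG side).
    leaves = (p for p in msg.walk() if not p.is_multipart())
    parts = [
        {
            "position": position,
            "content_type": part.get_content_type(),
            "body": part.get_payload(decode=True) or b"",
        }
        for position, part in enumerate(leaves)
    ]

    return {
        "message": {
            "message_id": header("Message-ID"),
            "in_reply_to": header("In-Reply-To"),
            "subject": header("Subject"),
        },
        "recipients": recipients,
        "parts": parts,
    }
```

Keeping each part body as raw bytes defers the text vs bytea decision to
the schema: a text/plain part could be decoded with its declared charset
before insertion, while attachments stay binary.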

The largest part of this is #1, but a good schema would be a very good
candidate to show off some of PG's more powerful features in a way that
"others" could see (like the movie store sample somewhere), such as:
1) full text search
2) text vs bytea handling (thinking of all the mime parts, and encoding,
etc)
3) CTEs, ltree, recursion, etc, for threading/searching
4) Triggers for "materialized views" (for quick threading/folder queries)
5) expression indexes
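
To make the CTE item concrete, here is a small sketch of a threading
query. It runs against an in-memory SQLite database through Python's
`sqlite3` module purely so that it is self-contained; SQLite is a
stand-in, not a suggestion. The `messages` table, its columns, and the
function names are hypothetical. The WITH RECURSIVE statement itself
should carry over to PostgreSQL 8.4 or later, apart from the parameter
placeholder style.

```python
import sqlite3

THREAD_SQL = """
WITH RECURSIVE thread(message_id, depth, path) AS (
    -- Anchor: the message the thread starts from.
    SELECT message_id, 0, message_id
      FROM messages
     WHERE message_id = ?
    UNION ALL
    -- Step: every message replying to one already in the thread.
    SELECT m.message_id, t.depth + 1, t.path || '/' || m.message_id
      FROM messages m
      JOIN thread t ON m.in_reply_to = t.message_id
)
SELECT message_id, depth FROM thread ORDER BY path
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny hypothetical messages table with one thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " message_id TEXT PRIMARY KEY,"
        " in_reply_to TEXT,"
        " subject TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [
            ("a", None, "archiver chewing patches"),
            ("b", "a", "Re: archiver chewing patches"),
            ("c", "a", "Re: archiver chewing patches"),
            ("d", "b", "Re: archiver chewing patches"),
        ],
    )
    return conn


def thread_of(conn: sqlite3.Connection, root_id: str) -> list:
    """Return (message_id, depth) pairs, each reply under its parent."""
    return conn.execute(THREAD_SQL, (root_id,)).fetchall()
```

The accumulated `path` column is a poor man's materialized path; it is
roughly what an ltree column would store permanently, which is where the
trade-off between ltree and CTEs shows up. Sorting on it as plain text
only works here because the demo ids are simple letters.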

a.

* Matteo Beccati <php(at)beccati(dot)com> [100112 14:56]:

> Having played with it, here's my feedback about AOX:
>
> pros:
> - seemed to be working reliably;
> - does most of the dirty job of parsing emails, splitting parts, etc
> - highly normalized schema
> - thread support (partial?)
>
> cons:
> - directly publishing the live email feed might not be desirable
> - queries might end up being a bit complicated for simple tasks
> - might not be easy to add additional processing in the workflow

> If there isn't a fully usable thread hierarchy I was thinking more of
> ltree, mainly because I've successfully used it in the past and I haven't
> had enough time yet to look at CTEs. But if performance is comparable I
> don't see a reason why we shouldn't use them.

> With all that said, I can't promise anything as it all depends on how
> much spare time I have, but I can proceed with the evaluation if you
> think it's useful. I have a feeling that AOX is not truly the right tool
> for the job, but we might be able to customise it to suit our needs. Are
> there any other requirements that weren't specified?

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.
