Re: Bricolage: Impressive

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>
Cc: David Wheeler <david(at)kineticode(dot)com>, "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>, Steve Simms <steve(at)deefs(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Web Development Mailing List <pgsql-www(at)postgresql(dot)org>
Subject: Re: Bricolage: Impressive
Date: 2004-01-19 16:10:34
Message-ID: Pine.GSO.4.58.0401191629000.3310@ra.sai.msu.su
Lists: pgsql-www

On Mon, 19 Jan 2004, Dave Page wrote:

>
>
> > -----Original Message-----
> > From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
> > Sent: 19 January 2004 11:55
> > To: David Wheeler
> > Cc: Marc G. Fournier; Steve Simms; Peter Eisentraut;
> > PostgreSQL Web Development Mailing List
> > Subject: Re: [pgsql-www] Bricolage: Impressive
> >
> >
> > This will solve .postgresql.org problem with search engine if
> > -www will decide to go with Bricolage.
>
> Umm, how? The current problem is that mnogosearch cannot cope with
> 300,000 plus pages, even in cache mode (a problem I'm actively working
> on right now). Are you suggesting that all the archives get published
> via Bricolage as well?

No, of course not :) I suggest separating two different things:
official information, which comes through the editorial board, and the archives.
I meant using Bricolage, with its built-in search, for the former type of site.

Mnogosearch + PostgreSQL is not the best combination, and it has no
proximity ranking, so the relevance of search results is bad. Reindexing
requires rebuilding the indices, so in my opinion you are wasting your time.
It has no future. Crawling the archives is also a waste of time and
resources. If you have access to the local files, why do you need to
crawl? In my experience, crawling archives.postgresql.org takes a lot
of time simply because each page takes up to 3 seconds to generate!
Also, when you crawl the archives you crawl web pages, which contain a lot
of web thingies (navigation, advertisements, ...). Do you need that?
It's much better to index just the postings. On fts.postgresql.org we had
a light version of our CMS, which we optimized for automated processing of
postings, so we could apply a specialized parser and obtain metadata. That's
what we'll use on www.pgsql.ru for the mailing list archives. It was planned
to be released really soon, but has been delayed by some urgent work. And this
archive will be based fully on PostgreSQL, in a very natural way :)
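
To illustrate the idea of indexing the postings directly instead of crawling
rendered web pages, here is a minimal sketch. It assumes the list archives are
available locally as mbox files, which is an assumption and not something
stated above, and it uses Python's standard mailbox module. The function name
extract_postings and the record fields are hypothetical; this is not the parser
used on fts.postgresql.org or www.pgsql.ru.

```python
import mailbox
from email.utils import parsedate_to_datetime


def extract_postings(mbox_path):
    """Yield one metadata record per posting in a local mbox file.

    Hypothetical helper: reads the archive from disk, so no HTTP crawl
    and no navigation markup ever reaches the indexer.
    """
    for msg in mailbox.mbox(mbox_path):
        # This sketch only handles simple single-part text postings;
        # multipart messages get an empty body.
        payload = None if msg.is_multipart() else msg.get_payload(decode=True)
        body = payload.decode("utf-8", errors="replace") if payload else ""

        # Malformed or missing Date headers are kept as None.
        try:
            date = parsedate_to_datetime(msg.get("Date"))
        except (TypeError, ValueError):
            date = None

        yield {
            "message_id": msg.get("Message-ID"),
            "in_reply_to": msg.get("In-Reply-To"),  # useful for threading
            "from": msg.get("From"),
            "subject": msg.get("Subject"),
            "date": date,
            "body": body,
        }
```

Each record could then be stored as a row in a PostgreSQL table and indexed
there; that step is left out because the schema would be a guess.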

Oleg

>
> Regards, Dave.
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
