Re: Search (was: Web team meeting minutes)

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-www(at)postgresql(dot)org
Subject: Re: Search (was: Web team meeting minutes)
Date: 2006-07-14 13:21:55
Message-ID: Pine.GSO.4.63.0607141710490.2921@ra.sai.msu.su
Lists: pgsql-www

Dave,

I see the main problem is not in the search engine but in the site engine!
It's just not database driven. So, I withdraw my words :)
Is the web team considering changing the web site engine? I suggest not
using home-made engines: we don't have the manpower to support them, we
do database development, and we don't want to depend on any specific
person. There are big open-source projects with stable, mature
communities, and we could just add the FTS capability we need to one of
them, for example, Drupal.
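
To illustrate the difference between online indexing and crawling, here is a
minimal Python sketch. It is a toy in-memory inverted index, not tsearch2 or
any real engine, and the class and method names are hypothetical. The point
is only that a document is indexed at the moment it is stored, so it is
searchable immediately, with no wait for a new index to be built.

```python
# Toy illustration of "online" full-text indexing: each document is
# tokenized and added to an inverted index when it is stored, so it can be
# found right away. This is NOT tsearch2; names here are hypothetical.
import re
from collections import defaultdict


class OnlineIndex:
    def __init__(self):
        self.docs = {}                    # doc_id -> stored text
        self.postings = defaultdict(set)  # word -> set of doc_ids

    @staticmethod
    def tokenize(text):
        # Lowercased word tokens; a real FTS engine would also stem
        # words and drop stop words.
        return re.findall(r"\w+", text.lower())

    def remove(self, doc_id):
        # Drop the old postings of a document that is being replaced.
        for word in self.tokenize(self.docs.pop(doc_id)):
            self.postings[word].discard(doc_id)

    def add(self, doc_id, text):
        # Index at insert time: the step a crawler defers until the
        # next crawl and index rebuild.
        if doc_id in self.docs:
            self.remove(doc_id)
        self.docs[doc_id] = text
        for word in self.tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        # Boolean AND over all query words.
        words = self.tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


if __name__ == "__main__":
    idx = OnlineIndex()
    idx.add("news/1", "Open Technology Group, Inc. announces plPHP training")
    print(idx.search("plPHP training"))  # {'news/1'}
```

With a crawler, the same news item would only turn up after the next crawl
and rebuild; here it is found as soon as `add` returns.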

Oleg
On Fri, 14 Jul 2006, Dave Page wrote:

>
>
>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
>> Sent: 14 July 2006 13:48
>> To: Dave Page
>> Cc: Magnus Hagander; pgsql-www(at)postgresql(dot)org
>> Subject: RE: [pgsql-www] Web team meeting minutes
>>
>> I just wanted to say that the current search is not designed for
>> web site indexing.
>
> Err, from the site:
>
> ASPseek is an Internet search engine software developed by SWsoft and
> licensed as free software under GNU GPL.
>
> ASPseek consists of an indexing robot, a search daemon, and a CGI search
> frontend. It can index as many as a few million URLs and search for
> words and phrases, use wildcards, and do a Boolean search. Search
> results can be limited to time period given, site or Web space (set of
> sites) and sorted by relevance (PageRank is used) or date.
>
>> Search, for example, for the latest news item titled "Open Technology
>> Group, Inc. announces plPHP training" and you'll get nothing! It will
>> not be searchable until a new index gets built. This is exactly why we
>> developed tsearch2: online indexing. If the documents are in the
>> database, the only requirement is to set up tsearch2; if not, you need
>> something like OpenFTS.
>
> Actually our port of Aspseek can do online indexing - John added an XML
> feed in which you can directly insert index data (he used to use it to
> accept catalogue feeds from online resellers iirc). The problem is that
> we don't have any way to stream the data off the website in that way, so
> we still end up crawling anyway.
>
> I do appreciate your point though, and if anyone can come up with a way
> to stream data from the website (perhaps just as part of the static
> build process) then it might be worth looking at. Archives would have
> the same problems, I guess: whilst it would be easy enough to index mail
> messages online, you have no way of knowing what the URL on
> archives.postgresql.org would be at that point, unless we fundamentally
> redesigned the entire archives site to run from the database.
>
> Regards, Dave.
>
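
Dave's idea of streaming data from the website as part of the static build
could look roughly like the following Python sketch. It assumes the build
writes plain .html files to an output directory and that each page's public
URL is a base URL plus its relative path. `iter_pages`, the base URL and the
(url, title, text) record layout are hypothetical; pushing the records into
ASPseek's XML feed is left out because that feed format is not described here.

```python
# Hypothetical build hook: after the static site is generated, walk the
# output directory and emit one (url, title, text) record per HTML page.
# The URL is derived from the file path at build time, so no crawl is
# needed to discover it. Names and record layout are assumptions.
import os
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.chunks.append(data)


def iter_pages(output_dir, base_url):
    """Yield (url, title, text) for every .html file under output_dir."""
    for root, _dirs, files in os.walk(output_dir):
        for name in sorted(files):
            if not name.endswith(".html"):
                continue
            path = os.path.join(root, name)
            rel = os.path.relpath(path, output_dir).replace(os.sep, "/")
            parser = _TextExtractor()
            with open(path, encoding="utf-8") as fh:
                parser.feed(fh.read())
            parser.close()
            # Collapse all whitespace runs into single spaces.
            text = " ".join(" ".join(parser.chunks).split())
            yield base_url.rstrip("/") + "/" + rel, parser.title.strip(), text
```

Because each page's URL is known when the page is generated, these records
could be handed to an indexer at build time instead of being rediscovered
by crawling.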

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
