Re: A counter productive conversation about search.

From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "PostgreSQL WWW" <pgsql-www(at)postgresql(dot)org>
Subject: Re: A counter productive conversation about search.
Date: 2006-08-29 07:45:10
Message-ID: E7F85A1B5FF8D44C8A1AF6885BC9A0E40154C88D@ratbert.vale-housing.co.uk
Lists: pgsql-www

> -----Original Message-----
> From: pgsql-www-owner(at)postgresql(dot)org
> [mailto:pgsql-www-owner(at)postgresql(dot)org] On Behalf Of Joshua D. Drake
> Sent: 29 August 2006 04:12
> To: PostgreSQL WWW
> Subject: [pgsql-www] A counter productive conversation about search.
>
> Hello,
>
> Now that I have effectively slapped myself silly by being rude to Tom
> about search, let me bring up some points about search and see if
> there is a way to resolve them.
>
> The problem:
>
> Search really isn't that good. Tom has good results with it, but I am
> guessing that is because he is looking for specific things, likely
> just in the archives, as I doubt he often searches the documentation ;).
>
> A quick search on google:
>
> site:archives.postgresql.org index bloat
>
> archives.postgresql.org/pgsql-performance/2005-04/msg00617.php
> archives.postgresql.org/pgsql-performance/2005-04/msg00594.php
> archives.postgresql.org/pgsql-performance/2005-04/msg00608.php
>
> archives.postgresql.org:
>
> http://archives.postgresql.org/pgsql-performance/2005-04/msg00575.php
> http://archives.postgresql.org/pgsql-general/2004-12/msg00288.php
> http://archives.postgresql.org/pgsql-general/2005-07/msg00186.php
>
> site:www.postgresql.org create index
> www.postgresql.org/docs/7.4/static/sql-createindex.html
> www.postgresql.org/docs/8.1/static/sql-createindex.html
> www.postgresql.org/files/documentation/books/aw_pgsql/node216.html
>
> search.postgresql.org:
> http://www.postgresql.org/files/documentation/books/aw_pgsql/node216.html
> http://www.postgresql.org/files/documentation/books/pghandbuch/html/sql-createindex.html
> http://developer.postgresql.org/~petere/past-events/lsm2003-slides/foil20.html
>
> The first search is "reasonable" between the two, although it does not
> appear to correctly follow the thread path.

The search engine has no site-specific knowledge - it (like any other
generic search engine) simply doesn't know about threading.

> The second search to me is completely wrong. CREATE INDEX should
> always return the current documentation first. I can forgive Google
> for showing 7.4 first because it has been around longer and is still
> widely in use.

That should be fixable by tweaking the weighting values; however, the
last time I suggested that, I got shot down.
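
To make the weighting idea concrete, here is a minimal sketch of
re-ranking results by URL. It is an illustration only, not how the
search engine in use is configured: the weight values, the choice of
8.1 as the "current" docs version, and the names `weight_for` and
`rerank` are all assumptions made up for this example.

```python
import re

# Assumed values for illustration; the real engine's weights are not known here.
CURRENT_DOCS_VERSION = "8.1"
WEIGHTS = {"current_docs": 2.0, "old_docs": 1.0, "other": 0.5}

# Matches versioned documentation paths such as /docs/8.1/static/...
DOCS_RE = re.compile(r"/docs/(\d+\.\d+)/")


def weight_for(url):
    """Return the (hypothetical) multiplier applied to a result's base score."""
    match = DOCS_RE.search(url)
    if match is None:
        return WEIGHTS["other"]
    if match.group(1) == CURRENT_DOCS_VERSION:
        return WEIGHTS["current_docs"]
    return WEIGHTS["old_docs"]


def rerank(results):
    """Sort (url, base_score) pairs by weighted score, highest first."""
    scored = [(score * weight_for(url), url) for url, score in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]
```

With weights like these, an 8.1 docs page with a slightly lower base
score would still sort ahead of the 7.4 page and the book chapter.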

> I have on multiple occasions brought up the idea of another search
> engine. I wrote the pgsql.ru guys and asked if they would share their
> code. To their benefit they said they would be willing but didn't have
> the time to install it for us. I told them I would be happy to muscle
> through it if they would just answer some emails. I never heard back.
>
> Other options include lucene, and rolling our own.

Is Lucene capable of handling the size of our index? This has always
been the problem we've had with other projects like MnogoSearch. They
work well until you load them up with the archives after which they
simply can't cope without ridiculous amounts of hardware.

> Rolling our own really wouldn't be that hard "if" we can create a
> reasonably smart web page grabber. We have all the tools (tsearch2 and
> pg_trgm) to easily do the searches.
>
> So is anyone up for helping develop a page grabber?

We have one - it builds the static version of the main site by spidering
it hourly.
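
For anyone unfamiliar with what the core of such a grabber does, a
rough sketch of the link-following step is below. This is not the
code of the spider that builds the static site; the class
`LinkExtractor` and the function `same_site_links` are hypothetical
names, and fetching, scheduling, and storage are left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(base_url, html):
    """Return unique links on the same host as base_url, in page order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    unique = []
    for link in parser.links:
        link = link.split("#", 1)[0]  # drop fragments so anchors don't duplicate pages
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique
```

A crawler would call something like this on each fetched page and
queue whatever comes back that it has not already visited.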

Regards, Dave.
