Re: Mnogosearch (Was: Re: website doc search is ... )

From: Arjen van der Meijden <acmmailing(at)vulcanus(dot)its(dot)tudelft(dot)nl>
To: "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Mnogosearch (Was: Re: website doc search is ... )
Date: 2004-01-01 19:14:49
Message-ID: 3FF471A9.8010806@vulcanus.its.tudelft.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Marc G. Fournier wrote:

> On Thu, 1 Jan 2004, Arjen van der Meijden wrote:
>
>
>>Marc G. Fournier wrote:
>>
>>>Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
>>>join'd to all the URLs that contain them, you get:
>>
>>Can't you build seperate databases for each domain you want to index?
>>Than you wouldn't need the like operator at all.
>
>
> First off, that would make searching across multiple domains difficult,
> no?
If mnogosearch would allow searching in multiple databases; no. But it
doesn't seem to feature that and indeed; yes that might become a bit
difficult.
It was something I thought of because our solution allows it, but that
is no solution for you, I checked the mnogosearch features after sending
that email, instead of before. Perhaps I should've turned that around.

> Second, the LIKE is still required ... the LIKE allows the search to
> "group" URLs ... for instance, if I wanted to just search on the docs, the
> LIKE would look for all URLs that contain:
>
> http://www.postgresql.org/docs/%%
>
> whereas searching the whole site would be:
>
> http://www.postgresql.org/%%
That depends. If it were possible, you could decide from the search
usage stats to split /docs from the "the rest" of www.postgresql.org and
by that avoiding quite a bit of like's.

>>Anyway, that doesn't help you much, perhaps decreasing the size of the
>>index-tables can help, are they with OIDs ? If so, wouldn't it help to
>>recreate them without, so you save yourselves 4 bytes per word-document
>>couple, therefore allowing it to fit in less pages and by that speeding
>>up the seqscans.
>
>
> This one I hadn't thought about ... for some reason, I thought that
> WITHOUT OIDs was now the default ... looking at that one now ...
No, it's still the default to do it with oids.

>>By the way, can a construction like (tablefield || '') ever use an index
>>in postgresql?
>
>
> again, as shown in a previous email, the index is being used for the LIKE
> query ... the big problem as I see it is that the result set from the LIKE
> is ~20x larger then the result set for the the = ... if there was some way
> to telling the planner that going the LIKE route was the more expensive of
> the two (even though table size seems to indicate the other way around), I
> suspect that that would improve things also ...
Yeah, I noticed. Hopefully Tom's suggestion will work to achieve that.

I can imagine how you feel about all this, I had to do a similar job a
year ago, but was less restricted by a preference like the "it'd be a
nice postgresql showcase". But then again, our search engine is loaded
with an average of 24 queries per minute (peaking to over 100/m in the
afternoon and evenings) and we didn't have any working solution (not
even a slow one).

Good luck,

Arjen van der Meijden

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2004-01-01 19:17:50 Re: Mnogosearch (Was: Re: website doc search is ... )
Previous Message Marc G. Fournier 2004-01-01 19:10:00 Mnogosearch: Comparing PgSQL 7.4 to MySQL 4.1