Re: Mnogosearch (Was: Re: website doc search is ... )

From: "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>
To: Arjen van der Meijden <acmmailing(at)vulcanus(dot)its(dot)tudelft(dot)nl>
Cc: "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Mnogosearch (Was: Re: website doc search is ... )
Date: 2004-01-01 17:46:56
Message-ID: 20040101133802.H913@ganymede.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 1 Jan 2004, Arjen van der Meijden wrote:

> Marc G. Fournier wrote:
> >
> > Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
> > join'd to all the URLs that contain them, you get:
>
> Can't you build seperate databases for each domain you want to index?
> Than you wouldn't need the like operator at all.

First off, that would make searching across multiple domains difficult,
no?

Second, the LIKE is still required ... the LIKE allows the search to
"group" URLs ... for instance, if I wanted to just search on the docs, the
LIKE would look for all URLs that contain:

http://www.postgresql.org/docs/%%

whereas searching the whole site would be:

http://www.postgresql.org/%%

> Anyway, that doesn't help you much, perhaps decreasing the size of the
> index-tables can help, are they with OIDs ? If so, wouldn't it help to
> recreate them without, so you save yourselves 4 bytes per word-document
> couple, therefore allowing it to fit in less pages and by that speeding
> up the seqscans.

This one I hadn't thought about ... for some reason, I thought that
WITHOUT OIDs was now the default ... looking at that one now ...

> Are _all_ your queries with the like on the url? Wouldn't it help to
> create an index on both the wordid and the urlid for ndict8?

as mentioned in a previous email, the schema for ndict8 is:

186_archives=# \d ndict8
Table "public.ndict8"
Column | Type | Modifiers
---------+---------+--------------------
url_id | integer | not null default 0
word_id | integer | not null default 0
intag | integer | not null default 0
Indexes:
"n8_url" btree (url_id)
"n8_word" btree (word_id)

> By the way, can a construction like (tablefield || '') ever use an index
> in postgresql?

again, as shown in a previous email, the index is being used for the LIKE
query ... the big problem as I see it is that the result set from the LIKE
is ~20x larger then the result set for the the = ... if there was some way
to telling the planner that going the LIKE route was the more expensive of
the two (even though table size seems to indicate the other way around), I
suspect that that would improve things also ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy(at)hub(dot)org Yahoo!: yscrappy ICQ: 7615664

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2004-01-01 17:47:19 Re: speeding up inserts
Previous Message Casey Allen Shobe 2004-01-01 17:43:44 Re: Is my MySQL Gaining ?