Re: query very slow when enable_seqscan=on

From: Tomasz Ostrowski <tometzky(at)batory(dot)org(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: query very slow when enable_seqscan=on
Date: 2006-07-04 14:44:08
Message-ID: 20060704144406.GA21511@batory.org.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-performance

On Tue, 04 Jul 2006, Tom Lane wrote:

> I think the real problem here is that regex matching is the wrong
> tool for the job. Have you looked into a full-text index
> (tsearch2)?

So much to do with so little time...

I've briefly looked into it but:

- it's complicated;

- it is not needed - basic scan is good enough for the amount of data
we have (if a sane query plan is chosen by a database);

- we have data in many languages (including based on cyryllic
alphabet) - languages which use different forms of the same word
based on context, for example:
Warszawa
Warszawy
Warszawie
Warszawę
Warszawą
Warszawo
All of the above could be translated to "Warsaw". So we need to
support matching parts of words ("warszaw"), which I haven't seen
in tsearch2 (maybe I've overlooked). We also have words, which
different forms look like this: "stół" "stole" "stołu" (Polish for
"table") - when we need to find it we'd need to list every possible
form (about 10) or use a regex like: 'st[oó][lł]'.

> With something like that, the index operator has at least got the
> correct conceptual model, ie, looking for indexed words. I'm not sure
> if they have any decent statistical support for it :-( but in theory
> that seems doable, whereas regex estimation will always be a crapshoot.

So why estimate regex expressions if there is no estimation possible?
Let's set this estimate to be pessimistic (match everything or
everything not null) and it will choose better plans. At least until
somebody will figure out better approach.

Pozdrawiam
Tometzky
--
Best of prhn - najzabawniejsze teksty polskiego UseNet-u
http://prhn.dnsalias.org/
Chaos zawsze pokonuje porządek, gdyż jest lepiej zorganizowany.
[ Terry Pratchett ]

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message tomas 2006-07-04 16:24:34 Re: query very slow when enable_seqscan=on
Previous Message Tom Lane 2006-07-04 14:05:26 Re: ALTER TYPE ... USING(NULL) / NOT NULL violation

Browse pgsql-performance by date

  From Date Subject
Next Message Luckys 2006-07-04 16:00:02 how to tune this query.
Previous Message Tom Lane 2006-07-04 13:56:29 Re: query very slow when enable_seqscan=on