Quick Links

~* + LIMIT => infinite time?

From:	<typea(at)l-i-e(dot)com>
To:	<pgsql-performance(at)postgresql(dot)org>
Subject:	~* + LIMIT => infinite time?
Date:	2002-12-14 02:16:06
Message-ID:	49719.216.80.95.13.1039832166.squirrel@www.l-i-e.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

I looked for a "known bugs" sort of database to search before bugging you
guys, but failed to find it... But I am at least asking before I submit a
new bug report :-)

In version 7.1.3 on a Linux box:

A particularly long, nasty query works "just fine" (returning seemingly
correct results in about 15 seconds) until I tack on "LIMIT 1"

Adding LIMIT 1, however, seems to make the query take an infinite amount
of time. Well, more than 5 minutes, anyway, and I'm not that patient when
I know it worked okay without it the LIMIT, if you know what I mean.

Here is the query:

SELECT DISTINCT *, 0 + 10 * (lower(title) like '%albert einstein%') ::int
+ 10 * (lower(author_flattened) like '%albert einstein%') ::int + 30 *
(lower(subject_flattened) like '%albert einstein%') ::int + 9 *
(substring(lower(title), 1, 20) like '%albert%') ::int + 25 *
(substring(lower(text), 1, 20) LIKE '%albert%') ::int + (8 * (lower(title)
LIKE '%albert%' AND lower(title) LIKE '%einstein%' AND ((title ~*
'albert.{0,20}einstein') OR (title ~* 'einstein.{0,20}albert'))) ::int) +
(1 * ( (lower(title) LIKE '%albert%') )::int) + (1 * (
(lower(author_flattened) LIKE '%albert%') )::int) + (1 * (
(lower(subject_flattened) LIKE '%albert%') )::int) + 9 *
(substring(lower(title), 1, 20) like '%einstein%') ::int + 25 *
(substring(lower(text), 1, 20) LIKE '%einstein%') ::int + (8 *
(lower(title) LIKE '%einstein%' AND lower(title) LIKE '%albert%' AND
((title ~* 'einstein.{0,20}albert') OR (title ~*
'albert.{0,20}einstein'))) ::int) + (1 * ( (lower(title) LIKE
'%einstein%') )::int) + (1 * ( (lower(author_flattened) LIKE '%einstein%')
)::int) + (1 * ( (lower(subject_flattened) LIKE '%einstein%') )::int) AS
points FROM article WHERE FALSE OR (lower(title) LIKE '%albert%') OR
(lower(author_flattened) LIKE '%albert%') OR (lower(subject_flattened)
LIKE '%albert%') OR (lower(title) LIKE '%einstein%') OR
(lower(author_flattened) LIKE '%einstein%') OR (lower(subject_flattened)
LIKE '%einstein%') ORDER BY points desc, volume, number, article.article
LIMIT 1 , 1;

explain with or without the LIMIT part is about what you'd expect.

Limit (cost=1596.50..1596.50 rows=1 width=216)
-> Unique (cost=1596.45..1596.50 rows=1 width=216)
-> Sort (cost=1596.45..1596.45 rows=1 width=216)
-> Seq Scan on article (cost=0.00..1596.44 rows=1 width=216)

Obviously the "Limit" line is gone from the explain output when there is
no LIMIT, but the other lines are all the same.

Is this a known bug, is there a fix or work-around?
If not, should I report it, or will the first answer be "Upgrade." ?

The table in question has 17,000 reords, and the various fields mentioned
here are all rather short -- Just author names, subject lines, and titles
of text articles. [The articles themselves are super long, but are not
involved in this query.]

I can take out the ~* parts, and life is good again, so almost for sure
that's a critical component in the failure.

ps auxwwww | grep postgrs seems to report an "idle" postgres process for
each failed query -- attempting to ^C the query and/or killing the idle
process (I know, "Don't") is unfruitful.

kill -9 does nuke the idle processes, IIRC, but I'm not 100% sure...

I restarted the server soon after that, since (A) PHP command-line (aka
"CGI") was refusing to start, complaining about "mm" not being loadable,
and there was not much free RAM and the web-server was not particularly
happy about that state of affairs...

The schema is probably not particularly interesting -- Pretty much every
field involved is a 'text' field, but here you go:

Just FYI, the _flattened fields are de-normalizing (or is it
re-normalizing?) some relation tables so that we're not making a zillion
tuples here, and it's just a simple (we though) short and sweet text
search.

PS Thanks for all your help on the full text index! I'm still evaluating
some options, but a home-brew concordance is showing the most promise.
I'll post source/details if it works out.

In response to

Re: Full Text Index disk space requirements at 2002-11-26 21:33:14 from Hannu Krosing

Responses

Re: ~* + LIMIT => infinite time? at 2002-12-14 02:23:32 from Josh Berkus

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Josh Berkus	2002-12-14 02:23:32	Re: ~* + LIMIT => infinite time?
Previous Message	Tom Lane	2002-12-14 00:03:55	Re: Odd Sort/Limit/Max Problem