~* + LIMIT => infinite time?

From: <typea(at)l-i-e(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: ~* + LIMIT => infinite time?
Date: 2002-12-14 02:16:06
Message-ID: 49719.216.80.95.13.1039832166.squirrel@www.l-i-e.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I looked for a "known bugs" sort of database to search before bugging you
guys, but failed to find it... But I am at least asking before I submit a
new bug report :-)

In version 7.1.3 on a Linux box:

A particularly long, nasty query works "just fine" (returning seemingly
correct results in about 15 seconds) until I tack on "LIMIT 1"

Adding LIMIT 1, however, seems to make the query take an infinite amount
of time. Well, more than 5 minutes, anyway, and I'm not that patient when
I know it worked okay without it the LIMIT, if you know what I mean.

Here is the query:

SELECT DISTINCT *, 0 + 10 * (lower(title) like '%albert einstein%') ::int
+ 10 * (lower(author_flattened) like '%albert einstein%') ::int + 30 *
(lower(subject_flattened) like '%albert einstein%') ::int + 9 *
(substring(lower(title), 1, 20) like '%albert%') ::int + 25 *
(substring(lower(text), 1, 20) LIKE '%albert%') ::int + (8 * (lower(title)
LIKE '%albert%' AND lower(title) LIKE '%einstein%' AND ((title ~*
'albert.{0,20}einstein') OR (title ~* 'einstein.{0,20}albert'))) ::int) +
(1 * ( (lower(title) LIKE '%albert%') )::int) + (1 * (
(lower(author_flattened) LIKE '%albert%') )::int) + (1 * (
(lower(subject_flattened) LIKE '%albert%') )::int) + 9 *
(substring(lower(title), 1, 20) like '%einstein%') ::int + 25 *
(substring(lower(text), 1, 20) LIKE '%einstein%') ::int + (8 *
(lower(title) LIKE '%einstein%' AND lower(title) LIKE '%albert%' AND
((title ~* 'einstein.{0,20}albert') OR (title ~*
'albert.{0,20}einstein'))) ::int) + (1 * ( (lower(title) LIKE
'%einstein%') )::int) + (1 * ( (lower(author_flattened) LIKE '%einstein%')
)::int) + (1 * ( (lower(subject_flattened) LIKE '%einstein%') )::int) AS
points FROM article WHERE FALSE OR (lower(title) LIKE '%albert%') OR
(lower(author_flattened) LIKE '%albert%') OR (lower(subject_flattened)
LIKE '%albert%') OR (lower(title) LIKE '%einstein%') OR
(lower(author_flattened) LIKE '%einstein%') OR (lower(subject_flattened)
LIKE '%einstein%') ORDER BY points desc, volume, number, article.article
LIMIT 1 , 1;

explain with or without the LIMIT part is about what you'd expect.

Limit (cost=1596.50..1596.50 rows=1 width=216)
-> Unique (cost=1596.45..1596.50 rows=1 width=216)
-> Sort (cost=1596.45..1596.45 rows=1 width=216)
-> Seq Scan on article (cost=0.00..1596.44 rows=1 width=216)

Obviously the "Limit" line is gone from the explain output when there is
no LIMIT, but the other lines are all the same.

Is this a known bug, is there a fix or work-around?
If not, should I report it, or will the first answer be "Upgrade." ?

The table in question has 17,000 reords, and the various fields mentioned
here are all rather short -- Just author names, subject lines, and titles
of text articles. [The articles themselves are super long, but are not
involved in this query.]

I can take out the ~* parts, and life is good again, so almost for sure
that's a critical component in the failure.

ps auxwwww | grep postgrs seems to report an "idle" postgres process for
each failed query -- attempting to ^C the query and/or killing the idle
process (I know, "Don't") is unfruitful.

kill -9 does nuke the idle processes, IIRC, but I'm not 100% sure...

I restarted the server soon after that, since (A) PHP command-line (aka
"CGI") was refusing to start, complaining about "mm" not being loadable,
and there was not much free RAM and the web-server was not particularly
happy about that state of affairs...

The schema is probably not particularly interesting -- Pretty much every
field involved is a 'text' field, but here you go:

Table "article"
Attribute | Type | Modifier
-------------------+---------+----------------------------------------------
id | integer | not null default nextval('article_ID'::text)
volume | text |
number | text |
article | text |
date | text |
cover_date | text |
title | text |
author | text |
author_last | text |
author_first | text |
subject | text |
pages | text |
artwork | text |
text | text |
type | integer |
type_hardcoded | text |
type_detailed | integer |
abstract | text |
subject_flattened | text |
author_flattened | text |
Indices: article_id_index,
article_oid_index,
article_type_index

Just FYI, the _flattened fields are de-normalizing (or is it
re-normalizing?) some relation tables so that we're not making a zillion
tuples here, and it's just a simple (we though) short and sweet text
search.

PS Thanks for all your help on the full text index! I'm still evaluating
some options, but a home-brew concordance is showing the most promise.
I'll post source/details if it works out.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2002-12-14 02:23:32 Re: ~* + LIMIT => infinite time?
Previous Message Tom Lane 2002-12-14 00:03:55 Re: Odd Sort/Limit/Max Problem