Re: ~* + LIMIT => infinite time?

From: <typea(at)l-i-e(dot)com>
To: <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: ~* + LIMIT => infinite time?
Date: 2002-12-15 00:41:54
Message-ID: 63165.12.249.229.112.1039912914.squirrel@www.l-i-e.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> You'll get a snarky response, and then be told to upgrade, if you try
>> to submit a bug in 7.1.3.
>
> 7.1 is a tad long in the tooth, but still I'm curious about this. I
> don't see how <plan A> can possibly take longer than <plan A> + <LIMIT
> node on top>.

Hey Tom. I think we met very briefly at the International PHP Conference
in Frankfurt in 2001... Anyway.

It's actually the other way around. <Plan A> takes like 4 seconds. <Plan
A> + <LIMIT node on top> takes literally FOREVER and leaves a postgres
process hanging 'round that I have to kill -9 to get rid of.

I'd understand the LIMIT clause taking a bit longer, or being faster for
startup (if there were no ORDER BY, which there is) but I never even
considered it would hang the whole thing. Actually, PostgreSQL has been
so reliable over the years, the idea that I'd run across a bug was just
foreign to me... So I've been trying to tune performance on this query
for weeks now, not realizing that the speed wasn't the issue at all. I
could almost rip out the LIMIT completely if the application logic let me,
and if the performance were a bit better.

It occurred to me last night that the actual data *MIGHT* be involved --
It's some OCR text, and there are a few scattered non-ASCII characters
involved... So *MAYBE* the actual text getting scanned could also be
important.

It seems unlikely, since the non-LIMIT query returns all the data just
fine, but just in case...

Here's a schema and a full dump for anybody that wants to dig in:
http://bulletinarchive.org/pg_dump/

I could provide PHP source as well, or the query posted in this thread can
serve as the test case.

At the moment, I've altered the application to not use LIMIT when I have
~* in the query, and it "works" -- only the paging of results is broken,
and the whole page takes twice as long to load as it should in those
cases, since it's doing the same query twice and snarfing all the monster
data and then throwing away the majority of rows in both cases. I need
the first row to get the highest score, and the rows for paging in the
real application...

Anyway, my point is that the queries seem fine without the LIMIT clause,
and "hang" with both "~*" and LIMIT, and I've even gone so far as to
incorporate that into the application logic for now, just to have a page
that loads at all instead of one that hangs.

Meanwhile, I guess I should flail at it and try 7.3 in the hopes the bug
disappeared. I was hoping to know for sure that it was a fixed bug in
that upgrade path.

Boss actually said we should go ahead and upgrade just on principle
anyway. It's nice to have a smart boss. :-)

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2002-12-15 20:22:25 Re: ~* + LIMIT => infinite time?
Previous Message Bruce Momjian 2002-12-14 22:58:30 Re: [GENERAL] PerformPortalClose warning in 7.3