Re: Trying to track down weird query stalls

From: dan(at)sidhe(dot)org
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: dan(at)sidhe(dot)org,pgsql-performance(at)postgresql(dot)org
Subject: Re: Trying to track down weird query stalls
Date: 2009-03-30 18:42:13
Message-ID: 56921.199.172.169.36.1238438533.squirrel@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> On Mon, Mar 30, 2009 at 1:50 PM, <dan(at)sidhe(dot)org> wrote:
>> I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a
>> really
>> strange, annoying transient problem with one particular query stalling.
>>
>> The symptom here is that when this query is made with X or more records
>> in
>> a temp table involved in the join (where X is constant when the problem
>> manifests, but is different between manifestations) the query takes
>> about
>> 20 minutes. When the query is made with X-1 records it takes less than a
>> second. It's just this one query -- for everything else the system's
>> nice
>> and snappy. The machine load's never above 9 (it's a 32 CPU box) and
>> hasn't had less than 60G of free system memory on it.
>>
>> An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even
>> more bizarre. Not only are the two query plans identical (save trivial
>> differences because of the record count differences) but the explain
>> EXPLAIN ANALYZE total runtimes are near-identical -- the fast case
>> showed
>> 259ms, the slow case 265ms.
>
> log_min_duration_statement is a good place to start, but it sounds
> like the query plan you're getting when you test it by hand isn't the
> same one that's actually executing, so I'm not sure how far that's
> going to get you. contrib/auto_explain sounds like it would be just
> the thing, but unfortunately that's an 8.4 feature which hasn't been
> released yet. I'm not sure whether it could be built and run against
> 8.3.

I'm not executing any of the EXPLAINs by hand, because I didn't want to
have to worry about typos or filling in temp tables with test data. Inside
the app the SQL for the problematic query's stored in a variable -- when
the task runs with debugging enabled it first executes the query with
EXPLAIN ANALYZE prepended and dumps the output, then it executes the query
itself. It's possible something's going wrong in that, but the code's
pretty simple.

Arguably in this case the actual query should run faster than the EXPLAIN
ANALYZE version, since the cache is hot. (Though that'd only likely shave
a few dozen ms off the runtime)

-Dan

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2009-03-30 19:20:15 Re: Trying to track down weird query stalls
Previous Message Robert Haas 2009-03-30 18:31:39 Re: Trying to track down weird query stalls