Re: Trying to track down weird query stalls

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: dan(at)sidhe(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Trying to track down weird query stalls
Date: 2009-03-30 19:38:03
Message-ID: dcc563d10903301238u135d82bpf498c2d40ac31977@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Mar 30, 2009 at 12:42 PM, <dan(at)sidhe(dot)org> wrote:
>> On Mon, Mar 30, 2009 at 1:50 PM,  <dan(at)sidhe(dot)org> wrote:
>>> I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a
>>> really
>>> strange, annoying transient problem with one particular query stalling.
>>>
>>> The symptom here is that when this query is made with X or more records
>>> in
>>> a temp table involved in the join (where X is constant when the problem
>>> manifests, but is different between manifestations) the query takes
>>> about
>>> 20 minutes. When the query is made with X-1 records it takes less than a
>>> second. It's just this one query -- for everything else the system's
>>> nice
>>> and snappy. The machine load's never above 9 (it's a 32 CPU box) and
>>> hasn't had less than 60G of free system memory on it.
>>>
>>> An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even
>>> more bizarre. Not only are the two query plans identical (save trivial
>>> differences because of the record count differences) but the explain
>>> EXPLAIN ANALYZE total runtimes are near-identical -- the fast case
>>> showed
>>> 259ms, the slow case 265ms.
>>
>> log_min_duration_statement is a good place to start, but it sounds
>> like the query plan you're getting when you test it by hand isn't the
>> same one that's actually executing, so I'm not sure how far that's
>> going to get you.  contrib/auto_explain sounds like it would be just
>> the thing, but unfortunately that's an 8.4 feature which hasn't been
>> released yet.  I'm not sure whether it could be built and run against
>> 8.3.
>
> I'm not executing any of the EXPLAINs by hand, because I didn't want to
> have to worry about typos or filling in temp tables with test data. Inside
> the app the SQL for the problematic query's stored in a variable -- when
> the task runs with debugging enabled it first executes the query with
> EXPLAIN ANALYZE prepended and dumps the output, then it executes the query
> itself. It's possible something's going wrong in that, but the code's
> pretty simple.
>
> Arguably in this case the actual query should run faster than the EXPLAIN
> ANALYZE version, since the cache is hot. (Though that'd only likely shave
> a few dozen ms off the runtime)

Joining a lot of tables together? Could be GEQO kicking in.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message dan 2009-03-30 19:42:25 Re: Trying to track down weird query stalls
Previous Message dan 2009-03-30 19:35:10 Re: Trying to track down weird query stalls