Quick Links

Re: Trying to track down weird query stalls

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	dan(at)sidhe(dot)org
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Trying to track down weird query stalls
Date:	2009-03-30 19:38:03
Message-ID:	dcc563d10903301238u135d82bpf498c2d40ac31977@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Mon, Mar 30, 2009 at 12:42 PM, <dan(at)sidhe(dot)org> wrote:
>> On Mon, Mar 30, 2009 at 1:50 PM, <dan(at)sidhe(dot)org> wrote:
>>> I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a
>>> really
>>> strange, annoying transient problem with one particular query stalling.
>>>
>>> The symptom here is that when this query is made with X or more records
>>> in
>>> a temp table involved in the join (where X is constant when the problem
>>> manifests, but is different between manifestations) the query takes
>>> about
>>> 20 minutes. When the query is made with X-1 records it takes less than a
>>> second. It's just this one query -- for everything else the system's
>>> nice
>>> and snappy. The machine load's never above 9 (it's a 32 CPU box) and
>>> hasn't had less than 60G of free system memory on it.
>>>
>>> An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even
>>> more bizarre. Not only are the two query plans identical (save trivial
>>> differences because of the record count differences) but the explain
>>> EXPLAIN ANALYZE total runtimes are near-identical -- the fast case
>>> showed
>>> 259ms, the slow case 265ms.
>>
>> log_min_duration_statement is a good place to start, but it sounds
>> like the query plan you're getting when you test it by hand isn't the
>> same one that's actually executing, so I'm not sure how far that's
>> going to get you. contrib/auto_explain sounds like it would be just
>> the thing, but unfortunately that's an 8.4 feature which hasn't been
>> released yet. I'm not sure whether it could be built and run against
>> 8.3.
>
> I'm not executing any of the EXPLAINs by hand, because I didn't want to
> have to worry about typos or filling in temp tables with test data. Inside
> the app the SQL for the problematic query's stored in a variable -- when
> the task runs with debugging enabled it first executes the query with
> EXPLAIN ANALYZE prepended and dumps the output, then it executes the query
> itself. It's possible something's going wrong in that, but the code's
> pretty simple.
>
> Arguably in this case the actual query should run faster than the EXPLAIN
> ANALYZE version, since the cache is hot. (Though that'd only likely shave
> a few dozen ms off the runtime)

Joining a lot of tables together? Could be GEQO kicking in.

In response to

Re: Trying to track down weird query stalls at 2009-03-30 18:42:13 from dan

Responses

Re: Trying to track down weird query stalls at 2009-03-30 19:42:25 from dan

Browse pgsql-performance by date

	From	Date	Subject
Next Message	dan	2009-03-30 19:42:25	Re: Trying to track down weird query stalls
Previous Message	dan	2009-03-30 19:35:10	Re: Trying to track down weird query stalls