Skip site navigation (1) Skip section navigation (2)

Re: Trying to track down weird query stalls

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: dan(at)sidhe(dot)org
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Trying to track down weird query stalls
Date: 2009-03-30 19:20:15
Message-ID: 603c8f070903301220i26469be1mcbb6ab758fe78bb8@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Mon, Mar 30, 2009 at 2:42 PM,  <dan(at)sidhe(dot)org> wrote:
>> On Mon, Mar 30, 2009 at 1:50 PM,  <dan(at)sidhe(dot)org> wrote:
>>> I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a
>>> really
>>> strange, annoying transient problem with one particular query stalling.
>>>
>>> The symptom here is that when this query is made with X or more records
>>> in
>>> a temp table involved in the join (where X is constant when the problem
>>> manifests, but is different between manifestations) the query takes
>>> about
>>> 20 minutes. When the query is made with X-1 records it takes less than a
>>> second. It's just this one query -- for everything else the system's
>>> nice
>>> and snappy. The machine load's never above 9 (it's a 32 CPU box) and
>>> hasn't had less than 60G of free system memory on it.
>>>
>>> An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even
>>> more bizarre. Not only are the two query plans identical (save trivial
>>> differences because of the record count differences) but the explain
>>> EXPLAIN ANALYZE total runtimes are near-identical -- the fast case
>>> showed
>>> 259ms, the slow case 265ms.
>>
>> log_min_duration_statement is a good place to start, but it sounds
>> like the query plan you're getting when you test it by hand isn't the
>> same one that's actually executing, so I'm not sure how far that's
>> going to get you.  contrib/auto_explain sounds like it would be just
>> the thing, but unfortunately that's an 8.4 feature which hasn't been
>> released yet.  I'm not sure whether it could be built and run against
>> 8.3.
>
> I'm not executing any of the EXPLAINs by hand, because I didn't want to
> have to worry about typos or filling in temp tables with test data. Inside
> the app the SQL for the problematic query's stored in a variable -- when
> the task runs with debugging enabled it first executes the query with
> EXPLAIN ANALYZE prepended and dumps the output, then it executes the query
> itself. It's possible something's going wrong in that, but the code's
> pretty simple.
>
> Arguably in this case the actual query should run faster than the EXPLAIN
> ANALYZE version, since the cache is hot. (Though that'd only likely shave
> a few dozen ms off the runtime)

Well... yeah.  Also EXPLAIN ANALYZE has a non-trivial amount of
overhead, so that is quite bizarre.  I have to suspect there is some
subtle difference between the way the EXPLAIN ANALYZE is done and the
way the actual query is done... like maybe one uses parameter
substitution and the other doesn't or, well, I don't know.  But I
don't see how turning on debugging (which is essentially what EXPLAIN
ANALYZE is) can prevent the query from being slow.

...Robert

In response to

Responses

pgsql-performance by date

Next:From: danDate: 2009-03-30 19:35:10
Subject: Re: Trying to track down weird query stalls
Previous:From: danDate: 2009-03-30 18:42:13
Subject: Re: Trying to track down weird query stalls

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group