Re: Wildly erratic query performance

From: Eric Schwarzenbach <subscriber(at)blackbrook(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Wildly erratic query performance
Date: 2008-10-31 20:36:02
Message-ID: 490B6C32.3020208@blackbrook.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dann,

Thanks for your response. I thought I'd covered most of what your are
asking in my first message, but these results are weird enough that I
can understand you might not give me the benefit of the doubt and
without very explicit confirmation. To answer your questions:

YES the query each time is IDENTICAL. I am not changing a single
character, I am simply clicking the execute button in pgadmin for each run.

This is my personal laptop, so of course there are other processes
running, like say, a web browser, an email client, etc. And of course,
as on any such machine there may be minor processes that execute in the
background without my awareness. But I am not actively doing anything
else (like running a large compile, yeesh!) while the query is
executing. This is why I noted that I was monitoring the CPU usage (and
processes), so that I can be confident that something major is not
suddenly running in the background without my initiating it directly.
And to make the difference between 2 seconds and a minute, let alone 10
minutes, would take a pretty major and hard not to notice process.

As I explained already (no pun intended) running the query using EXPLAIN
makes the wild variation go away. So I cannot get explain results for a
fast and for a slow execution.

I did not include schema information and such because I am not clear I
am allowed to make them public, and because I'm not looking for a highly
specific answer, merely are there ANY conditions where the SAME EXACT
QUERY should perform so radically differently. If the query planner, for
example, used a random number generator to choose the order in which it
performed my joins, such that the join order would be different each
time, this would explain it--that possibility would seem bizarre to me,
but it would certainly answer my question.

Eric

Dann Corbit wrote:
>> -----Original Message-----
>> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
>> owner(at)postgresql(dot)org] On Behalf Of Eric Schwarzenbach
>> Sent: Friday, October 31, 2008 12:35 PM
>> To: pgsql-general(at)postgresql(dot)org
>> Subject: [GENERAL] Wildly erratic query performance
>>
>> I've got a particular query that is giving me ridiculously erratic
>> query
>> performance. I have the SQL in a pgadmin query window, and from one
>> execution to another, with no changes, the time it takes varies from
>> half a second to, well, at least 10 minutes or so at which point I
>>
> give
>
>> up an cancel the query. A typical time is 2-3 seconds, but it's all
>> over
>> the map. I've seen numbers like 112 seconds for one which returns
>> without exceeding my patience. In every half a dozen or so execution
>> there will be one time which is an order of magnitude bigger than the
>> others. A typical series of executions might be something like 2
>> seconds, 5 seconds, 8 seconds, 0.5 seconds, 80 seconds, 5 seconds.
>>
>> Note that the database is running on my local machine, the same
>>
> machine
>
>> I'm running the queries from, and nothing else is using this
>>
> postgresql
>
>> installation. The data in the database is also not changing--there are
>> no inserts or updates happening between queries. I ran a vaccuum
>>
> (full,
>
>> analyze) just before I trying these queries. I do monitor my CPU usage
>> and there is definitely not some other process on my machine sucking
>>
> up
>
>> all the cpu cycles now and then to explain this.
>>
>> This is postgreslq 8.3, on Windows XP. The query joins about 17 tables
>> (without an explicit JOIN, just using the WHERE criteria) with a few
>> further predicates. One thing which distinguishes it from other
>>
> similar
>
>> queries I've been doing where I haven't seen this odd erraticness is
>> there are 2 predicates ORred together (and then ANDed with all the
>> other
>> conditions which are all ANDed) which effectively divides 2 subsets of
>> joined tables which are not joined to each other, but both joined to
>> another set of tables. (I don't know if that was a comprehensible way
>> of
>> explaining this...but I don't know if it's relevant enough to be worth
>> explaining in more detail).
>>
>> I've tried running explain, however the wild erraticness seems to go
>> away when I use explain, taking in the ballpark of 1.5 seconds every
>> time. This is faster than my average query time using a plain execute,
>> even if I don't discount all the unusually long times.
>>
>> Is there any reasonable explanation for this phenomena?
>>
>> I do realize I could help the query planner with explicit JOINs,
>> however
>> I have not yet embarked on this optimization, and might not bother if
>> the query performance is acceptable without doing so. I don't expect
>> the
>> execution plan to be optimal, however I do expect it to be
>> deterministic.
>>
>
> Something is missing from your descriptions.
> An explain analyze on the query and a list of the schema for the
> relevant tables would be helpful.
> Are the queries identical? Just changing the where clause a bit can
> cause big differences in query speed.
>
> Consider:
> SELECT age, status, phone FROM work_force WHERE state IN ('NY', 'CA',
> 'TX');
> May run more slowly than:
> SELECT age, status, phone FROM work_force WHERE state IN ('ID', 'MT',
> 'NV');
> Because the first three states have large populations and the last three
> states have smaller populations.
>
> Does the database machine run solely as a database server or are there
> other things going on? E.g. If you are doing a compile and link of
> 10,000 source files during one query and the machine is otherwise idle
> during a different one, we will expect different results.
>
> There will be (of course) a logical explanation for the query time
> differences.
>
> I suggest the following:
> 1. Do an explain analyze on a query that is slow
> 2. Do an explain analyze on a query that is fast (unless the queries
> are literally identical in every way)
> I guess that (along with the schema) will be enough to get an idea what
> is happening.
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Eric Schwarzenbach 2008-10-31 20:49:08 GEQO randomness?
Previous Message aravind chandu 2008-10-31 20:18:46 Need Help for a query