Re: Hash Anti Join performance degradation

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: panam <panam(at)gmx(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash Anti Join performance degradation
Date: 2011-05-24 05:53:00
Message-ID: 4DDB47BC.8030504@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 24/05/11 12:14, panam wrote:
> Hi,
>
> In my DB, there is a large table containing messages and one that contains
> message boxes.
> Messages are assigned to boxes via a child parent link m->b.
> In order to obtain the last message for a specific box, I use the following
> SQL:
>
> SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id
> AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;
>
> This worked quite well for a long time. But now, suddenly the performance of
> this query drastically degraded as new messages were added.
> If these new messages are removed again, everything is back to normal. If
> other messages are removed instead, the problem remains, so it does not seem
> to be a memory issue. I fear I have difficulties to understand what is going
> wrong here.

The usual cause is that the statistics for estimated row counts cross a
threshold that makes the query planner think that a different kind of
plan will be faster.

If the query planner is using bad information about the performance of
the storage, then it will be making bad decisions about which approach
is faster. So the usual thing to do is to adjust seq_page_cost and
random_page_cost to more closely reflect the real performance of your
hardware, and to make sure that effective_cache_size matches the real
amount of memory your computer has free for disk cache use.

Newer versions of PostgreSQL always include query planning and
statistics improvements too.

BTW, it can be really helpful to paste your query plans into
http://explain.depesz.com/ , which will provide an easier to read visual
analysis of the plan. This will only work with query plans that haven't
been butchered by mail client word wrapping, so I can't do it for you,
but if you paste them there and post the links that'd be really handy.

Also have a look at http://wiki.postgresql.org/wiki/SlowQueryQuestions

I found the plans you posted a bit hard to read. Not your fault; it's
stupid mail clients. Maybe depesz.com needs to be taught to de-munge
the damage done to plans by common mail clients.

--
Craig Ringer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vaibhav Kaushal 2011-05-24 06:19:27 Re: Foreign memory context read
Previous Message panam 2011-05-24 04:14:24 Hash Anti Join performance degradation

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2011-05-24 09:12:34 Re: reducing random_page_cost from 4 to 2 to force index scan
Previous Message Terry Schmitt 2011-05-24 05:24:28 Re: Performance degradation of inserts when database size grows