Re: [PERFORM] Hash Anti Join performance degradation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: panam <panam(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hash Anti Join performance degradation
Date: 2011-06-01 02:58:16
Message-ID: BANLkTinbb35QT_57EozGOnszBWuarBgW1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Tue, May 31, 2011 at 9:11 PM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> Oh damned, I am currently with an eeepc, I'll need 2 days to bench that :-D
> I'll see tomorow.

LOL.

With respect to the root of the issue (why does the anti-join take so
long?), my first thought was that perhaps the OP was very unlucky and
had a lot of values that hashed to the same bucket. But that doesn't
appear to be the case. There are 120 different box_id values in the
message table, and running hashint8(box_id) % 16384 (which I think is
the right calculation: 4096 buckets * 4 batches) yields 120 different
values.

I think that part of the problem here is that the planner has no
particularly efficient way of executing a non-equijoin. Each probe
finds the appropriate hash bucket and must then probe the entire
chain, all of which pass the hash condition and some of which fail the
join qual. So if there are n_1 instances of value v_1, n_2 instances
of value v_2, etc. then the total effort is proportional to n_1^2 +
n_2^2 + ...

But that doesn't really explain the problem because removing the last
few rows only changes that sum by a small amount.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2011-06-01 03:19:16 Re: [COMMITTERS] pgsql: Improve corner cases in pg_ctl's new wait-for-postmaster-startup
Previous Message Alvaro Herrera 2011-06-01 02:55:03 Re: creating CHECK constraints as NOT VALID

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-06-01 03:47:42 Re: [PERFORM] Hash Anti Join performance degradation
Previous Message Jarrod Chesney 2011-06-01 01:45:04 Re: Delete performance