Re: [PERFORM] Hash Anti Join performance degradation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, panam <panam(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hash Anti Join performance degradation
Date: 2011-06-01 20:35:16
Message-ID: 24461.1306960516@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jun 1, 2011 at 4:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Because of the way that a bitmap heap scan works, the rows are
>> guaranteed to be loaded into the hash table in physical order, which
>> means (in the fast case) that the row with the largest "id" value gets
>> loaded last. And because ExecHashTableInsert pushes each row onto the
>> front of its hash chain, that row ends up at the front of the hash
>> chain. Net result: for all the outer rows that aren't the one with
>> maximum id, we get a joinqual match at the very first entry in the hash
>> chain. Since it's an antijoin, we then reject that outer row and go
>> on to the next. The join thus ends up costing us only O(N) time.

> Ah! Make sense. If I'm reading your explanation right, this means
> that we could have hit a similar pathological case on a nestloop as
> well, just with a data ordering that is the reverse of what we have
> here?

Yeah. It's just chance that this particular data set, with this
particular ordering, happens to work well with a nestloop version
of the query. On average I'd expect nestloop to suck even more,
because of more per-inner-tuple overhead.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-06-01 20:43:29 Re: [PERFORM] Hash Anti Join performance degradation
Previous Message Cédric Villemain 2011-06-01 20:34:38 Re: [PERFORM] Hash Anti Join performance degradation

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2011-06-01 20:43:29 Re: [PERFORM] Hash Anti Join performance degradation
Previous Message Cédric Villemain 2011-06-01 20:34:38 Re: [PERFORM] Hash Anti Join performance degradation