Re: Hash Anti Join performance degradation

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, "panam" <panam(at)gmx(dot)net>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hash Anti Join performance degradation
Date: 2011-05-26 14:48:46
Message-ID: 4DDE21FE020000250003DD49@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Cédric Villemain<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> 2011/5/26 panam <panam(at)gmx(dot)net>:

>> "max_connections";"100"

>> "work_mem";"1GB"

Each connection can allocate work_mem, potentially several times.
On a machines without hundreds of GB of RAM, that pair of settings
could cause severe swapping.

>> "Patholgical" query:
>>
>> select
>> b.id,
>> (SELECT
>> m1.id
>> FROM
>> message m1
>> LEFT JOIN
>> message m2
>> ON (
>> m1.box_id = m2.box_id
>> AND m1.id < m2.id
>> )
>> WHERE
>> m2.id IS NULL
>> AND m1.box_id = b.id)
>> from
>> box b

> without explaining further why the antijoin has bad performance
> without cluster, I wonder why you don't use this query :
>
> SELECT b.id,
> max(m.id)
> FROM box b, message m
> WHERE m.box_id = b.id
> GROUP BY b.id;
>
> looks similar and fastest.

I think you would need a left join to actually get identical
results:

SELECT b.id, max(m.id)
FROM box b
LEFT JOIN message m ON m.box_id = b.id
GROUP BY b.id;

But yeah, I would expect this approach to be much faster. Rather
easier to understand and harder to get wrong, too.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message MauMau 2011-05-26 15:24:59 patch for distinguishing PG instances in event log
Previous Message Cédric Villemain 2011-05-26 14:21:21 Re: Hash Anti Join performance degradation

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2011-05-26 15:10:13 Re: The shared buffers challenge
Previous Message Merlin Moncure 2011-05-26 14:31:59 The shared buffers challenge