Skip site navigation (1) Skip section navigation (2)

Re: Hash Anti Join performance degradation

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, "panam" <panam(at)gmx(dot)net>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hash Anti Join performance degradation
Date: 2011-05-26 14:48:46
Message-ID: 4DDE21FE020000250003DD49@gw.wicourts.gov (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Cédric Villemain<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> 2011/5/26 panam <panam(at)gmx(dot)net>:
 
>> "max_connections";"100"
 
>> "work_mem";"1GB"
 
Each connection can allocate work_mem, potentially several times. 
On a machines without hundreds of GB of RAM, that pair of settings
could cause severe swapping.
 
>> "Patholgical" query:
>>
>> select
>>        b.id,
>>        (SELECT
>>                m1.id
>>        FROM
>>                message m1
>>        LEFT JOIN
>>                message m2
>>                        ON (
>>                                m1.box_id = m2.box_id
>>                                AND m1.id < m2.id
>>                        )
>>        WHERE
>>                m2.id IS NULL
>>                AND m1.box_id = b.id)
>> from
>>        box b
 
> without explaining further why the antijoin has bad performance
> without cluster, I wonder why you don't use this query :
> 
> SELECT  b.id,
>                   max(m.id)
> FROM box b, message m
> WHERE m.box_id = b.id
> GROUP BY b.id;
> 
> looks similar and fastest.
 
I think you would need a left join to actually get identical
results:
 
SELECT  b.id, max(m.id)
  FROM box b
  LEFT JOIN message m ON m.box_id = b.id
  GROUP BY b.id;
 
But yeah, I would expect this approach to be much faster.  Rather
easier to understand and harder to get wrong, too.
 
-Kevin

In response to

Responses

pgsql-performance by date

Next:From: Kevin GrittnerDate: 2011-05-26 15:10:13
Subject: Re: The shared buffers challenge
Previous:From: Merlin MoncureDate: 2011-05-26 14:31:59
Subject: The shared buffers challenge

pgsql-hackers by date

Next:From: MauMauDate: 2011-05-26 15:24:59
Subject: patch for distinguishing PG instances in event log
Previous:From: Cédric VillemainDate: 2011-05-26 14:21:21
Subject: Re: Hash Anti Join performance degradation

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group