Re: anti-join chosen even when slower than old plan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: anti-join chosen even when slower than old plan
Date: 2010-11-10 00:08:00
Message-ID: 20961.1289347680@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> The semi-join and anti-join have helped us quite a bit, but we have
> seen a situation where anti-join is chosen even though it is slower
> than the "old fashioned" plan. I know there have been other reports
> of this, but I just wanted to go on record with my details.

In principle, the old-style plan ought to be equivalent to a nestloop
antijoin with a seqscan of DbTranLogRecord on the outside and an
indexscan of DbTranRepository on the inside. Can you force it to choose
such a plan by setting enable_mergejoin off (and maybe enable_hashjoin
too)? If so, it'd be interesting to see the estimated costs and actual
runtime on 9.0 for that plan.

It would also be interesting to check estimated and actual costs for the
SELECT COUNT(*) versions of these queries, ie, no actual delete. I'm
suspicious that the cost differential has nothing to do with antijoin
vs. subplan, and everything to do with whether the targeted tuples are
being deleted in physical order (thus improving locality of access for
the deletions). If it's the latter, see previous discussions about
possibly sorting update/delete targets by CTID before applying the
actions.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message bricklen 2010-11-10 00:10:33 Re: Huge overestimation in rows expected results in bad plan
Previous Message Tom Lane 2010-11-09 23:55:12 Re: Huge overestimation in rows expected results in bad plan