Re: Using each rel as both outer and inner for JOIN_ANTI

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: emre(at)hasegeli(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using each rel as both outer and inner for JOIN_ANTI
Date: 2021-06-29 08:55:59
Message-ID: CAMbWs4_Jsj6-kM0bW8PFVby_pY-xDiJArsR3rRPCpMDbLJ24pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 29, 2021 at 3:55 PM Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:

> > Thanks for the explanation. Attached is a demo code for the hash-join
> > case, which is only for PoC to show how we can make it work. It's far
> > from complete, at least we need to adjust the cost calculation for this
> > 'right anti join'.
>
> I applied the patch and executed some queries. Hash Right Anti Joins
> seem to be working correctly. Though, some of the tests are failing.
> I guessed it's because the other join algorithms do not support right
> anti join, but I couldn't reproduce it.
>

Thanks for verifying this patch.

>
> I am impressed by how simple the patch is: only 2 lines to support a
> new join algorithm. This is a good case for the quality of Postgres
> code. I hope supporting the other join algorithms would be similar.
>

Yes, thanks to the excellent design pattern of the execution codes, we
only need very few changes to support this new join type.

>
> I am not sure how the cost estimation should differ from straight anti
> join. It seems to me that the planner is already making the right
> choice by taking into account the cost of the Hash node which makes
> the whole cost greater when the inner table is much bigger.
>

I think we can basically use the same cost calculation as with anti
joins, since they share the fact that the executor will stop after the
first match. However, there are still some differences. Such as when we
consider the number of tuples that will pass the basic join, I think we
need to use unmatched inner rows, rather than unmatched outer rows.

>
> I am not an expert planner, but it feels to me like a good feature
> that can provide better plans in some cases. Given it works correctly
> and the implementation is so simple, the only argument against it may
> be increased planning time. I know that the planner performance is
> affected negatively by the number of join paths to consider. This may
> not be a bigger deal as typically there are not many anti joins in a
> query, but it'd still be a good idea to do some performance tests.
>

Agree. Performance tests are necessary if we consider finishing this
patch.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2021-06-29 09:47:11 Re: Overflow hazard in pgbench
Previous Message Amit Kapila 2021-06-29 08:51:36 Re: Doc chapter for Hash Indexes