Re: Add semi-join pushdown to postgres_fdw

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru>
Cc: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Subject: Re: Add semi-join pushdown to postgres_fdw
Date: 2024-02-09 20:27:27
Message-ID: CAPpHfdtnBO87ryw3cZ148y3XAjjTYmS+a0Nin9qTY9guRiU_Uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 9, 2024 at 10:08 PM Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru> wrote:
> While playing with this feature I found the following.
>
> Two foreign tables:
> postgres(at)demo_postgres_fdw(17.0)=# \det aircrafts|seats
> List of foreign tables
> Schema | Table | Server
> --------+-----------+-------------
> public | aircrafts | demo_server
> public | seats | demo_server
> (2 rows)
>
>
> This query uses optimization:
>
> postgres(at)demo_postgres_fdw(17.0)=# EXPLAIN (costs off, verbose) SELECT *
> FROM aircrafts a
> WHERE a.aircraft_code = '320' AND EXISTS (
> SELECT * FROM seats s WHERE s.aircraft_code = a.aircraft_code
> );
> QUERY PLAN >
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
> Foreign Scan
> Output: a.aircraft_code, a.model, a.range
> Relations: (public.aircrafts a) SEMI JOIN (public.seats s)
> Remote SQL: SELECT r1.aircraft_code, r1.model, r1.range FROM bookings.aircrafts r1 WHERE ((r1.aircraft_code = '320')) AND EXISTS (SELECT NULL FROM bookings.seats r2 WHERE ((r2.aircraft_code =>
> (4 rows)
>
>
> But optimization not used for NOT EXISTS:
>
> postgres(at)demo_postgres_fdw(17.0)=# EXPLAIN (costs off, verbose) SELECT *
> FROM aircrafts a
> WHERE a.aircraft_code = '320' AND NOT EXISTS (
> SELECT * FROM seats s WHERE s.aircraft_code = a.aircraft_code
> );
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
> Nested Loop Anti Join
> Output: a.aircraft_code, a.model, a.range
> -> Foreign Scan on public.aircrafts a
> Output: a.aircraft_code, a.model, a.range
> Remote SQL: SELECT aircraft_code, model, range FROM bookings.aircrafts WHERE ((aircraft_code = '320'))
> -> Materialize
> Output: s.aircraft_code
> -> Foreign Scan on public.seats s
> Output: s.aircraft_code
> Remote SQL: SELECT aircraft_code FROM bookings.seats WHERE ((aircraft_code = '320'))
> (10 rows)
>
> Also, optimization not used after deleting first condition (a.aircraft_code = '320'):
>
> postgres(at)demo_postgres_fdw(17.0)=# EXPLAIN (costs off, verbose) SELECT *
> FROM aircrafts a
> WHERE EXISTS (
> SELECT * FROM seats s WHERE s.aircraft_code = a.aircraft_code
> );
> QUERY PLAN
> --------------------------------------------------------------------------------
> Hash Join
> Output: a.aircraft_code, a.model, a.range
> Inner Unique: true
> Hash Cond: (a.aircraft_code = s.aircraft_code)
> -> Foreign Scan on public.aircrafts a
> Output: a.aircraft_code, a.model, a.range
> Remote SQL: SELECT aircraft_code, model, range FROM bookings.aircrafts
> -> Hash
> Output: s.aircraft_code
> -> HashAggregate
> Output: s.aircraft_code
> Group Key: s.aircraft_code
> -> Foreign Scan on public.seats s
> Output: s.aircraft_code
> Remote SQL: SELECT aircraft_code FROM bookings.seats
> (15 rows)
>
>
> But the worst thing is that replacing AND with OR causes breaking session and server restart:
>
> postgres(at)demo_postgres_fdw(17.0)=# EXPLAIN (costs off, verbose) SELECT *
> FROM aircrafts a
> WHERE a.aircraft_code = '320' OR EXISTS (
> SELECT * FROM seats s WHERE s.aircraft_code = a.aircraft_code
> );
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.

Thank you, Pavel. I'm looking into this.

------
Regards,
Alexander Korotkov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2024-02-09 20:32:10 Re: [PATCH] Add native windows on arm64 support
Previous Message Nathan Bossart 2024-02-09 20:08:28 Re: glibc qsort() vulnerability