postgres_fdw: wrong results with self join + enable_nestloop off

From: Nishant Sharma <nishant(dot)sharma(at)enterprisedb(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: postgres_fdw: wrong results with self join + enable_nestloop off
Date: 2023-04-14 11:38:39
Message-ID: CADrsxdbcN1vejBaf8a+QhrZY5PXL-04mCd4GDu6qm6FigDZd6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

We have observed that running the same self JOIN query on postgres FDW
setup is returning different results with set enable_nestloop off & on. I
am at today's latest commit:- 928e05ddfd4031c67e101c5e74dbb5c8ec4f9e23

I created a local FDW setup. And ran this experiment on the same. Kindly
refer to the P.S section for details.

|********************************************************************|
*Below is the output difference along with query plan:-*
postgres(at)71609=#set enable_nestloop=off;
SET
postgres(at)71609=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2
on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
id1 | id2 | id1 | id2
-----+-----+-----+-----
1 | 10 | 1 | 10
2 | 20 | 1 | 10
3 | 30 | 1 | 10
1 | 10 | 2 | 20
2 | 20 | 2 | 20
3 | 30 | 2 | 20
1 | 10 | 3 | 30
2 | 20 | 3 | 30
3 | 30 | 3 | 30
(9 rows)

postgres(at)71609=#explain (analyze, verbose) select * from pg_tbl_foreign
tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() <
'23-Feb-2020'::timestamp;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.00..49310.40 rows=2183680 width=16) (actual
time=0.514..0.515 rows=9 loops=1)
Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
Relations: (public.pg_tbl_foreign tbl1) INNER JOIN
(public.pg_tbl_foreign tbl2)
Remote SQL: SELECT r1.id1, r1.id2, r2.id1, r2.id2 FROM (public.pg_tbl r1
INNER JOIN public.pg_tbl r2 ON (((r1.id1 < 5))))
Planning Time: 0.139 ms
Execution Time: 0.984 ms
(6 rows)

postgres(at)71609=#set enable_nestloop=on;
SET
postgres(at)71609=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2
on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
id1 | id2 | id1 | id2
-----+-----+-----+-----
(0 rows)

postgres(at)71609=#explain (analyze, verbose) select * from pg_tbl_foreign
tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() <
'23-Feb-2020'::timestamp;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Result (cost=200.00..27644.00 rows=2183680 width=16) (actual
time=0.003..0.004 rows=0 loops=1)
Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
One-Time Filter: (now() < '2020-02-23 00:00:00'::timestamp without time
zone)
-> Nested Loop (cost=200.00..27644.00 rows=2183680 width=16) (never
executed)
Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
-> Foreign Scan on public.pg_tbl_foreign tbl2
(cost=100.00..186.80 rows=2560 width=8) (never executed)
Output: tbl2.id1, tbl2.id2
Remote SQL: SELECT id1, id2 FROM public.pg_tbl
-> Materialize (cost=100.00..163.32 rows=853 width=8) (never
executed)
Output: tbl1.id1, tbl1.id2
-> Foreign Scan on public.pg_tbl_foreign tbl1
(cost=100.00..159.06 rows=853 width=8) (never executed)
Output: tbl1.id1, tbl1.id2
Remote SQL: SELECT id1, id2 FROM public.pg_tbl WHERE
((id1 < 5))
Planning Time: 0.178 ms
Execution Time: 0.292 ms
(15 rows)

|********************************************************************|

I debugged this issue and was able to find a fix for the same. Kindly
please refer to the attached fix. With the fix I am able to resolve the
issue. But I am not that confident whether this change would affect some
other existing functionally but it has helped me resolve this result
difference in output.

*What is the technical issue?*
The problem here is the use of extract_actual_clauses. Because of which the
plan creation misses adding the second condition of AND i.e "now() <
'23-Feb-2020'::timestamp" in the plan. Because it is not considered a
pseudo constant and extract_actual_clause is passed with false as the
second parameter and it gets skipped from the list. As a result that
condition is never taken into consideration as either one-time filter
(before or after) or part of SQL remote execution

*Why do I think the fix is correct?*
The fix is simple, where we have created a new function similar to
extract_actual_clause which just extracts all the conditions from the list
with no checks and returns the list to the caller. As a result all
conditions would be taken into consideration in the query plan.

*After my fix patch:-*
postgres(at)78754=#set enable_nestloop=off;
SET
postgres(at)78754=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2
on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
id1 | id2 | id1 | id2
-----+-----+-----+-----
(0 rows)
^
postgres(at)78754=#explain (analyze, verbose) select * from pg_tbl_foreign
tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() <
'23-Feb-2020'::timestamp;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.00..49310.40 rows=2183680 width=16) (actual
time=0.652..0.652 rows=0 loops=1)
Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
Filter: (now() < '2020-02-23 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 9
Relations: (public.pg_tbl_foreign tbl1) INNER JOIN
(public.pg_tbl_foreign tbl2)
Remote SQL: SELECT r1.id1, r1.id2, r2.id1, r2.id2 FROM (public.pg_tbl r1
INNER JOIN public.pg_tbl r2 ON (((r1.id1 < 5))))
Planning Time: 0.133 ms
Execution Time: 1.127 ms
(8 rows)

postgres(at)78754=#set enable_nestloop=on;
SET
postgres(at)78754=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2
on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
id1 | id2 | id1 | id2
-----+-----+-----+-----
(0 rows)

postgres(at)78754=#explain (analyze, verbose) select * from pg_tbl_foreign
tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() <
'23-Feb-2020'::timestamp;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Result (cost=200.00..27644.00 rows=2183680 width=16) (actual
time=0.004..0.005 rows=0 loops=1)
Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
One-Time Filter: (now() < '2020-02-23 00:00:00'::timestamp without time
zone)
-> Nested Loop (cost=200.00..27644.00 rows=2183680 width=16) (never
executed)
Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
-> Foreign Scan on public.pg_tbl_foreign tbl2
(cost=100.00..186.80 rows=2560 width=8) (never executed)
Output: tbl2.id1, tbl2.id2
Remote SQL: SELECT id1, id2 FROM public.pg_tbl
-> Materialize (cost=100.00..163.32 rows=853 width=8) (never
executed)
Output: tbl1.id1, tbl1.id2
-> Foreign Scan on public.pg_tbl_foreign tbl1
(cost=100.00..159.06 rows=853 width=8) (never executed)
Output: tbl1.id1, tbl1.id2
Remote SQL: SELECT id1, id2 FROM public.pg_tbl WHERE
((id1 < 5))
Planning Time: 0.134 ms
Execution Time: 0.347 ms
(15 rows)
|********************************************************************|

Kindly please comment if I am in the correct direction or not?

Regards,
Nishant Sharma.
Developer at EnterpriseDB, Pune, India.

P.S
Steps that I used to create local postgres FDW setup ( followed link -
https://www.postgresql.org/docs/current/postgres-fdw.html
<https://www.postgresql.org/docs/current/postgres-fdw.html):-> )

1) ./configure --prefix=/home/edb/POSTGRES_INSTALL/MASTER
--with-pgport=9996 --with-openssl --with-libxml --with-zlib --with-tcl
--with-perl --with-libxslt --with-ossp-uuid --with-ldap --with-pam
--enable-nls --enable-debug --enable-depend --enable-dtrace --with-selinux
--with-icu --enable-tap-tests --enable-cassert CFLAGS="-g -O0"

2) make

3) make install

4) cd contrib/postgres_fdw/

5) make install

6) Start the server

7)
[edb(at)localhost MASTER]$ bin/psql postgres edb;
psql (16devel)
Type "help" for help.

postgres(at)70613=#create database remote_db;
CREATE DATABASE
postgres(at)70613=#quit

[edb(at)localhost MASTER]$ bin/psql remote_db edb;
psql (16devel)
Type "help" for help.

remote_db(at)70613=#CREATE USER fdw_user;
CREATE ROLE

remote_db(at)70613=#GRANT ALL ON SCHEMA public TO fdw_user;
GRANT
remote_db(at)70613=#quit

[edb(at)localhost MASTER]$ bin/psql remote_db fdw_user;
psql (16devel)
Type "help" for help.

remote_db(at)70613=#create table pg_tbl(id1 int, id2 int);
CREATE TABLE
remote_db(at)70613=#insert into pg_tbl values(1, 10);
INSERT 0 1
remote_db(at)70613=#insert into pg_tbl values(2, 20);
INSERT 0 1
remote_db(at)70613=#insert into pg_tbl values(3, 30);
INSERT 0 1

8)
New terminal/Tab:-
[edb(at)localhost MASTER]$ bin/psql postgres edb;
postgres(at)71609=#create extension postgres_fdw;
CREATE EXTENSION
postgres(at)71609=#CREATE SERVER localhost_fdw FOREIGN DATA WRAPPER
postgres_fdw OPTIONS (dbname 'remote_db', host 'localhost', port '9996');
CREATE SERVER
postgres(at)71609=#CREATE USER MAPPING for edb SERVER localhost_fdw OPTIONS
(user 'fdw_user', password '');
CREATE USER MAPPING
postgres(at)71609=#GRANT ALL ON FOREIGN SERVER localhost_fdw TO edb;
GRANT
postgres(at)71609=#CREATE FOREIGN TABLE pg_tbl_foreign(id1 int, id2 int)
SERVER localhost_fdw OPTIONS (schema_name 'public', table_name 'pg_tbl');
CREATE FOREIGN TABLE
postgres(at)71609=#select * from pg_tbl_foreign;
id1 | id2
-----+-----
1 | 10
2 | 20
3 | 30
(3 rows)

Attachment Content-Type Size
PG_PATCH_set_nestloop_off_issue_fix.patch application/octet-stream 2.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jehan-Guillaume de Rorthais 2023-04-14 11:43:21 Re: OOM in hash join
Previous Message Thomas Munro 2023-04-14 11:27:55 Re: OOM in hash join