Postgres_fdw join pushdown - wrong results with whole-row reference

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Postgres_fdw join pushdown - wrong results with whole-row reference
Date: 2016-06-21 07:27:19
Message-ID: CAGPqQf3sS_SuaDR1FuMhDCt1v8COaKvt1jLk2hwRBKrvjqXQ-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Below query returns the wrong result when join getting pushdown to the
remote
server.

(PFA fdw_setup.sql, to create objects for the test)

postgres=# select e, e.empno, d.deptno, d.dname from f_emp e left join
f_dept d on e.deptno = d.deptno and e.sal > 3000 order by 2, 3 limit 10;
e | empno | deptno |
dname
-----------------------------------------------------------+-------+--------+------------
| 7369 |
|
(7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30) | 7499 |
|
(7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30) | 7521 |
|
| 7566 |
|
(7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30) | 7654 |
|
| 7698 |
|
| 7782 |
|
| 7788 |
|
| 7839 | 10
| ACCOUNTING
(7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30) | 7844 |
|
(10 rows)

Here, wholerow is coming as NULL even though with non-null empno. If we
remove
limit clause from the query - that will not push the query to the remote
side
and in such case getting correct output.

postgres=# select e, e.empno, d.deptno, d.dname from f_emp e left join
f_dept d on e.deptno = d.deptno and e.sal > 3000 order by 2, 3;
e | empno | deptno
| dname
-----------------------------------------------------------+-------+--------+------------
(7369,SMITH,CLERK,7902,1980-12-17,800.00,,20) | 7369 |
|
(7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30) | 7499 |
|
(7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30) | 7521 |
|
(7566,JONES,MANAGER,7839,1981-04-02,2975.00,,20) | 7566 |
|
(7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30) | 7654 |
|
(7698,BLAKE,MANAGER,7839,1981-05-01,2850.00,,30) | 7698 |
|
(7782,CLARK,MANAGER,7839,1981-06-09,2450.00,,10) | 7782 |
|
(7788,SCOTT,ANALYST,7566,1987-04-19,3000.00,,20) | 7788 |
|
(7839,KING,PRESIDENT,,1981-11-17,5000.00,,10) | 7839 | 10
| ACCOUNTING
(7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30) | 7844 |
|
(7876,ADAMS,CLERK,7788,1987-05-23,1100.00,,20) | 7876 |
|
(7900,JAMES,CLERK,7698,1981-12-03,950.00,,30) | 7900 |
|
(7902,FORD,ANALYST,7566,1981-12-03,3000.00,,20) | 7902 |
|
(7934,MILLER,CLERK,7782,1982-01-23,1300.00,,10) | 7934 |
|
(14 rows)

Explain verbose output for the query with LIMIT clause is:

postgres=# explain verbose select e, e.empno, d.deptno, d.dname from f_emp
e left join f_dept d on e.deptno = d.deptno and e.sal > 3000 order by 2, 3
limit 10;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------
Limit (cost=100.00..136.86 rows=10 width=236)
Output: e.*, e.empno, d.deptno, d.dname
-> Foreign Scan (cost=100.00..2304.10 rows=598 width=236)
Output: e.*, e.empno, d.deptno, d.dname
Relations: (public.f_emp e) LEFT JOIN (public.f_dept d)
Remote SQL: SELECT CASE WHEN r1.* IS NOT NULL THEN ROW(r1.empno,
r1.ename, r1.job, r1.mgr, r1.hiredate, r1.sal, r1.comm, r1.deptno) END,
r1.empno, r2
.deptno, r2.dname FROM (public.emp r1 LEFT JOIN public.dept r2 ON (((r1.sal
> 3000::numeric)) AND ((r1.deptno = r2.deptno)))) ORDER BY r1.empno ASC
NULLS LAST
, r2.deptno ASC NULLS LAST
(6 rows)

Further looking I found that here problem is because we converting wholerow
reference with ROW - and binding it with CASE clause.

So, in the above example reference to "r" is converted with
"CASE WHEN r1.* IS NOT NULL THEN ROW(r1.empno, r1.ename, r1.job, r1.mgr,
r1.hiredate, r1.sal, r1.comm, r1.deptno) END"

Here r1.* IS NOT NULL is behaving strange, it return TRUE only when all the
elements in the wholerow is NOT NULL.

Example with normal table (not postgres_fdw involded):

postgres=# select r, r.* is null as isnull, r.* is not null as isnotnull
from emp r;
r | isnull |
isnotnull
-----------------------------------------------------------+--------+-----------
(7369,SMITH,CLERK,7902,1980-12-17,800.00,,20) | f | f
(7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30) | f | t
(7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30) | f | t
(7566,JONES,MANAGER,7839,1981-04-02,2975.00,,20) | f | f
(7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30) | f | t
(7698,BLAKE,MANAGER,7839,1981-05-01,2850.00,,30) | f | f
(7782,CLARK,MANAGER,7839,1981-06-09,2450.00,,10) | f | f
(7788,SCOTT,ANALYST,7566,1987-04-19,3000.00,,20) | f | f
(7839,KING,PRESIDENT,,1981-11-17,5000.00,,10) | f | f
(7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30) | f | t
(7876,ADAMS,CLERK,7788,1987-05-23,1100.00,,20) | f | f
(7900,JAMES,CLERK,7698,1981-12-03,950.00,,30) | f | f
(7902,FORD,ANALYST,7566,1981-12-03,3000.00,,20) | f | f
(7934,MILLER,CLERK,7782,1982-01-23,1300.00,,10) | f | f
(14 rows)

Now I was under impression the IS NOT NULL should be always in inverse of
IS NULL, but clearly here its not the case with wholerow. But further
looking at
the document its saying different thing for wholerow:

https://www.postgresql.org/docs/9.5/static/functions-comparison.html

Note: If the expression is row-valued, then IS NULL is true when the row
expression
itself is null or when all the row's fields are null, while IS NOT NULL is
true
when the row expression itself is non-null and all the row's fields are
non-null.
Because of this behavior, IS NULL and IS NOT NULL do not always return
inverse
results for row-valued expressions, i.e., a row-valued expression that
contains
both NULL and non-null values will return false for both tests. This
definition
conforms to the SQL standard, and is a change from the inconsistent behavior
exhibited by PostgreSQL versions prior to 8.2.

And as above documentation clearly says that IS NULL and IS NOT NULL do not
always return inverse results for row-valued expressions. So need to change
the
deparse logic into postgres_fdw - how ? May be to use IS NULL rather then IS
NOT NULL?

Input/thought?

Regards
Rushabh Lathia
www.EnterpriseDB.com

Attachment Content-Type Size
fdw_setup.sql text/x-sql 2.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2016-06-21 07:49:56 Re: Declarative partitioning
Previous Message Amit Kapila 2016-06-21 05:28:02 Re: Reviewing freeze map code