Re: [HACKERS] Removing LEFT JOINs in more cases

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Removing LEFT JOINs in more cases
Date: 2018-03-05 00:48:00
Message-ID: CAKJS1f-8dgZrztH=B9JbdJ-W9bShrv92av+Mcg1kxVWGCjUh5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5 March 2018 at 04:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> select * from t1, t2 where notice(t2.a) = t1.a;
> select * from t1, t2 where notice(t2.a) = t2.b;
>
> With our current implementation, the first will result in executing
> notice() for every row pair in the cross product, while the second
> will evaluate it only once per row of t2, because the condition is
> pushed down to the scan level. Should we stop doing that?

I think we'd get complaints.

I admit to finding it difficult to know what the rules are here. For
example, qual_is_pushdown_safe() refuses to push volatile quals down
into subqueries. I'm a bit unsure why that's disallowed, but
reordering WHERE clause items containing volatile functions is
allowed.

Volatile functions in a subquery targetlist are not guaranteed to be
evaluated as we may push down a non-volatile qual into the subquery
causing fewer rows to be produced by the subquery resulting in fewer
evaluations of the targetlist expressions.

Here's an example of that happening:

EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM (SELECT id,notice(id) FROM t1 GROUP BY id) t1 WHERE t1.id = -10;
QUERY PLAN
----------------------------------------------
Group (actual rows=0 loops=1)
Group Key: t1.id
-> Seq Scan on t1 (actual rows=0 loops=1)
Filter: (id = '-10'::integer)
Rows Removed by Filter: 1
(5 rows)

-- use OFFSET 0 to block the pushdown of the non-volatile qual.
EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM (SELECT id,notice(id) FROM t1 GROUP BY id OFFSET 0) t1
WHERE t1.id = -10;
NOTICE: 1
QUERY PLAN
---------------------------------------------------------
Subquery Scan on t1 (actual rows=0 loops=1)
Filter: (t1.id = '-10'::integer)
Rows Removed by Filter: 1
-> HashAggregate (actual rows=1 loops=1)
Group Key: t1_1.id
-> Seq Scan on t1 t1_1 (actual rows=1 loops=1)
(6 rows)

> In short, the number of executions of functions in WHERE or JOIN/ON
> isn't at all implementation-independent. We document this in
> https://www.postgresql.org/docs/devel/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
> where it says
>
> It is particularly dangerous to rely on side effects or evaluation
> order in WHERE and HAVING clauses, since those clauses are
> extensively reprocessed as part of developing an execution
> plan.
>
> Maybe we ought to be more verbose there, but I don't care to abandon
> the principle that we can reorder WHERE clauses, or skip the evaluation
> of unnecessary clauses, as much as we want.
>
> The case I was complaining about upthread involved volatiles in
> the SELECT target list, which *does* have a well-defined number
> of executions, ie once per row produced by the FROM/WHERE clause.

I'm not so sure that's completely true. If you add an SRF then we
seem to get an extra evaluation per row.

# select generate_Series(1,2),notice(id) from t1;
NOTICE: 1
NOTICE: 1
NOTICE: 1
generate_series | notice
-----------------+--------
1 | 1
2 | 1
(2 rows)

You also complained about LATERAL joins to volatile functions, but
that too seems not guaranteed and seems to depend on the query plan
chosen, as highlighted in:

CREATE OR REPLACE FUNCTION notice(pn INT) RETURNS INT AS $$
BEGIN
RAISE NOTICE '%', pn;
RETURN pn;
END;
$$ VOLATILE LANGUAGE plpgsql;

CREATE TABLE t1 (id int);
CREATE TABLE t2 (a int);

INSERT INTO t1 values(1);
INSERT INTO t2 values(1),(1);

EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.a, LATERAL notice(t1.id);
NOTICE: 1
QUERY PLAN
-------------------------------------------------------------------
Merge Left Join (actual rows=2 loops=1)
Merge Cond: (t1.id = t2.a)
-> Sort (actual rows=1 loops=1)
Sort Key: t1.id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (actual rows=1 loops=1)
-> Seq Scan on t1 (actual rows=1 loops=1)
-> Function Scan on notice (actual rows=1 loops=1)
-> Sort (actual rows=2 loops=1)
Sort Key: t2.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on t2 (actual rows=2 loops=1)
(12 rows)

SET from_collapse_limit = 1;

EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.a, LATERAL notice(t1.id);
NOTICE: 1
NOTICE: 1
QUERY PLAN
----------------------------------------------------------
Nested Loop (actual rows=2 loops=1)
-> Merge Left Join (actual rows=2 loops=1)
Merge Cond: (t1.id = t2.a)
-> Sort (actual rows=1 loops=1)
Sort Key: t1.id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on t1 (actual rows=1 loops=1)
-> Sort (actual rows=2 loops=1)
Sort Key: t2.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on t2 (actual rows=2 loops=1)
-> Function Scan on notice (actual rows=1 loops=2)
(12 rows)

RESET from_collapse_limit;

In short, I'm just a little confused at our guarantees. The best I
can see is that we guarantee a volatile function will be evaluated if
it's in the outer query's target list, providing the targetlist does
not have any set-returning functions present, which seems a long way
short of what the documents warn users about.

For now, if we ignore patch 0001, as neither of us is particularly
concerned about that. Patch 0002 might need to have the volatility
checks removed from the GROUP BY clause, but I'm a bit uncertain about
that given the examples I showed above.

Thanks for your feedback so far.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-03-05 01:21:00 Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Previous Message David Gould 2018-03-04 23:29:07 Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.