Re: [HACKERS] Removing LEFT JOINs in more cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Removing LEFT JOINs in more cases
Date: 2018-03-04 15:19:32
Message-ID: 31978.1520176772@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> On 10 January 2018 at 08:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> select distinct nextval('foo') from a left join b ...
>> The presence of the DISTINCT again doesn't excuse changing how often
>> nextval() gets called.

> While working on the cases where the join removal should be disallowed
> I discovered that the existing code is not too careful about this
> either:
> [ a volatile function can be removed in a case like this ]
> select t1.a from t1 left join t1 t2 on t1.a = t2.a and notice(t2.a) = t1.a;
> Should this be fixed? or is this case somehow not worth worrying about?

I don't find that example troubling. The execution of functions in WHERE
has never been particularly constrained. Would you insist, say, that
notice() be evaluated for every pair of rows in the cross product of
t1 and t2, even the ones that don't pass the t1.a = t2.a condition?
Or for a more interesting example, consider these two cases:

select * from t1, t2 where notice(t2.a) = t1.a;
select * from t1, t2 where notice(t2.a) = t2.b;

With our current implementation, the first will result in executing
notice() for every row pair in the cross product, while the second
will evaluate it only once per row of t2, because the condition is
pushed down to the scan level. Should we stop doing that?

In short, the number of executions of functions in WHERE or JOIN/ON
isn't at all implementation-independent. We document this in
https://www.postgresql.org/docs/devel/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
where it says

It is particularly dangerous to rely on side effects or evaluation
order in WHERE and HAVING clauses, since those clauses are
extensively reprocessed as part of developing an execution
plan.

Maybe we ought to be more verbose there, but I don't care to abandon
the principle that we can reorder WHERE clauses, or skip the evaluation
of unnecessary clauses, as much as we want.

The case I was complaining about upthread involved volatiles in
the SELECT target list, which *does* have a well-defined number
of executions, ie once per row produced by the FROM/WHERE clause.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2018-03-04 15:34:54 Fwd: automatic disable unicode line style when terminal is not unicode
Previous Message Tomas Vondra 2018-03-04 15:05:59 Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?