Re: [HACKERS] Removing LEFT JOINs in more cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Removing LEFT JOINs in more cases
Date: 2018-01-09 19:44:55
Message-ID: 7760.1515527095@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> [ remove_left_join_distinct_2017-11-30.patch ]

I looked at this, and it seems like an interesting idea, but I'm
unconvinced that the specific conditions here are OK.

One clear problem is that you consider a DISTINCT to be an automatic
get-out-of-jail-free card, but that doesn't work in the presence of
aggregates:

select distinct count(*) from a left join b on ...

So at the very least it needs to be more like

if ((root->parse->distinctClause != NIL ||
root->parse->groupClause != NIL) && !root->parse->hasAggs)

I believe it is probably necessary to reject this optimization if window
functions are present, as well, because DISTINCT would not happen till
after the wfunc calculation and wfuncs are definitely sensitive to the
number of rows they see. (But maybe wfuncs + GROUP BY are OK.)

Another case that seems less than safe is if the proposed-to-be-dropped
left join is within the LHS of any LATERAL join. For instance,

select distinct ... from (a left join b ...), lateral nextval('foo');

The row duplication from b would affect how often nextval gets called,
and the presence of the DISTINCT later on doesn't entitle us to change
that. (Maybe it'd be all right to perform the optimization if the
lateral function is known stable/immutable, but I think that's getting
too far afield; I'd be inclined to just drop the idea as soon as we
see a lateral dependency.)

Actually, that same case occurs for volatile functions in the tlist,
ie

select distinct nextval('foo') from a left join b ...

The presence of the DISTINCT again doesn't excuse changing how often
nextval() gets called.

I kinda doubt this list of counterexamples is exhaustive, either;
it's just what occurred to me in five or ten minutes thought.
So maybe you can make this idea work, but you need to think much
harder about what the counterexamples are.

As far as code structure goes, it's surely not going to be sane to
have two copies of the checking logic, much less try to cram one
of them into a single if. I'd suggest factoring it out into a
separate function that can be called from both places.

I'll set the patch back to Waiting on Author.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-01-09 19:51:03 Re: Challenges preventing us moving to 64 bit transaction id (XID)?
Previous Message Andrew Dunstan 2018-01-09 19:31:40 Re: to_timestamp TZH and TZM format specifiers