Re: NOT IN subquery optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Li, Zheng" <zhelli(at)amazon(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NOT IN subquery optimization
Date: 2020-03-24 20:29:11
Message-ID: 6635.1585081751@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Li, Zheng" <zhelli(at)amazon(dot)com> writes:
> Here is the latest rebased patch.

I noticed that the cfbot is failing to test this because of some trivial
merge conflicts, so here's a re-rebased version.

I haven't reviewed this in any detail, but here's a couple of notes
from having quickly looked through the patch:

* I find it entirely unacceptable to stick some planner temporary
fields into struct SubLink. If you need that storage you'll have
to find some other place to put it. But in point of fact I don't
think you need it; it doesn't look to me to be critical to generate
the subquery's plan any earlier than make_subplan would have done it.
Moreover, you should really strive to *not* do that, because it's
likely to get in the way of other future optimizations. As the
existing comment in make_subplan already suggests, we might want to
delay subplan planning even further than that in future.

* I'm also not too happy with the (undocumented) rearrangement of
reduce_outer_joins. There's a specific sequence of processing that
that's involved in, as documented at the top of prepjointree.c, and
I doubt that you can just randomly call it from other places and expect
good results. In particular, since JOIN alias var flattening won't have
happened yet when this code is being run from pull_up_sublinks, it's
unlikely that reduce_outer_joins will reliably get the same answers it
would get normally. I also wonder whether it's safe to make the
parsetree changes it makes earlier than normal, and whether it will be
problematic to run it twice on the same tree, and whether its rather
indirect connection to distribute_qual_to_rels is going to misbehave.

* The proposed test additions seem to about triple the runtime of
subselect.sql. This seems excessive. I also wonder why it's necessary
for this test to build its own large tables; couldn't it re-use ones
that already exist in the regression database?

* Not really sure that we need a new planner GUC for this, but if we
do, it needs to be documented.

regards, tom lane

Attachment Content-Type Size
not_in_to_anti_join_transformation_v2_4.patch text/x-diff 168.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-03-24 20:29:51 Re: Missing errcode() in ereport
Previous Message Alvaro Herrera 2020-03-24 20:21:06 Re: Option to dump foreign data in pg_dump