Re: Question about some changes in 11.3

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mat Arye <mat(at)timescale(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about some changes in 11.3
Date: 2019-06-12 22:14:54
Message-ID: 8099.1560377694@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Mat Arye <mat(at)timescale(dot)com> writes:
> On Mon, Jun 3, 2019 at 4:07 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hm. I'd say this was already broken by the invention of
>> apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to
>> still work for you, but it's not hard to envision applications of
>> set_rel_pathlist_hook for which it would not have worked. The contract
>> for set_rel_pathlist_hook is supposed to be that it gets to editorialize
>> on all normal (non-Gather) paths created by the core code, and that's
>> no longer the case now that apply_scanjoin_target_to_paths can add more.
>> ...
>> I wonder whether we could fix matters by postponing the
>> set_rel_pathlist_hook call till later in the same cases where
>> we postpone generate_gather_paths, ie, it's the only baserel.

> Is it really guaranteed that `apply_scanjoin_target_to_paths` will not be
> called in other cases?

Well, apply_scanjoin_target_to_paths is called in *all* cases. It only
throws away the original paths for partitioned rels, though.

I spent some more time looking at this, and I am afraid that my idea
of postponing set_rel_pathlist_hook into apply_scanjoin_target_to_paths
isn't going to work: there is not anyplace in that function where we
could call the hook without the API being noticeably different from
what it is at the current call site. In particular, if we try to call
it down near the end so that it still has the property of being able
to remove any core-generated path, then there's a *big* difference for
queries involving SRFs: we've already plastered ProjectSetPath(s) atop
the original paths, and any user of the hook would have to know to do
likewise for freshly-generated paths. That would certainly break
existing hook users.

I'm inclined to think that the safest course is to leave
set_rel_pathlist_hook as it stands, and invent a new hook that is called
in apply_scanjoin_target_to_paths just before the generate_gather_paths
call. (Or, perhaps, just after that --- but the precedent of
set_rel_pathlist_hook suggests that "before" is more useful.)
For your use-case you'd have to get into both hooks, and they'd both have
to know that if they're dealing with a partitioned baserel that is the
only baserel in the query, the new hook is where to generate paths
rather than the old hook. Maybe it'd be worth having the core code
export some simple test function for that, rather than having the details
of those semantics be wired into various extensions.

I think it'd be all right to put a patch done that way into the v11
branch. It would not make anything any worse for code that uses
set_rel_pathlist_hook and is OK with the v11 behavior. Code that
needs to use the new hook would fail to load into 11-before-11.whatever,
but that's probably better than loading and then doing the wrong thing.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-06-12 22:36:44 Re: Should we warn against using too many partitions?
Previous Message Peter Geoghegan 2019-06-12 22:06:34 Re: PG 12 draft release notes