Re: Ideas about a better API for postgres_fdw remote estimates

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Ideas about a better API for postgres_fdw remote estimates
Date: 2020-07-06 14:11:14
Message-ID: 20200706141114.GF3125@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> >> 2. Wedging this into EXPLAIN would be quite ugly, because (at least
> >> as I envision it) the output would have just about nothing to do with
> >> any existing EXPLAIN output.
>
> > This is a better argument for not making it part of EXPLAIN, though I
> > don't really feel like I've got a decent idea of what you are suggesting
> > the output *would* look like, so it's difficult for me to agree (or
> > disagree) about this particular point.
>
> Per postgres_fdw's get_remote_estimate(), the only data we use right now
> is the startup_cost, total_cost, rows and width estimates from the
> top-level Plan node. That's available immediately from the Plan tree,
> meaning that basically *nothing* of the substantial display effort
> expended by explain.c and ruleutils.c is of any value. So the level-zero

The 'display effort' you're referring to, when using JSON format with
explain, is basically to format the results into JSON and return them-
which is what you're suggesting this mode would do anyway, no..?

If the remote side 'table' is actually a view that's complicated then
having a way to get just the top-level information (and excluding the
rest) sounds like it'd be useful and perhaps excluding that other info
doesn't really fit into EXPLAIN's mandate, but that's also much less
common.

> implementation of this would be to run the parser and planner, format
> those four numbers into a JSON object (which would require little more
> infrastructure than sprintf), and return that. Sure, we could make that
> into some kind of early-exit path in explain.c, but I think it'd be a
> pretty substantial wart, especially since it'd mean that none of the
> other EXPLAIN options are sensible in combination with this one.

That EXPLAIN has options that only make sense in combination with
certain other options isn't anything new- BUFFERS makes no sense without
ANALYZE, etc.

> Further down the road, we might want to rethink the whole idea of
> completely constructing a concrete Plan. We could get the data we need
> at the list-of-Paths stage. Even more interesting, we could (with very
> little more work) return data about multiple Paths, so that the client
> could find out, for example, the costs of sorted and unsorted output
> without paying two network round trips to discover that. That'd
> definitely require changes in the core planner, since it has no API to
> stop at that point. And it's even less within the charter of EXPLAIN.

I have to admit that I'm not really sure how we could make it work, but
having a way to get multiple paths returned by EXPLAIN would certainly
be interesting to a lot of users. Certainly it's easier to see how we
could get at that info in a postgres_fdw-specific function, and be able
to understand how to deal with it there and what could be done, but once
it's there I wonder if other tools might see that and possibly even
build on it because it'd be the only way to get that kind of info, which
certainly wouldn't be ideal.

> I grant your point that there might be other users for this besides
> postgres_fdw, but that doesn't mean it must be a core feature.

That postgres_fdw is an extension is almost as much of a wart as
anything being discussed here and suggesting that things added to
postgres_fdw aren't 'core features' seems akin to ignoring the forest
for the trees- consider that, today, there isn't even an option to
install only the core server from the PGDG repos (at least for Debian /
Ubuntu, not sure if the RPMs have caught up to that yet, but they
probably should). The 'postgresql-12' .deb includes all the extensions
that are part of the core git repo, because they're released and
maintained just the same as the core server and, from a practical
perspective, to run a decent PG system you really should have them
installed, so why bother having a separate package?

> >> 3. We surely would not back-patch a core change like this. OTOH, if
> >> the added infrastructure is in an extension, somebody might want to
> >> back-patch that (even if unofficially).
>
> > Since postgres_fdw is part of core and core's release cycle, and the
> > packagers manage the extensions from core in a way that they have to
> > match up, this argument doesn't hold any weight with me.
>
> Certainly only v14 (or whenever) and later postgres_fdw would be able
> to *use* this data. The scenario I'm imagining is that somebody wants
> to be able to use that client against an older remote server, and is
> willing to install some simple extension on the remote server to do so.
> Perhaps this scenario is not worth troubling over, but I don't think
> it's entirely far-fetched.

I definitely don't think that such an extension should be maintained
outside of core, and I seriously doubt any of our packagers would be
anxious to build an indepedent package for this to be usable in older
servers. Sure, it's possible someone will care about this enough to
spend the effort to try and build it for an older version and use it but
I definitely don't think we should be considering that a serious design
goal or a reason to put this capability in a separate extension.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-07-06 14:56:53 Re: Cache lookup errors with functions manipulation object addresses
Previous Message Ashutosh Bapat 2020-07-06 13:37:18 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit