Re: Ideas about a better API for postgres_fdw remote estimates

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Ideas about a better API for postgres_fdw remote estimates
Date: 2020-07-05 17:06:01
Message-ID: 20200705170601.GD3125@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> In <1116564(dot)1593813043(at)sss(dot)pgh(dot)pa(dot)us> I wrote:
> > I wonder whether someday we ought to invent a new API that's more
> > suited to postgres_fdw's needs than EXPLAIN is. It's not like the
> > remote planner doesn't know the number we want; it just fails to
> > include it in EXPLAIN.
>
> I've been thinking about this a little more, and I'd like to get some
> ideas down on electrons before they vanish.
>
> The current method for postgres_fdw to obtain remote estimates is to
> issue an EXPLAIN command to the remote server and then decipher the
> result. This has just one big advantage, which is that it works
> against existing, even very old, remote PG versions. In every other
> way it's pretty awful: it involves a lot of cycles on the far end
> to create output details we don't really care about, it requires a
> fair amount of logic to parse that output, and we can't get some
> details that we *do* care about (such as the total size of the foreign
> table, as per the other discussion).
>
> We can do better. I don't propose removing the existing logic, because
> being able to work against old remote PG versions seems pretty useful.
> But we could probe at connection start for whether the remote server
> has support for a better way, and then use that way if available.

I agree we can, and should, try to do better here.

> What should the better way look like? I suggest the following:
>
> * Rather than adding a core-server feature, the remote-end part of the new
> API should be a function installed by an extension (either postgres_fdw
> itself, or a new extension "postgres_fdw_remote" or the like). One
> attraction of this approach is that it'd be conceivable to back-port the
> new code into existing PG releases by updating the extension. Also
> there'd be room for multiple versions of the support. The
> connection-start probe could be of the form "does this function exist
> in pg_proc?".
>
> * I'm imagining the function being of the form
>
> function pg_catalog.postgres_fdw_support(query text) returns something
>
> where the input is still the text of a query we're considering issuing,
> and the output is some structure that contains the items of EXPLAIN-like
> data we need, but not the items we don't. The implementation of the
> function would run the query through parse/plan, then pick out the
> data we want and return that.
>
> * We could do a lot worse than to have the "structure" be JSON.
> This'd allow structured, labeled data to be returned; it would not be
> too difficult to construct, even in PG server versions predating the
> addition of JSON logic to the core; and the receiving postgres_fdw
> extension could use the core's JSON logic to parse the data.

I also tend to agree with using JSON for this.

> * The contents of the structure need to be designed with forethought
> for extensibility, but this doesn't seem hard if it's all a collection
> of labeled fields. We can just say that the recipient must ignore
> fields it doesn't recognize. Once a given field has been defined, we
> can't change its contents, but we can introduce new fields as needed.
> Note that I would not be in favor of putting an overall version number
> within the structure; that's way too coarse-grained.

This also makes sense to me.

> I'm not planning to do anything about these ideas myself, at least
> not in the short term. But perhaps somebody else would like to
> run with them.

I'm trying to figure out why it makes more sense to use
'postgres_fdw_support(query text)', which would still do parse/plan and
return EXPLAIN-like data, rather than having:

EXPLAIN (FORMAT JSON, FDW true) query ...

(Or, perhaps better, individual boolean options for whatever stuff we
want to ask for, or to exclude if we don't want it, so that other tools
could use this...).

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-07-05 17:15:58 Re: SQL/JSON: JSON_TABLE
Previous Message Dilip Kumar 2020-07-05 15:07:18 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions