Re: Ideas about a better API for postgres_fdw remote estimates

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Subject: Re: Ideas about a better API for postgres_fdw remote estimates
Date: 2020-09-01 04:47:46
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:
> On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> I agreed that this task needs to implement an API for
>> serialization/deserialization of statistics:
>> pg_load_relation_statistics(json_string text);
>> pg_get_relation_statistics(relname text);
>> We can use a version number for resolving conflicts with different
>> statistics implementations.
>> "Load" function will validate the values[] anyarray while deserializing
>> the input json string to the datatype of the relation column.
> This is a valuable feature. Analysing a foreign table by fetching rows
> from the foreign server isn't very efficient. In fact the current FDW
> API for doing that forges that in-efficiency by requiring the FDW to
> return a sample of rows that will be analysed by the core. That's why
> I see that your patch introduces a new API to get foreign rel stat. I
> don't think there's any point in maintaining these two APIs just for

> ANALYSING table. Instead we should have only one FDW API which will do
> whatever it wants and return statistics that can be understood by the
> core and let core install it in the catalogs. I believe that's doable.
I think the same.
> In case of PostgreSQL it could get the stats available as is from the
> foreign server, convert it into a form that the core understands and
> returns. The patch introduces a new function postgres_fdw_stat() which
> will be available only from version 14 onwards. Can we use
> row_to_json(), which is available in all the supported versions,
> instead?
I started from here. But we need to convert starelid, staop[] stacoll[]
oids into portable format. Also we need to explicitly specify the type
of each values[] array. And no one guaranteed that anyarray values[]
can't contained an array of complex type values, containing oids, that
can't be correctly converted to database objects on another server...
These considerations required me to add new postgres_fdw_stat() routine
that can be moved into the core.
> In case of some other foreign server, an FDW will be responsible to
> return statistics in a form that the core will understand. It may
> fetch rows from the foreign server or be a bit smart and fetch the
> statistics and convert.
I don't think I fully understood your idea. Please explain in more
detail if possible.
> This also means that FDWs will have to deal with the statistics format
> that the core understands and thus will need changes in their code
> with every version in the worst case. But AFAIR, PostgreSQL supports
> different forms of statistics so the problem may not remain that
> severe if FDWs and core agree on some bare minimum format that the
> core supports for long.
I don't think FDW needs to know anything about the internals of
statistics. It only need to execute query like
"SELECT extract_statistics(namespace.relation);"
and apply the text representation by the function call like this:
store_statistics(const char *stat);
All validation and update pg_statistic operations will be performed into
the core.
> I think the patch has some other problems like it works only for
> regular tables on foreign server but a foreign table can be pointing
> to any relation like a materialized view, partitioned table or a
> foreign table on the foreign server all of which have statistics
> associated with them.
Ok. It was implemented for discussion, test and as a base of development.

> I didn't look closely but it does not consider
> that the foreign table may not have all the columns from the relation
> on the foreign server or may have different names.
Here we get full statistics from remote server and extract statistics
only for columns, included into the tuple descriptor of foreign table.

> But I think those
> problems are kind of secondary. We have to agree on the design first.
I only want to point out the following. In previous threads statistics
was converted row-by-row. I want to suggest to serialize all statistics
tuples for the relation into single json string. On apply phase we can
filter unneeded attributes.

Andrey Lepikhov
Postgres Professional

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-09-01 04:59:25 Re: "cert" + clientcert=verify-ca in pg_hba.conf?
Previous Message Michael Paquier 2020-09-01 04:41:04 Re: Include access method in listTables output