Re: Ideas about a better API for postgres_fdw remote estimates

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Subject: Re: Ideas about a better API for postgres_fdw remote estimates
Date: 2020-08-31 13:19:21
Message-ID: CAExHW5uqKTfjC0etXexFZUBdBwRfJuhsUd=Gm58Hx6OOFEDrVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> Thanks for this helpful feedback.
>
> I found several threads related to the problem [1-3].
> I agreed that this task needs to implement an API for
> serialization/deserialization of statistics:
> pg_load_relation_statistics(json_string text);
> pg_get_relation_statistics(relname text);
> We can use a version number for resolving conflicts with different
> statistics implementations.
> "Load" function will validate the values[] anyarray while deserializing
> the input json string to the datatype of the relation column.
>

This is a valuable feature. Analysing a foreign table by fetching rows
from the foreign server isn't very efficient. In fact the current FDW
API for doing that forges that in-efficiency by requiring the FDW to
return a sample of rows that will be analysed by the core. That's why
I see that your patch introduces a new API to get foreign rel stat. I
don't think there's any point in maintaining these two APIs just for
ANALYSING table. Instead we should have only one FDW API which will do
whatever it wants and return statistics that can be understood by the
core and let core install it in the catalogs. I believe that's doable.

In case of PostgreSQL it could get the stats available as is from the
foreign server, convert it into a form that the core understands and
returns. The patch introduces a new function postgres_fdw_stat() which
will be available only from version 14 onwards. Can we use
row_to_json(), which is available in all the supported versions,
instead?

In case of some other foreign server, an FDW will be responsible to
return statistics in a form that the core will understand. It may
fetch rows from the foreign server or be a bit smart and fetch the
statistics and convert.

This also means that FDWs will have to deal with the statistics format
that the core understands and thus will need changes in their code
with every version in the worst case. But AFAIR, PostgreSQL supports
different forms of statistics so the problem may not remain that
severe if FDWs and core agree on some bare minimum format that the
core supports for long.

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them. I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-08-31 13:28:43 Re: Use T_IntList for uint32
Previous Message Ashutosh Bapat 2020-08-31 12:14:03 Re: Use T_IntList for uint32