Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2025-08-14 07:37:58
Message-ID: CAExHW5seoMNjHiZge0O=p-oF6PGsY8om4VjpzTCBsaASXZC6Fw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 12, 2025 at 10:33 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>
>
> Attached is my current work on adding remote fetching of statistics to postgres_fdw, and opening the possibility of doing so to other foreign data wrappers.
>
> This involves adding two new options to postgres_fdw at the server and table level.
>
> The first option, fetch_stats, defaults to true at both levels. If enabled, it will cause an ANALYZE of a postgres_fdw foreign table to first attempt to fetch relation and attribute statistics from the remote table. If this succeeds, then those statistics are imported into the local foreign table, allowing us to skip a potentially expensive sampling operation.
>
> The second option, remote_analyze, defaults to false at both levels, and only comes into play if the first fetch succeeds but no attribute statistics (i.e. the stats from pg_stats) are found. If enabled then the function will attempt to ANALYZE the remote table, and if that is successful then a second attempt at fetching remote statistics will be made.
>
> If no statistics were fetched, then the operation will fall back to the normal sampling operation per settings.
>
> Note patches 0001 and 0002 are already a part of a separate thread https://www.postgresql.org/message-id/flat/CADkLM%3DcpUiJ3QF7aUthTvaVMmgQcm7QqZBRMDLhBRTR%2BgJX-Og%40mail.gmail.com regarding a bug (0001) and a nitpick (0002) that came about as a side-effect to this effort, and but I expect those to be resolved one way or another soon. Any feedback on those two can be handled there.

I think this is very useful to avoid fetching rows from foreign server
and analyzing them locally.

This isn't a full review. I looked at the patches mainly to find out
how does it fit into the current method of analysing a foreign table.
Right now, do_analyze_rel() is called with FDW specific acquireFunc,
which collects a sample of rows. The sample is passed to attribute
specific compute_stats which fills VacAttrStats for that attribute.
VacAttrStats for all the attributes is passed to update_attstats(),
which updates pg_statistics. The patch changes that to fetch the
statistics from the foreign server and call pg_restore_attribute_stats
for each attribute. Instead I was expecting that after fetching the
stats from the foreign server, it would construct VacAttrStats and
call update_attstats(). That might be marginally faster since it
avoids SPI call and updates stats for all the attributes. Did you
consider this alternate approach and why it was discarded?

If a foreign table points to an inheritance parent on the foreign
server, we will receive two rows for that table - one with inherited =
false and other with true in that order. I think the stats with
inherited=true are relevant to the local server since querying the
parent will fetch rows from children. Since that stats is applied
last, the pg_statistics will retain the intended statistics. But why
to fetch two rows in the first place and waste computing cycles?

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-08-14 08:18:44 Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
Previous Message Nazir Bilal Yavuz 2025-08-14 07:37:09 Re: meson: Add _static and _shared suffixes to the library names