Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2026-03-30 09:32:38
Message-ID: CAPmGK17C==NbyJ59Z3e3hOSdQnUdCLonxLiWiQ5n_YFWYMEvow@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 18, 2026 at 10:57 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
> On Sat, Mar 14, 2026 at 4:05 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:

>>> I've also moved the column matching - finding which rows of the attstats result set match to which columns in the local table - up into the fetch portion, something you had mentioned wanting to see as well. This resulted in some significant refactoring, but overall I think you will find the changes for the better.

>> I've added a reset of the result set indexes. Probably not necessary given that we only keep the result set if all the columns found homes, but good for peace of mind.

> And rebased to make the CI happy.

Thanks for updating patches!

I have been reviewing 0001 and 0002. Here are my comments/proposals I
have for now:

Core side:

* I proposed the StatisticsAreImportable callback function, but the
ImportStatistics callback function is now allowed to fall back to
analyzing, so I don't think StatisticsAreImportable has much value
anymore. I'll withdraw my proposal. Attached is an updated patch
removing it, in which 0001 is merged into 0002 for ease of review
(0001 can't be tested without 0002).

* I modified analyze.c to start progress monitor a bit earlier to
monitor the work of ImportStatistics (and AnalyzeForeignTable) as part
of initialization.

* I noticed this odd behavior:

create table t1 (id int, data text);
create foreign table ft1 (id int, data text) server loopback options
(table_name 't1');

alter foreign table ft1 options (add fetch_stats 'false');
analyze ft1 (ctid);
ERROR: column "ctid" of relation "ft1" does not exist

Looks good, but:

alter foreign table ft1 options (set fetch_stats 'true');
analyze ft1 (ctid);
NOTICE: remote table public.t1 has no statistics to import
HINT: Falling back to ANALYZE with regular row sampling.
ERROR: column "ctid" of relation "ft1" does not exist

postgres_fdw is doing the remote access without any need to do so,
which isn't good.

The cause is that the patches fail to validate the va_cols list given
to analyze_rel before importing stats. To fix, I modified analyze.c
to do so before calling ImportStatistics. I think this is also useful
when analyzing inheritance trees, as it avoids validating the list
repeatedly in that case.

* I separated a bit of code in examine_attribute that checks whether
the given column is analyzable into a new extern function so that FDWs
can use it.

postgres_fdw side:

* In fetch_remote_statistics, if we get reltuples=0 for v14 or later,
I think we should update only the relation stats with that info, and
avoid resorting to analyzing, for efficiency, as I proposed before. I
modified that function (and import_fetched_statistics) that way.

* I noticed that importing stats for a foreign table repeatedly like
this leads to incorrect attribute stats:

insert into t1 values ('aaa'), ('bbb'), ('ccc');
analyze t1;
analyze ft1;
select histogram_bounds from pg_stats where tablename = 'ft1';
histogram_bounds
------------------
{aaa,bbb,ccc}
(1 row)

delete from t1;
insert into t1 values ('foo'), ('foo'), ('foo');
analyze t1;
analyze ft1;
select histogram_bounds from pg_stats where tablename = 'ft1';
histogram_bounds
------------------
{aaa,bbb,ccc}
(1 row)

Old stats are remaining.

The root cause is that passing NULL to pg_restore_attribute_stats
leaves the stats unchanged. To fix, I modified
import_fetched_statistics to do pg_clear_attribute_stats before the
restore function.

* I don't think the error handling in import_fetched_statistics (and
match_attrmap) is safe:

+ plan = SPI_prepare(attimport_sql, ATTIMPORT_SQL_NUM_FIELDS,
+ (Oid *) attimport_argtypes);
+
+ if (plan == NULL || SPI_result < 0)
+ {
+ ereport(WARNING,
+ errmsg("import attribute statistics prepare failed %s "
+ "with error code %d",
+ attimport_sql, SPI_result));
+ goto import_cleanup;
+ }

In general, I think that if an error that should not happen happens,
continuing the processing isn't safe anymore; in such a case we should
throw an error to give the user a chance to decide what to do (eg,
retry after fixing issues or retry after setting the fetch_size option
to false if that's considered safe or ...). I think this is
consistent with the error handling of libpq functions we discussed
before. So I modified the handling in both functions to throw an
error in such cases.

* The matching logic in match_attrmap seems correct to me, but I think
it's still complicated than necessary: you are doing a merge join of
the RemoteAttributeMapping array and the attstats set by placing the
latter in the outer side, but it makes it simpler to do the join the
opposite way (ie, the former in the outer side), like the attached,
which doesn't need an inner for-loop anymore, as the inner side (ie,
the attstats set) consists of *distinct* columns.

* I moved table_has_extended_stats to extended_stats.c and renamed it
to match other functions in it so that other FDWs too can use it.

* The 0002 patch is using different levels (LOG, NOTICE, and WARNING)
for logging in postgresImportStatisitcs, but I think it's appropriate
to use elevel or WARNING for it. So I modified the patch to use
WARNING. I modified the wording as well to match analyze.c.

* I added at the top/end of postgresImportStatisitcs log messages that
show the start/end of importing stats.

That's it.

Best regards,
Etsuro Fujita

Attachment Content-Type Size
v17-Add-remote-statistics-fetching-to-postgres_fdw.patch application/octet-stream 49.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2026-03-30 09:42:09 Re: Import Statistics in postgres_fdw before resorting to sampling.
Previous Message Amit Kapila 2026-03-30 09:13:48 Re: Skipping schema changes in publication