Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2026-03-30 20:04:46
Message-ID: CADkLM=cEZYVP4_36ZdOT+rvj+pgA3LDrukiaczJTH3Pm+Yauug@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> * I proposed the StatisticsAreImportable callback function, but the
> ImportStatistics callback function is now allowed to fall back to
> analyzing, so I don't think StatisticsAreImportable has much value
> anymore. I'll withdraw my proposal. Attached is an updated patch
> removing it, in which 0001 is merged into 0002 for ease of review
> (0001 can't be tested without 0002).
>

Thanks.

> * I modified analyze.c to start progress monitor a bit earlier to
> monitor the work of ImportStatistics (and AnalyzeForeignTable) as part
> of initialization.
>
> * I noticed this odd behavior:
>
> create table t1 (id int, data text);
> create foreign table ft1 (id int, data text) server loopback options
> (table_name 't1');
>
> alter foreign table ft1 options (add fetch_stats 'false');
> analyze ft1 (ctid);
> ERROR: column "ctid" of relation "ft1" does not exist
>
> Looks good, but:
>
> alter foreign table ft1 options (set fetch_stats 'true');
> analyze ft1 (ctid);
> NOTICE: remote table public.t1 has no statistics to import
> HINT: Falling back to ANALYZE with regular row sampling.
> ERROR: column "ctid" of relation "ft1" does not exist
>
> postgres_fdw is doing the remote access without any need to do so,
> which isn't good.
>
> The cause is that the patches fail to validate the va_cols list given
> to analyze_rel before importing stats. To fix, I modified analyze.c
> to do so before calling ImportStatistics. I think this is also useful
> when analyzing inheritance trees, as it avoids validating the list
> repeatedly in that case.
>

I see this change and I like it.

> * I separated a bit of code in examine_attribute that checks whether
> the given column is analyzable into a new extern function so that FDWs
> can use it.
>

Interesting. I wouldn't have dared make a change that affected regular
analyze, but it makes sense.

postgres_fdw side:
>
> * In fetch_remote_statistics, if we get reltuples=0 for v14 or later,
> I think we should update only the relation stats with that info, and
> avoid resorting to analyzing, for efficiency, as I proposed before. I
> modified that function (and import_fetched_statistics) that way.
>

This will miss out on the case where the remote table did get analyzed
once, when empty, but now isn't empty. I realize that shouldn't happen very
often, but the cost of rowsampling a table that is empty is very low.

> The root cause is that passing NULL to pg_restore_attribute_stats
> leaves the stats unchanged. To fix, I modified
> import_fetched_statistics to do pg_clear_attribute_stats before the
> restore function.
>

+1

> * The matching logic in match_attrmap seems correct to me, but I think
> it's still complicated than necessary: you are doing a merge join of
> the RemoteAttributeMapping array and the attstats set by placing the
> latter in the outer side, but it makes it simpler to do the join the
> opposite way (ie, the former in the outer side), like the attached,
> which doesn't need an inner for-loop anymore, as the inner side (ie,
> the attstats set) consists of *distinct* columns.
>

The new match_attrmap seems reasonable.

> * I moved table_has_extended_stats to extended_stats.c and renamed it
> to match other functions in it so that other FDWs too can use it.
>

+1

> * The 0002 patch is using different levels (LOG, NOTICE, and WARNING)
> for logging in postgresImportStatisitcs, but I think it's appropriate
> to use elevel or WARNING for it. So I modified the patch to use
> WARNING. I modified the wording as well to match analyze.c.
>

+1

> * I added at the top/end of postgresImportStatisitcs log messages that
> show the start/end of importing stats.
>

+1

> That's it.
>

I see that remote_analyze didn't make it as a part of this patch. Is that
something you'd repackaged as a follow-on patch, or are you just done with
it?

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2026-03-30 20:05:43 Re: Import Statistics in postgres_fdw before resorting to sampling.
Previous Message Sami Imseih 2026-03-30 19:20:47 Re: Refactor query normalization into core query jumbling