| From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
|---|---|
| To: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
| Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com |
| Subject: | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date: | 2026-03-30 20:04:46 |
| Message-ID: | CADkLM=cEZYVP4_36ZdOT+rvj+pgA3LDrukiaczJTH3Pm+Yauug@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
>
> * I proposed the StatisticsAreImportable callback function, but the
> ImportStatistics callback function is now allowed to fall back to
> analyzing, so I don't think StatisticsAreImportable has much value
> anymore. I'll withdraw my proposal. Attached is an updated patch
> removing it, in which 0001 is merged into 0002 for ease of review
> (0001 can't be tested without 0002).
>
Thanks.
> * I modified analyze.c to start progress monitor a bit earlier to
> monitor the work of ImportStatistics (and AnalyzeForeignTable) as part
> of initialization.
>
> * I noticed this odd behavior:
>
> create table t1 (id int, data text);
> create foreign table ft1 (id int, data text) server loopback options
> (table_name 't1');
>
> alter foreign table ft1 options (add fetch_stats 'false');
> analyze ft1 (ctid);
> ERROR: column "ctid" of relation "ft1" does not exist
>
> Looks good, but:
>
> alter foreign table ft1 options (set fetch_stats 'true');
> analyze ft1 (ctid);
> NOTICE: remote table public.t1 has no statistics to import
> HINT: Falling back to ANALYZE with regular row sampling.
> ERROR: column "ctid" of relation "ft1" does not exist
>
> postgres_fdw is doing the remote access without any need to do so,
> which isn't good.
>
> The cause is that the patches fail to validate the va_cols list given
> to analyze_rel before importing stats. To fix, I modified analyze.c
> to do so before calling ImportStatistics. I think this is also useful
> when analyzing inheritance trees, as it avoids validating the list
> repeatedly in that case.
>
I see this change and I like it.
> * I separated a bit of code in examine_attribute that checks whether
> the given column is analyzable into a new extern function so that FDWs
> can use it.
>
Interesting. I wouldn't have dared make a change that affected regular
analyze, but it makes sense.
postgres_fdw side:
>
> * In fetch_remote_statistics, if we get reltuples=0 for v14 or later,
> I think we should update only the relation stats with that info, and
> avoid resorting to analyzing, for efficiency, as I proposed before. I
> modified that function (and import_fetched_statistics) that way.
>
This will miss out on the case where the remote table did get analyzed
once, when empty, but now isn't empty. I realize that shouldn't happen very
often, but the cost of rowsampling a table that is empty is very low.
> The root cause is that passing NULL to pg_restore_attribute_stats
> leaves the stats unchanged. To fix, I modified
> import_fetched_statistics to do pg_clear_attribute_stats before the
> restore function.
>
+1
> * The matching logic in match_attrmap seems correct to me, but I think
> it's still complicated than necessary: you are doing a merge join of
> the RemoteAttributeMapping array and the attstats set by placing the
> latter in the outer side, but it makes it simpler to do the join the
> opposite way (ie, the former in the outer side), like the attached,
> which doesn't need an inner for-loop anymore, as the inner side (ie,
> the attstats set) consists of *distinct* columns.
>
The new match_attrmap seems reasonable.
> * I moved table_has_extended_stats to extended_stats.c and renamed it
> to match other functions in it so that other FDWs too can use it.
>
+1
> * The 0002 patch is using different levels (LOG, NOTICE, and WARNING)
> for logging in postgresImportStatisitcs, but I think it's appropriate
> to use elevel or WARNING for it. So I modified the patch to use
> WARNING. I modified the wording as well to match analyze.c.
>
+1
> * I added at the top/end of postgresImportStatisitcs log messages that
> show the start/end of importing stats.
>
+1
> That's it.
>
I see that remote_analyze didn't make it as a part of this patch. Is that
something you'd repackaged as a follow-on patch, or are you just done with
it?
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Corey Huinker | 2026-03-30 20:05:43 | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Previous Message | Sami Imseih | 2026-03-30 19:20:47 | Re: Refactor query normalization into core query jumbling |