| From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
|---|---|
| To: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
| Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com |
| Subject: | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date: | 2026-04-02 17:14:35 |
| Message-ID: | CADkLM=eWt0n8xpJkdNkeor7==wUcnzYh+rSEaFKpdngJk0ZwJg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
>
> The problem is not limited to this special case. Consider cases when
> 1) the remote table that has many rows are heavily updated after it
> got analyzed, and then 2) postgres_fdw imports its stats before it
> gets re-analyzed. The stats postgres_fdw imports would be stale,
> causing plan degradation. I don't think we should enable this feature
> by default until we guarantee stats freshness in some way.
So it seems like we have the following configurations desired by at least
somebody:
0. Row Sampling Only
1. Fetch stats and fall back to row sampling.
2. Always analyze remote table (assuming it is a table that can hold
stats), then fetch stats, and fall back if necessary.
3. Fetch stats, and if that turned up 0 attribute stats try an analyze,
then try to refetch and if it still fails go to row sampling.
With the following interpretation of reltuples = 0:
a. The table is definitively empty, stop.
b. The table is missing stats and running an analyze is cheap (assuming
remote analysis is even enabled)
c. if remote version >= 14 then a else b
I'm of the opinion that 3c is the best configuration for most tables, and
you have advocated for 1a without an analyze option and 2a with one. Option
2 seems a bit heavy handed to me, but I could see checking the remote
pg_stat_all_tables and making an analyze/no-analyze judgement call based on
that, perhaps call that analyze_stale_vacuum_interval or something like
that. That could be a neat feature for v20, and so whatever default we
choose for fetch_stats, I ask that we choose values that keep our options
open for all 4x3 configurations enumerated above.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2026-04-02 17:43:36 | Re: pg_waldump: support decoding of WAL inside tarfile |
| Previous Message | Tomas Vondra | 2026-04-02 17:13:08 | Re: pg_waldump: support decoding of WAL inside tarfile |