| From: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
|---|---|
| To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com |
| Subject: | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date: | 2026-04-03 13:55:54 |
| Message-ID: | CAPmGK14U+wiuuqJ8Qgd8P-aZ52XO=o2DyF-sRhVp8qaJ5HduAg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Apr 3, 2026 at 2:14 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>>
>> The problem is not limited to this special case. Consider cases when
>> 1) the remote table that has many rows are heavily updated after it
>> got analyzed, and then 2) postgres_fdw imports its stats before it
>> gets re-analyzed. The stats postgres_fdw imports would be stale,
>> causing plan degradation. I don't think we should enable this feature
>> by default until we guarantee stats freshness in some way.
>
> So it seems like we have the following configurations desired by at least somebody:
>
> 0. Row Sampling Only
> 1. Fetch stats and fall back to row sampling.
> 2. Always analyze remote table (assuming it is a table that can hold stats), then fetch stats, and fall back if necessary.
> 3. Fetch stats, and if that turned up 0 attribute stats try an analyze, then try to refetch and if it still fails go to row sampling.
>
> With the following interpretation of reltuples = 0:
>
> a. The table is definitively empty, stop.
> b. The table is missing stats and running an analyze is cheap (assuming remote analysis is even enabled)
> c. if remote version >= 14 then a else b
>
> I'm of the opinion that 3c is the best configuration for most tables, and you have advocated for 1a without an analyze option and 2a with one. Option 2 seems a bit heavy handed to me, but I could see checking the remote pg_stat_all_tables and making an analyze/no-analyze judgement call based on that, perhaps call that analyze_stale_vacuum_interval or something like that. That could be a neat feature for v20, and so whatever default we choose for fetch_stats, I ask that we choose values that keep our options open for all 4x3 configurations enumerated above.
Before discussing this, let's wrap up the work for v19. Here is a new
version of the patch. Changes are:
* As above, I think we should disable this feature by default for now,
so I modified the patch as such.
* You are defining a remote query per version at the top of the file
that is used in fetch_attstats, but I think that that reduces
readability, making maintenance hard, so I modified that function to
build the query dynamically in it, as I proposed before. I also
modified that function to use PQsendQuery, not PQsendQueryParams, as
in fetch_relstats, for efficiency.
* I moved the code for opening/closing the connection from
postgresImportStatistics to fetch_remote_statistics, as the connection
is only used in the latter function. I also removed useless cleanup
processing in that function.
* I added regression tests.
* I fixed a minor bug I noticed while adding those tests.
* I did some cleanup and editorialization including re-ordering
functions in logical order.
One thing I'm wondering is: we should rename ImportStatistics to
ImportForeignStatistics, for consistency with ImportForeignSchema?
Also, we should rename fetch_stats to import_stats, for consistency
with eg, import_default?
Anyway I think the patch is now in good shape, except comments/docs,
which I will work on next.
Best regards,
Etsuro Fujita
| Attachment | Content-Type | Size |
|---|---|---|
| v19-Add-remote-statistics-fetching-to-postgres_fdw.patch | application/octet-stream | 53.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Imran Zaheer | 2026-04-03 13:56:01 | Remove commented-out code in 026_overwrite_contrecord.pl |
| Previous Message | Nathan Bossart | 2026-04-03 13:54:36 | Re: vectorized CRC on ARM64 |