Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2026-04-02 17:14:35
Message-ID: CADkLM=eWt0n8xpJkdNkeor7==wUcnzYh+rSEaFKpdngJk0ZwJg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> The problem is not limited to this special case. Consider cases when
> 1) the remote table that has many rows are heavily updated after it
> got analyzed, and then 2) postgres_fdw imports its stats before it
> gets re-analyzed. The stats postgres_fdw imports would be stale,
> causing plan degradation. I don't think we should enable this feature
> by default until we guarantee stats freshness in some way.

So it seems like we have the following configurations desired by at least
somebody:

0. Row Sampling Only
1. Fetch stats and fall back to row sampling.
2. Always analyze remote table (assuming it is a table that can hold
stats), then fetch stats, and fall back if necessary.
3. Fetch stats, and if that turned up 0 attribute stats try an analyze,
then try to refetch and if it still fails go to row sampling.

With the following interpretation of reltuples = 0:

a. The table is definitively empty, stop.
b. The table is missing stats and running an analyze is cheap (assuming
remote analysis is even enabled)
c. if remote version >= 14 then a else b

I'm of the opinion that 3c is the best configuration for most tables, and
you have advocated for 1a without an analyze option and 2a with one. Option
2 seems a bit heavy handed to me, but I could see checking the remote
pg_stat_all_tables and making an analyze/no-analyze judgement call based on
that, perhaps call that analyze_stale_vacuum_interval or something like
that. That could be a neat feature for v20, and so whatever default we
choose for fetch_stats, I ask that we choose values that keep our options
open for all 4x3 configurations enumerated above.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-04-02 17:43:36 Re: pg_waldump: support decoding of WAL inside tarfile
Previous Message Tomas Vondra 2026-04-02 17:13:08 Re: pg_waldump: support decoding of WAL inside tarfile