Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2026-01-27 08:04:52
Message-ID: CADkLM=fxent=ZQG9SUo8VopQFL2Hc+n0EkioQK_63biYGW55zQ@mail.gmail.com
Lists: pgsql-hackers

>
> > I'm not sure we can actually do that. The functions that compute the
> > statistics are all based off of row samples, not already computed
> > statistics. I don't think we can synthesize a rowsample from the imported
> > statistics, at least not accurately. If I'm misunderstanding what you're
> > suggesting, please correct me.
>
> I am comparing the calculation of statistics to the calculation of
> aggregates. We have code to compute aggregates on a partitioned table
> from the partial aggregates computed from the individual partitions.
> (Even though I am mentioning the partitioned table, the technique can
> be used for an inheritance hierarchy.) Similarly if we could come up
> with a representation of partial statistics, we could get partial
> statistics computed for the children (and the parent in
> non-partitioned inheritance). Use the partial statistics to compute
> the statistics for the parent without the need to synthesize row
> samples from the children. I haven't looked at all the kinds of
> statistics to see whether this is feasible.
>

We're limited to the existing data from pg_class and the security-barrier
view pg_stats. We also have pg_stats_ext and pg_stats_ext_exprs, but those
cover extended statistics objects, which aren't useful to us in this
context.
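
To make that constraint concrete, here is a minimal sketch of the two
catalog reads a client could issue against the remote server. The helper
name remote_stats_queries and the %(name)s placeholder style (as used by
psycopg-like drivers) are assumptions for illustration; this is not the
postgres_fdw code. The column names are the documented ones from pg_class
and pg_stats.

```python
# Hypothetical helper, for illustration only: it builds the two catalog
# queries and returns them with their parameters. It does not connect
# to a server.

# Relation-level estimates stored in pg_class.
RELSTATS_SQL = """
SELECT c.relpages, c.reltuples, c.relallvisible
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = %(schema)s AND c.relname = %(table)s
"""

# Per-column statistics exposed through the pg_stats view. The anyarray
# columns are cast to text so any client can read them.
ATTSTATS_SQL = """
SELECT attname, inherited, null_frac, avg_width, n_distinct,
       most_common_vals::text, most_common_freqs,
       histogram_bounds::text, correlation
FROM pg_catalog.pg_stats
WHERE schemaname = %(schema)s AND tablename = %(table)s
"""


def remote_stats_queries(schema, table):
    """Return (sql, params) pairs for one remote table."""
    params = {"schema": schema, "table": table}
    return [(RELSTATS_SQL, params), (ATTSTATS_SQL, params)]
```

Everything a statistics import can work with has to come back from
queries of this shape; no row sample is available through either source.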

I've been a part of some research into the feasibility of caching the row
samples fetched, allowing the planner to generate on-the-fly statistics for
OLAP queries. If we ever got that functionality, we'd need a means of
exposing those row samples externally, and even then we'd have to wait for
the remote PostgreSQL server to have that feature. So for now we are
limited to what pg_class and pg_stats tell us.
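
As a rough illustration of the partial-statistics idea quoted above, using
only values that pg_class and pg_stats already expose, the sketch below
merges per-child numbers for one column. The ColStats type and the
function names are hypothetical. It assumes avg_width is best weighted by
each child's estimated non-null rows, and it only bounds n_distinct,
since overlap between children cannot be known from these values.
most_common_vals and histogram_bounds are not handled here.

```python
from dataclasses import dataclass


@dataclass
class ColStats:
    """Per-child inputs for one column (hypothetical container)."""
    reltuples: float   # pg_class.reltuples of the child
    null_frac: float   # pg_stats.null_frac
    avg_width: float   # pg_stats.avg_width
    n_distinct: float  # pg_stats.n_distinct; negative means -(distinct / rows)


def abs_distinct(s):
    """Convert n_distinct to an absolute count of distinct values."""
    if s.n_distinct >= 0:
        return s.n_distinct
    return -s.n_distinct * s.reltuples


def merge_children(children):
    """Combine child statistics into parent-level estimates."""
    total = sum(c.reltuples for c in children)
    nulls = sum(c.reltuples * c.null_frac for c in children)
    nonnull = total - nulls
    # Assumption: weight widths by estimated non-null rows per child.
    width = sum(c.reltuples * (1 - c.null_frac) * c.avg_width
                for c in children)
    distinct = [abs_distinct(c) for c in children]
    return {
        "reltuples": total,
        "null_frac": nulls / total if total else 0.0,
        "avg_width": width / nonnull if nonnull else 0.0,
        # Lower bound: children fully overlap. Upper bound: no overlap.
        "n_distinct_bounds": (max(distinct, default=0.0), sum(distinct)),
    }
```

The row counts, null fraction and width combine as weighted sums, while
n_distinct already degrades to a range, which gives a sense of how much
information the stored statistics lack compared with a row sample.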
