Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2026-01-29 11:03:00
Message-ID: CAExHW5tv__HRgGHj2nOA1NBc69ow0z7VBzZ9qw4d_Pww6icBpA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 27, 2026 at 1:35 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>>
>> > I'm not sure we can actually do that. The functions that compute the statistics are all based off of row samples, not already computed statistics. I don't think we can synthesize a rowsample from the imported statistics, at least not accurately. If I'm misunderstanding what you're suggesting, please correct me.
>>
>> I am comparing the calculation of statistics to the calculation of
>> aggregates. We have code to compute aggregates on a partitioned table
>> from the partial aggregates computed from the individual partitions.
>> (Even though I am mentioning the partitioned table, the technique can
>> be used for an inheritance hierarchy.) Similarly if we could come up
>>
>> with a representation of partial statistics, we could get partial
>> statistics computed for the children (and the parent in
>> non-partitioned inheritance). Use the partial statistics to compute
>> the statistics for the parent without the need to synthesize row
>> samples from the children. I haven't looked at all the kinds of
>> statistics to see whether this is feasible.
>
>
> We're limited to the existing data from pg_class and the security-barrier view pg_stats. We also have pg_stats_ext and pg_stats_ext_exprs as well, but those are for extended stats objects, which aren't useful to us in this context.
>
> I've been a part of some research into the feasibility of caching the row samples fetched, allowing the planner to generate on-the-fly statistics for OLAP queries. If we ever got that functionality, we'd need a means of exposing those row samples externally, and even then we'd have to wait for the remote postgresql server to have that feature. So for now we are limited to what pg_class and pg_stats tell us.

The way this is implemented, it will favour the usecases where foreign
tables are not child tables. That leaves out the sharding use case
which I believe is also a significant usecase. I think we need to
think, how can we make that usecase benefit from this optimization. In
case of sharding the foreign tables will be child tables pointing to
very large tables on the remote side or at least the parent table they
are part of will have very large data spread across multiple remotes.
Not being able to use statistics available on the remote side seems a
major limitation. But I don't have a better solution than to think of
supporting some kind of partial statistics.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2026-01-29 11:13:21 Re: Fix logical decoding not track transaction during SNAPBUILD_BUILDING_SNAPSHOT
Previous Message Bernd Helmle 2026-01-29 10:05:53 Re: [PATCH] Add max_logical_replication_slots GUC