Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2026-01-29 19:20:26
Message-ID: CADkLM=e0z4zyqqNVM8UU+frmY5ca1E1VWAL7Tbmmcgn9rCtD3w@mail.gmail.com

> The way this is implemented, it will favour the usecases where foreign
> tables are not child tables.

It is true that this feature does not benefit the recursive
do_analyze_rel() case. But it does help when those same tables are analyzed
directly.

> That leaves out the sharding use case
> which I believe is also a significant usecase. I think we need to
> think, how can we make that usecase benefit from this optimization.

I agree that we should find a way to do that, but this patch handles the other
case, and doesn't prevent us from later teaching
postgresAnalyzeForeignTable() to cache the row sample locally for later
use. postgresImportStatistics() could then weigh the relative
benefits of using that locally cached sample vs. the already-formed remote
statistics. Even in that case, I'm guessing that the remote table's stats
will be based on a larger, and therefore better, sample than the one
we are able to pull across the wire and cache locally, so the remotely
computed statistics would be better.

> Not being able to use statistics available on the remote side seems a
> major limitation. But I don't have a better solution than to think of
> supporting some kind of partial statistics.

I'm not against trying to fetch and cache rowsamples, or cache some
partially aggregated results of a rowsample, but this patch does not cover
that. This patch should, at least in theory, reduce the number of table
samples pulled across the wire by 50% and that seems worthwhile.
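
One way to read the 50% figure, as a back-of-the-envelope sketch only: assume
each foreign child is sampled once during the recursive pass for the parent's
statistics and once more when it is analyzed directly, and that importing
remote statistics always succeeds in the direct case. The function name and
the counting model below are hypothetical and are not taken from the patch.

```python
def remote_samples_pulled(n_foreign_children: int, stats_import_works: bool) -> int:
    """Toy count of row samples fetched from remote servers (assumed model)."""
    # Recursive pass for the parent's statistics: one sample per foreign
    # child, with or without statistics import (assumption).
    recursive = n_foreign_children
    # Direct pass over each child: a sample is pulled only when statistics
    # cannot be imported from the remote side (assumption).
    direct = 0 if stats_import_works else n_foreign_children
    return recursive + direct


before = remote_samples_pulled(8, stats_import_works=False)
after = remote_samples_pulled(8, stats_import_works=True)
reduction = 1 - after / before
print(before, after, reduction)
```

Under these assumptions the saving is half regardless of the number of
children; if some imports fail and fall back to sampling, it would be smaller.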
