| From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
|---|---|
| To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
| Cc: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com |
| Subject: | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date: | 2026-02-12 14:29:34 |
| Message-ID: | CADkLM=cU1YW4yeW-osNGLkhWQp+p6bt0MYUizYE-Vw87pG-igg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Jan 29, 2026 at 2:20 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:
>
>> The way this is implemented, it will favour the usecases where foreign
>> tables are not child tables.
>
>
> It is true that this feature does not benefit the recursive
> do_analyze_rel() case. But it does help when those same tables are analyzed
> directly.
>
>
>> That leaves out the sharding use case
>> which I believe is also a significant usecase. I think we need to
>> think, how can we make that usecase benefit from this optimization.
>
>
> I agree that we should find a way to do that, but this handles the other
> case, and doesn't prevent us from later teaching
> postgresAnalyzeForeignTable() to cache the rowsample locally for later
> use. postgresImportStatistics() could then weigh the relative benefits of
> using that locally cached sample vs the already formed remote statistics.
> Even in that case, I'm guessing that the remote table's stats will be
> based on a larger and therefore better sample than the one we are able to
> pull across the wire and cache locally, so the remotely computed
> statistics would be better.
>
>> Not being able to use statistics available on the remote side seems a
>> major limitation. But I don't have a better solution than to think of
>> supporting some kind of partial statistics.
>
>
> I'm not against trying to fetch and cache rowsamples, or cache some
> partially aggregated results of a rowsample, but this patch does not cover
> that. This patch should, at least in theory, reduce the number of table
> samples pulled across the wire by 50% and that seems worthwhile.
>
>
Rebase with some error message cleanups.
| Attachment | Content-Type | Size |
|---|---|---|
| v13-0001-Add-FDW-functions-for-importing-optimizer-statis.patch | text/x-patch | 5.0 KB |
| v13-0002-Add-remote-statistics-fetching-to-postgres_fdw.patch | text/x-patch | 36.9 KB |
| v13-0003-Add-remote_analyze-to-postgres_fdw-remote-statis.patch | text/x-patch | 10.8 KB |