Re: [PATCH] Add hook for plugins to acquire sample rows during ANALYZE

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Samba Siva Reddy Chinta <sambasivareddy(dot)ch(at)zohomail(dot)in>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add hook for plugins to acquire sample rows during ANALYZE
Date: 2026-06-26 12:41:50
Message-ID: CAExHW5tgK7osrZqe=tx+fKW1JY9pMQ6GzeubSV3PADYx6rpmkw@mail.gmail.com
Lists: pgsql-hackers

Hi Samba Siva,

On Fri, Jun 26, 2026 at 7:17 AM Samba Siva Reddy Chinta
<sambasivareddy(dot)ch(at)zohomail(dot)in> wrote:
>
> Hi all,
>
> Attached is a patch that adds a hook, AcquireSampleRowsFunc_hook, allowing extensions to override the row sampling function used during ANALYZE for regular heap relations.
>
> Motivation
>
> Extensions that implement horizontal scaling of tables currently have no clean way to participate in ANALYZE's row sampling. The default acquire_sample_rows() only knows how to sample the local heap, so a distributed-table extension wanting accurate statistics has to either:
>
> - maintain its own separate stats-collection machinery outside of ANALYZE entirely, or
> - duplicate/reimplement parts of analyze.c's sampling logic to pull rows from remote nodes.
>
> This hook lets such an extension plug into the existing ANALYZE code path and supply its own row acquisition function, without having to reinvent stats collection or duplicate logic that already exists in core.
>
> What the patch does
>
> - Adds AcquireSampleRowsFunc_hook (typed identically to AcquireSampleRowsFunc) in vacuum.h.
> - In analyze.c, both analyze_rel() and acquire_inherited_sample_rows() check the hook and use it in place of acquire_sample_rows() when set.
> - Adds doc text in xfunc.sgml describing the hook's contract (fill rows[] up to targrows, set *totalrows).
> - Adds a regression test confirming ANALYZE still completes normally with the hook unset (the hook itself needs a C extension to exercise meaningfully, so this just guards against regressions in the unset case).
>
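
For readers following the thread, the dispatch described in the quoted summary can be sketched roughly as below. This is a standalone mock, not code from the patch: MockRelation, MockTuple, mock_acquire_hook and every mock_* function are hypothetical stand-ins, and the callback signature is simplified (the real AcquireSampleRowsFunc takes more arguments). It only illustrates the "use the hook when set, else the default" pattern and the stated contract (fill rows[] up to targrows, set *totalrows).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's Relation and HeapTuple. */
typedef struct MockRelation
{
    int nlocal;   /* rows stored on the local node */
    int nremote;  /* rows stored on remote nodes */
} MockRelation;

typedef int MockTuple;  /* a "row" is just an id in this sketch */

/*
 * Simplified callback shape: fill rows[] with at most targrows rows,
 * set *totalrows to the estimated table size, return rows stored.
 */
typedef int (*MockAcquireSampleRowsFunc)(const MockRelation *rel,
                                         MockTuple *rows,
                                         int targrows,
                                         double *totalrows);

/* NULL means no extension has installed a replacement. */
MockAcquireSampleRowsFunc mock_acquire_hook = NULL;

/*
 * Default path: only knows about local rows. For brevity it takes the
 * first rows instead of drawing a random sample.
 */
int mock_acquire_local(const MockRelation *rel, MockTuple *rows,
                       int targrows, double *totalrows)
{
    assert(targrows >= 0);
    int n = rel->nlocal < targrows ? rel->nlocal : targrows;

    for (int i = 0; i < n; i++)
        rows[i] = i;
    *totalrows = rel->nlocal;
    return n;
}

/* What an extension might supply: counts local and remote rows. */
int mock_acquire_distributed(const MockRelation *rel, MockTuple *rows,
                             int targrows, double *totalrows)
{
    assert(targrows >= 0);
    int total = rel->nlocal + rel->nremote;
    int n = total < targrows ? total : targrows;

    for (int i = 0; i < n; i++)
        rows[i] = i;
    *totalrows = total;
    return n;
}

/* Dispatch: use the hook when it is set, otherwise the default. */
int mock_analyze_sample(const MockRelation *rel, MockTuple *rows,
                        int targrows, double *totalrows)
{
    MockAcquireSampleRowsFunc fn =
        mock_acquire_hook ? mock_acquire_hook : mock_acquire_local;

    return fn(rel, rows, targrows, totalrows);
}
```

In this mock, an extension would assign mock_acquire_hook = mock_acquire_distributed at load time; leaving it NULL keeps the local-only behaviour, which corresponds to the unset case the regression test in the patch is said to cover.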

If you have implemented sharding using inheritance or partitioning
together with FDW, the facility to fetch statistics for foreign tables
from the foreign server could be useful here. IIUC, there is a
limitation on using it for inherited foreign tables, but it's worth
exploring and improving for your use case.

--
Best Wishes,
Ashutosh Bapat
