Re: use of SPI by postgresImportForeignStatistics

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: use of SPI by postgresImportForeignStatistics
Date: 2026-06-22 19:08:33
Message-ID: CADkLM=dWJaw04u9FU=WkykOy7bu7B5eKmoDkfVz6JXh7mdemkw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 20, 2026 at 12:11 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:

> On Fri, Jun 19, 2026 at 9:45 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> On Thu, Jun 18, 2026 at 5:23 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
>> wrote:
>> > To be clear, this solves the SPI problem, but questions about the
>> design of attribute_statistics_update() and relation_statistics_update()
>> remain, but those concerns are now isolated within their respective files
>> attribute_stats.c and relation_stats.c. The inefficiencies therein aren't
>> really in a critical path, so if we wanted to leave them be until v20 they
>> could, but if time allows I'd at least like try unwinding some of that. But
>> first, let's get SPI out of postgres_fdw.c.
>>
>> I think that's the right order of priority, but I don't think that
>> having import_attribute_statistics construct an fcinfo is great.
>>
>
> Me either, I'm looking at "phase 2" already where
> relation/attribute_statistics_update becomes a conventional function, and
> pg_clear_attribute_stats and pg_restore_attribute_stats (and their relstats
> equivalents) marshal their parameters to call that instead.
>
>
>> Ideally, attribute_statistics_update would call
>> import_attribute_statistics rather than the other way around.
>>
>
> I think we're mostly on the same page. If we require the caller to
> understand the stats data structures to a greater level of detail, and
> require it to do the transformations to the proper input types
> (BlockNumbers, floats, float arrays, cstrings for anyarray, etc), then the
> import_attribute_statistics functon and the new attribute_statistics_update
> would be one-in-the-same. The v1 patch leaned heavily (perhaps too far)
> towards letting the caller pass along string values fetched via
> PQgetvalue() from a pg_stats without modification.
>

An update on my progress on Phase 2:

I was able to convert relation_statistics_update() with relatively little
fuss, but ran into trouble in doing so for attribute_statistics_update().

Initially the plan was to have a data structure
RelationStatsData/AttributeStatsData which contained a series of boolean
has_FOO flags alongside FOO of the actual stat type (int32, float, float[],
or cstring for anyarray value) and have the respective functions convert
their parameters to this common structure before calling the common
function.

The first bit of dissonance comes from the SQL-level functions having
schemaname+relname parameters, and the reloid is resolved via
RangeVarGetRelidExtended() which has a callback to check for correct
permissions and setting the proper lock level. However, the C-caller would
either have the reloid already, or an already open Relation, but no
assurance that the caller has the correct permissions for that table or the
correct lock level on the table. So either we make an equivalent to
RangeVarGetRelidExtended() that takes an oid, or the C-caller has to derive
a RangeVar, call the existing RangeVarGetRelidExtended() function, and
verify the result reloid against the supplied parameter. I went with
deriving the RangeVar and putting an Assert on the before/after reloids,
but perhaps the smarter play is to make a function that checks for
ShareUpdateExclusiveLock on the Relation, and then does the equivalent of
RangeVarCallbackForStats().

A smaller bit of dissonance was with RecoveryInProgress(), which if I
recall we're checking before RangeVarGetRelidExtended() to avoid trying to
take a lock that will fail. That check might not be meaningful if the C
call takes a relation, thus ensuring that some level of locking worked,
thus we aren't in recovery.

Next is the existing validation functions stats_check_required_arg(),
stats_check_arg_array(), and stats_check_arg_pairs() all work on values
indexed by the positional functioncallinfo and the corresponding
relstatsinfo/attstatsinfo structure, and this makes a lot of type checking
and value checking compact and uniform. If we want to keep this sort of
uniformity, the resulting StatsData structure will end up looking a lot
like the FunctionCallInfo that we already had.

Next is the fact that the end-destination for every value passed in is a
Datum for a pg_statistic heaptuple. Most Datum values are checked only for
their null-ness and if they're the correct type, so the value itself is
usually just passed directly from fcinfo into the heap tuple values[]
array. The float[] values are checked for number of elements and whether
any elements are NULL, but that is done via array functions that take a
Datum input. Only in a few cases do we actually look at the actual internal
value of the Datum (reltuples, attname, attnum, the anyarrays), so there's
little to gain there.

There's some additional hassle in the fact that
pg_restore_attribute_stats() can take an attnum parameter OR an attname
parameter, but not both. I was able to resolve that in a semi-elegant
fashion, but the other issues have convinced me that we're probably better
off continuing to use the FunctionCallInfo version of
attribute_stats_update(), though perhaps with a different name, allowing us
to use that name for the new API call instead of
import_attribute_statistics().

One thing we *do* need to change from my v1 patch is moving the recovery
check and the RangeVar check out of relation_statistics_update() and
attribute_statsistics_update(), and having the respective callers do those
checks themselves first, passing in the now-authoritative reloid from
stats_acquire_relation_lock().

To that end, here's a new and rebased patch set:

0001 - exactly the same as before
0002 - exactly the same as before
0003 - New stat_utils.c function stats_acquire_relation_lock() which covers
the RangeVar and Recovery checks common to all the functions that modify
relation or attribute stats.
0004 - Add a small regression test to relation stats
0005 - Use stats_acquire_relation_lock() in relstats functions, and rename
the publicly-facing relstats C function.
0006 - Rename the publicly-facing attstats functions.
0007 - Use stats_acquire_relation_lock() in attstats functions.

Attachment Content-Type Size
v2-0001-Remove-SPI-in-favor-of-import_relation_statistics.patch text/x-patch 9.8 KB
v2-0002-Remove-SPI-in-favor-of-import_attribute_statistic.patch text/x-patch 22.9 KB
v2-0003-New-function-stats_acquire_relation_lock.patch text/x-patch 2.9 KB
v2-0005-Rename-import_relation_statistics-to-relation_sta.patch text/x-patch 11.7 KB
v2-0004-postgres_fdw-additional-regression-reverse-set-di.patch text/x-patch 2.2 KB
v2-0006-Rename-attribute_statistics-functions.patch text/x-patch 4.6 KB
v2-0007-Make-callers-of-update_attstats-use-stats_acquire.patch text/x-patch 9.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-06-22 19:36:09 Re: Allow pg_log_backend_memory_contexts() for postmaster
Previous Message Tomas Vondra 2026-06-22 19:03:33 Re: Is there value in having optimizer stats for joins/foreignkeys?