Re: use of SPI by postgresImportForeignStatistics

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: use of SPI by postgresImportForeignStatistics
Date: 2026-06-23 15:53:22
Message-ID: CADkLM=cdKjgphHpFW8kKukHF+DVYKogHcqLXO=BA8MGfiRFqQA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 23, 2026 at 8:11 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Jun 22, 2026 at 3:08 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
> wrote:
> > The first bit of dissonance comes from the SQL-level functions having
> schemaname+relname parameters, and the reloid is resolved via
> RangeVarGetRelidExtended() which has a callback to check for correct
> permissions and setting the proper lock level. However, the C-caller would
> either have the reloid already, or an already open Relation, but no
> assurance that the caller has the correct permissions for that table or the
> correct lock level on the table. So either we make an equivalent to
> RangeVarGetRelidExtended() that takes an oid, or the C-caller has to derive
> a RangeVar, call the existing RangeVarGetRelidExtended() function, and
> verify the result reloid against the supplied parameter. I went with
> deriving the RangeVar and putting an Assert on the before/after reloids,
> but perhaps the smarter play is to make a function that checks for
> ShareUpdateExclusiveLock on the Relation, and then does the equivalent of
> RangeVarCallbackForStats().
>
> I don't think this is really a problem. If the caller is specifying
> the OID, they should have called RangeVarGetRelidExtended themselves.
> Permissions-checking, locking, and opening the relation should all
> happen simultaneously, and the logic shouldn't be duplicated later.
>

It seems dangerous to me to provide an externally visible function that
assumes the caller has already self-verified that they have permission to
modify stats for an object.

It's possible that the existing analyze() code already has verified these
permissions, but at this moment I'm unsure.

> > A smaller bit of dissonance was with RecoveryInProgress(), which if I
> recall we're checking before RangeVarGetRelidExtended() to avoid trying to
> take a lock that will fail. That check might not be meaningful if the C
> call takes a relation, thus ensuring that some level of locking worked,
> thus we aren't in recovery.
>
> I don't quite follow this part.
>

I mean that I don't know if that check is required in situations where the
caller already has an open relation with the right lock, and if we can
avoid the call by having the API take a Relation rather than an Oid, then
that would be preferable.

>
> > Next is the existing validation functions stats_check_required_arg(),
> stats_check_arg_array(), and stats_check_arg_pairs() all work on values
> indexed by the positional functioncallinfo and the corresponding
> relstatsinfo/attstatsinfo structure, and this makes a lot of type checking
> and value checking compact and uniform. If we want to keep this sort of
> uniformity, the resulting StatsData structure will end up looking a lot
> like the FunctionCallInfo that we already had.
>
> Yeah, this is worth thinking about. You could consider putting an
> array inside the struct and use #defines for the indexes. And instead
> of having a separate Boolean for each index, you could use the same
> index to reference the N'th bit of a single integer flag variable.

That's basically what I was trying, though I wasn't brave enough to use
bitflags in a v1 patch.

The #defines already exist now in the form of enums that act as indexes
into the FunctionCallInfo array. But if my data structure is an array of
Datums with an array of is-not-nulls, then I'm just not that far away from
the FunctionCallInfo structure we already have.

>
> > Next is the fact that the end-destination for every value passed in is a
> Datum for a pg_statistic heaptuple. Most Datum values are checked only for
> their null-ness and if they're the correct type, so the value itself is
> usually just passed directly from fcinfo into the heap tuple values[]
> array. The float[] values are checked for number of elements and whether
> any elements are NULL, but that is done via array functions that take a
> Datum input. Only in a few cases do we actually look at the actual internal
> value of the Datum (reltuples, attname, attnum, the anyarrays), so there's
> little to gain there.
>
> Right, so the question is whether it makes more sense to pass down C
> strings or Datums.
>

Requiring Datums simplifies things greatly inside the existing functions,
but pushes that complexity to the caller. My first proposal was to keep
complexity to an absolute minimum on the caller side, but your comments
make me think there's considerable tolerance for more complexity on the
caller side.

> > There's some additional hassle in the fact that
> pg_restore_attribute_stats() can take an attnum parameter OR an attname
> parameter, but not both. I was able to resolve that in a semi-elegant
> fashion, but the other issues have convinced me that we're probably better
> off continuing to use the FunctionCallInfo version of
> attribute_stats_update(), though perhaps with a different name, allowing us
> to use that name for the new API call instead of
> import_attribute_statistics().
>
> On that particular point, I think I'm still unconvinced, but I also
> haven't looked deeply into this just yet, so maybe I'm wrong.
>

The issue becomes moot if we require the caller to construct their own
isnull and values arrays as was suggested above.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Diego 2026-06-23 15:57:03 libpq: decouple the .pgpass lookup port from the connection port
Previous Message Arne Roland 2026-06-23 15:53:01 Re: Key joins