Re: ResourceOwner refactoring

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: ResourceOwner refactoring
Date: 2023-10-25 12:43:36
Message-ID: 20607b96-b275-4a57-8509-29a3ed320d3f@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/07/2023 22:14, Andres Freund wrote:
> On 2023-05-25 20:14:23 +0300, Heikki Linnakangas wrote:
>> @@ -4183,9 +4180,6 @@ FlushRelationsAllBuffers(SMgrRelation *smgrs, int nrels)
>> if (use_bsearch)
>> pg_qsort(srels, nrels, sizeof(SMgrSortArray), rlocator_comparator);
>>
>> - /* Make sure we can handle the pin inside the loop */
>> - ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
>> -
>> for (i = 0; i < NBuffers; i++)
>> {
>> SMgrSortArray *srelent = NULL;
>
> Hm, a tad worried that increasing the number of ResourceOwnerEnlargeBuffers()
> that drastically could have a visible overhead.

You mean in FlushRelationsAllBuffers() in particular? Seems pretty cheap
compared to all the other work involved, and it's not that performance
critical anyway.

ExtendBufferedRelShared() gave me a pause, as it is in the critical
path. But there too, the ResourceOwnerEnlarge() call pales in comparison
to all the other work that it does for each buffer.

Other than that, I don't think I'm introducing new
ResourceOwnerEnlarge() calls to hot codepaths, just moving them around.

>> +static ResourceOwnerFuncs tupdesc_resowner_funcs =
>> +{
>> + .name = "tupdesc reference",
>> + .release_phase = RESOURCE_RELEASE_AFTER_LOCKS,
>> + .release_priority = RELEASE_PRIO_TUPDESC_REFS,
>> + .ReleaseResource = ResOwnerReleaseTupleDesc,
>> + .PrintLeakWarning = ResOwnerPrintTupleDescLeakWarning
>> +};
>
> I think these should all be marked const, there's no need to have them be in a
> mutable section of the binary. Of course that will require adjusting a bunch
> of the signatures too, but that seems fine.

Done.

>> --- a/src/backend/access/transam/xact.c
>> +++ b/src/backend/access/transam/xact.c
>> @@ -5171,9 +5171,24 @@ AbortSubTransaction(void)
>> ResourceOwnerRelease(s->curTransactionOwner,
>> RESOURCE_RELEASE_BEFORE_LOCKS,
>> false, false);
>> +
>> AtEOSubXact_RelationCache(false, s->subTransactionId,
>> s->parent->subTransactionId);
>> +
>> +
>> + /*
>> + * AtEOSubXact_Inval sometimes needs to temporarily
>> + * bump the refcount on the relcache entries that it processes.
>> + * We cannot use the subtransaction's resource owner anymore,
>> + * because we've already started releasing it. But we can use
>> + * the parent resource owner.
>> + */
>> + CurrentResourceOwner = s->parent->curTransactionOwner;
>> +
>> AtEOSubXact_Inval(false);
>> +
>> + CurrentResourceOwner = s->curTransactionOwner;
>> +
>> ResourceOwnerRelease(s->curTransactionOwner,
>> RESOURCE_RELEASE_LOCKS,
>> false, false);
>
> I might be a bit slow this morning, but why did this need to change as part of
> this?

I tried to explain it in the comment. AtEOSubXact_Inval() sometimes
needs to bump the refcount on a relcache entry, which is tracked in the
current resource owner. But we have already started to release it. On
master, you can allocate new resources in a ResourceOwner after you have
already called ResourceOwnerRelease(RESOURCE_RELEASE_BEFORE_LOCKS), but
with this patch, that's no longer allowed and you get an assertion
failure. I think it was always a bit questionable; it's not clear that
the resource would've been released correctly if an error occurred. In
practice, I don't think an error could actually occur while
AtEOXSubXact_Inval() briefly holds the relcache entry.

>> @@ -5182,8 +5221,8 @@ TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint32 set_flag_bits)
>> * If I/O was in progress, we always set BM_IO_ERROR, even though it's
>> * possible the error condition wasn't related to the I/O.
>> */
>> -void
>> -AbortBufferIO(Buffer buffer)
>> +static void
>> +AbortBufferIO(Buffer buffer, bool forget_owner)
>
> What is the forget_owner argument for? It's always false, afaics?

Removed. There used to be more callers of AbortBufferIO() which needed
that, but not anymore.

>> +/* Convenience wrappers over ResourceOwnerRemember/Forget */
>> +#define ResourceOwnerRememberCatCacheRef(owner, tuple) \
>> + ResourceOwnerRemember(owner, PointerGetDatum(tuple), &catcache_resowner_funcs)
>> +#define ResourceOwnerForgetCatCacheRef(owner, tuple) \
>> + ResourceOwnerForget(owner, PointerGetDatum(tuple), &catcache_resowner_funcs)
>> +#define ResourceOwnerRememberCatCacheListRef(owner, list) \
>> + ResourceOwnerRemember(owner, PointerGetDatum(list), &catlistref_resowner_funcs)
>> +#define ResourceOwnerForgetCatCacheListRef(owner, list) \
>> + ResourceOwnerForget(owner, PointerGetDatum(list), &catlistref_resowner_funcs)
>
> Not specific to this type of resource owner, but I wonder if it'd not be
> better to use inline functions for these - the PointerGetDatum() cast removes
> nearly all type safety.

Peter just said the same thing. I guess you're right, changed.

>> +Releasing
>> +---------
>> +
>> +Releasing the resources of a ResourceOwner happens in three phases:
>> +
>> +1. "Before-locks" resources
>> +
>> +2. Locks
>> +
>> +3. "After-locks" resources
>
> Given that we now have priorities, I wonder if we shouldn't merge the phase and
> the priorities? We have code like this:
>
>
> ResourceOwnerRelease(TopTransactionResourceOwner,
> RESOURCE_RELEASE_BEFORE_LOCKS,
> true, true);
> ...
> ResourceOwnerRelease(TopTransactionResourceOwner,
> RESOURCE_RELEASE_LOCKS,
> true, true);
> ResourceOwnerRelease(TopTransactionResourceOwner,
> RESOURCE_RELEASE_AFTER_LOCKS,
> true, true);
>
> Now, admittedly locks currently are "special" anyway, but we have a plan to
> get rid of that. If we did, then we could do the last two as one as
> ResourceOwnerRelease(from = RELEASE_PRIO_LOCKS, to: RELEASE_PRIO_LAST, ...)
> or such.

Currently, we always release parent's resources before a child's, within
each phase. I'm not sure if we rely on that parent-child ordering
anywhere though. I'd like to leave it as it is for now, to limit the
scope of this, and maybe revisit later.

>> +Normally, you are expected to call ResourceOwnerForget on every resource so
>> +that at commit, the ResourceOwner is empty (locks are an exception). If there
>> +are any resources still held at commit, ResourceOwnerRelease will call the
>> +PrintLeakWarning callback on each such resource. At abort, however, we truly
>> +rely on the ResourceOwner mechanism and it is normal that there are resources
>> +to be released.
>
> I am not sure it's a good idea to encode that all kinds of resources ought to
> have been released prior on commit. I can see that not always making sense -
> in fact we already don't do it for locks, no? Perhaps just add a "commonly"?

Hmm, I'd still like to print the warnings for resources that we expect
to be released before commit, though. We could have a flag in the
ResourceOwnerDesc to give flexibility, but I'm inclined to wait until we
get the first case of someone actually needing it.

>> +typedef struct ResourceElem
>> {
>> ...
>> + Datum item;
>> + ResourceOwnerFuncs *kind; /* NULL indicates a free hash table slot */
>> +} ResourceElem;
>
>> /*
>> - * Initially allocated size of a ResourceArray. Must be power of two since
>> - * we'll use (arraysize - 1) as mask for hashing.
>> + * Size of the fixed-size array to hold most-recently remembered resources.
>> */
>> -#define RESARRAY_INIT_SIZE 16
>> +#define RESOWNER_ARRAY_SIZE 32
>
> That's 512 bytes, pretty large to be searched sequentially. It's somewhat sad
> that the array needs to include 8 byte of ResourceOwnerFuncs...

Yeah. Early on I considered having a RegisterResourceKind() function
that you need to call first, and then each resource kind could be
assigned a smaller ID. If we wanted to keep the old mechanism of
separate arrays for each different resource kind, that'd be the way to
go. But overall this seems simpler, and the performance is decent
despite that linear scan.

Note that the current code in master also uses a plain array, until it
grows to 512 bytes, when it switches to a hash table. Lot of the details
with the array/hash combo are different, but that threshold was the same.

>> + bool releasing; /* has ResourceOwnerRelease been called? */
>> +
>> + /*
>> + * The fixed-size array for recent resources.
>> + *
>> + * If 'releasing' is set, the contents are sorted by release priority.
>> + */
>> + uint32 narr; /* how many items are stored in the array */
>> + ResourceElem arr[RESOWNER_ARRAY_SIZE];
>> +
>> + /*
>> + * The hash table. Uses open-addressing. 'nhash' is the number of items
>> + * present; if it would exceed 'grow_at', we enlarge it and re-hash.
>> + * 'grow_at' should be rather less than 'capacity' so that we don't waste
>> + * too much time searching for empty slots.
>> + *
>> + * If 'releasing' is set, the contents are no longer hashed, but sorted by
>> + * release priority. The first 'nhash' elements are occupied, the rest
>> + * are empty.
>> + */
>> + ResourceElem *hash;
>> + uint32 nhash; /* how many items are stored in the hash */
>> + uint32 capacity; /* allocated length of hash[] */
>> + uint32 grow_at; /* grow hash when reach this */
>> +
>> + /* The local locks cache. */
>> + uint32 nlocks; /* number of owned locks */
>> LOCALLOCK *locks[MAX_RESOWNER_LOCKS]; /* list of owned locks */
>
> Due to 'narr' being before the large 'arr', 'nhash' is always on a separate
> cacheline. I.e. the number of cachelines accessed in happy paths is higher
> than necessary, also relevant because there's more dependencies on narr, nhash
> than on the contents of those.
>
> I'd probably order it so that both of the large elements (arr, locks) are at
> the end.
>
> It's hard to know with changes this small, but it does seem to yield a small
> and repeatable performance benefit (pipelined pgbench -S).

Swapped the 'hash' and 'arr' parts of the struct, so that 'arr' and
'locks' are at the end.

>> +/*
>> + * Forget that an object is owned by a ResourceOwner
>> + *
>> + * Note: if same resource ID is associated with the ResourceOwner more than
>> + * once, one instance is removed.
>> + */
>> +void
>> +ResourceOwnerForget(ResourceOwner owner, Datum value, ResourceOwnerFuncs *kind)
>> +{
>> +#ifdef RESOWNER_TRACE
>> + elog(LOG, "FORGET %d: owner %p value " UINT64_FORMAT ", kind: %s",
>> + resowner_trace_counter++, owner, DatumGetUInt64(value), kind->name);
>> +#endif
>> +
>> + /*
>> + * Mustn't call this after we have already started releasing resources.
>> + * (Release callback functions are not allowed to release additional
>> + * resources.)
>> + */
>> + if (owner->releasing)
>> + elog(ERROR, "ResourceOwnerForget called for %s after release started", kind->name);
>> +
>> + /* Search through all items in the array first. */
>> + for (uint32 i = 0; i < owner->narr; i++)
>> + {
>
> Hm, I think we oftend up releasing resources in a lifo order. Would it
> possibly make sense to search in reverse order?

Changed. I can see a small speedup in a micro benchmark, when the array
is almost full and you repeatedly remember / forget one more resource.

> Separately, I wonder if it'd be worth to check if there are any other matches
> when assertions are enabled.

It is actually allowed to remember the same resource twice. There's a
comment in ResourceOwnerForget (previously in ResourceArrayRemove) that
each call releases one instance in that case. I'm not sure if we rely on
that anywhere currently.

>> ResourceOwnerRelease(ResourceOwner owner,
>> @@ -492,6 +631,15 @@ ResourceOwnerRelease(ResourceOwner owner,
>> {
>> /* There's not currently any setup needed before recursing */
>> ResourceOwnerReleaseInternal(owner, phase, isCommit, isTopLevel);
>> +
>> +#ifdef RESOWNER_STATS
>> + if (isTopLevel)
>> + {
>> + elog(LOG, "RESOWNER STATS: lookups: array %d, hash %d", narray_lookups, nhash_lookups);
>> + narray_lookups = 0;
>> + nhash_lookups = 0;
>> + }
>> +#endif
>> }
>
> Why do we have ResourceOwnerRelease() vs ResourceOwnerReleaseInternal()? Looks
> like that's older than your patch, but still confused.

Dunno. It provides a place to do things on the top-level resource owner
that you don't want to do when you recurse to the children, as hinted by
the comment, but I don't know if we've ever had such things. It was
handy for printing the RESOWNER_STATS now, though :-).

>> + /* Let add-on modules get a chance too */
>> + for (item = ResourceRelease_callbacks; item; item = next)
>> + {
>> + /* allow callbacks to unregister themselves when called */
>> + next = item->next;
>> + item->callback(phase, isCommit, isTopLevel, item->arg);
>> + }
>
> I wonder if it's really a good idea to continue having this API. What you are
> allowed to do in a resowner changed

Hmm, I don't think anything changed for users of this API. I'm not in a
hurry to remove this and force extensions to adapt, as it's not hard to
maintain this.

>> +/*
>> + * ResourceOwnerReleaseAllOfKind
>> + * Release all resources of a certain type held by this owner.
>> + */
>> +void
>> +ResourceOwnerReleaseAllOfKind(ResourceOwner owner, ResourceOwnerFuncs *kind)
>> +{
>> + /* Mustn't call this after we have already started releasing resources. */
>> + if (owner->releasing)
>> + elog(ERROR, "ResourceOwnerForget called for %s after release started", kind->name);
>>
>> - /* Ditto for plancache references */
>> - while (ResourceArrayGetAny(&(owner->planrefarr), &foundres))
>> + /*
>> + * Temporarily set 'releasing', to prevent calls to ResourceOwnerRemember
>> + * while we're scanning the owner. Enlarging the hash would cause us to
>> + * lose track of the point we're scanning.
>> + */
>> + owner->releasing = true;
>
> Is it a problem than a possible error would lead to ->releasing = true to be
> "leaked"? I think it might be, because we haven't called ResourceOwnerSort(),
> which means we'd not process resources in the right order during abort
> processing.

Good point. I added a separate 'sorted' flag to handle that gracefully.

>> +/*
>> + * Hash function for value+kind combination.
>> + */
>> static inline uint32
>> hash_resource_elem(Datum value, ResourceOwnerFuncs *kind)
>> {
>> - Datum data[2];
>> -
>> - data[0] = value;
>> - data[1] = PointerGetDatum(kind);
>> -
>> - return hash_bytes((unsigned char *) &data, 2 * SIZEOF_DATUM);
>> + /*
>> + * Most resource kinds store a pointer in 'value', and pointers are unique
>> + * all on their own. But some resources store plain integers (Files and
>> + * Buffers as of this writing), so we want to incorporate the 'kind' in
>> + * the hash too, otherwise those resources will collide a lot. But
>> + * because there are only a few resource kinds like that - and only a few
>> + * resource kinds to begin with - we don't need to work too hard to mix
>> + * 'kind' into the hash. Just add it with hash_combine(), it perturbs the
>> + * result enough for our purposes.
>> + */
>> +#if SIZEOF_DATUM == 8
>> + return hash_combine64(murmurhash64((uint64) value), (uint64) kind);
>> +#else
>> + return hash_combine(murmurhash32((uint32) value), (uint32) kind);
>> +#endif
>> }
>
> Why are we using a 64bit hash to then just throw out the high bits?

It was just expedient when the inputs are 64-bit. Implementation of
murmurhash64 is tiny, and we already use murmurhash32 elsewhere, so it
seemed like an OK choice. I'm sure there are better algorithms out
there, though.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
v16-0001-Move-a-few-ResourceOwnerEnlarge-calls-for-safety.patch text/x-patch 9.8 KB
v16-0002-Make-resowners-more-easily-extensible.patch text/x-patch 158.8 KB
v16-0003-Use-a-faster-hash-function-in-resource-owners.patch text/x-patch 2.8 KB
v16-0004-Change-pgcrypto-to-use-the-new-ResourceOwner-mec.patch text/x-patch 7.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2023-10-25 12:43:54 Re: RFC: Pluggable TOAST
Previous Message Dilip Kumar 2023-10-25 12:43:27 Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock