Re: Optimizing ResouceOwner to speed up COPY

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Subject: Re: Optimizing ResouceOwner to speed up COPY
Date: 2025-10-17 22:49:56
Message-ID: d0c25286-2fed-4e85-9ec5-981b5f7ffadd@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/17/25 12:32, Tomas Vondra wrote:
>
>
> On 10/17/25 10:31, Heikki Linnakangas wrote:
>> On 17/10/2025 06:13, Chao Li wrote:
>>> ```
>>> @@ -250,12 +257,21 @@ ResourceOwnerAddToHash(ResourceOwner owner,
>>> Datum value, const ResourceOwnerDesc
>>>       idx = hash_resource_elem(value, kind) & mask;
>>>       for (;;)
>>>       {
>>> +        /* found an exact match - just increment the counter */
>>> +        if ((owner->hash[idx].kind == kind) &&
>>> +            (owner->hash[idx].item == value))
>>> +        {
>>> +            owner->hash[idx].count += count;
>>> +            return;
>>> +        }
>>> +
>>>           if (owner->hash[idx].kind == NULL)
>>>               break;                /* found a free slot */
>>>           idx = (idx + 1) & mask;
>>>       }
>>> ```
>>>
>>> I think this is the “trade-off” you mention in your email: if a free
>>> slot found earlier, then still gets duplicated entries. I have no
>>> concern to this “trade-off”, but please add a comment here, otherwise
>>> future readers may be confused, and might potentially consider that
>>> were a bug, without reading your original design email.
>>
>> +1 on such a comment. Here or maybe on the ResourceElem struct itself.
>>
>
> Will do. It's probably appropriate to mention this deduplication in the
> comment at the beginning of the file.
>

I added a comment to the file comment, and clarified the comment when
adding the hash table entry. But I tried not to be too verbose there.

>>>  typedef struct ResourceElem
>>>  {
>>>      Datum        item;
>>> +    uint32        count;                /* number of occurrences */
>>>      const ResourceOwnerDesc *kind;    /* NULL indicates a free hash
>>> table slot */
>>>  } ResourceElem;
>>
>> Hmm, the 'count' is not used when the entry is stored in the array.
>> Perhaps we should have a separate struct for array and hash elements
>> now. Keeping the array small helps it to fit in CPU caches.
>>
>
> Agreed. I had the same idea yesterday, but I haven't done it yet.
>

The attached v2 does that - it adds a separate ResourceHashElem for the
has table, and it works. But I'm not sure I like it very much, because
there are two places that relied on both the array and hash table using
the same struct to "walk" it the same way.

For ResourceOwnerSort() it's not too bad, but ResourceOwnerReleaseAll()
now duplicates most of the code. It's not terrible, but also not pretty.
I can't think of a better way, though.

>>> /*
>>>  * Forget that an object is owned by a ResourceOwner
>>>  *
>>>  * Note: If same resource ID is associated with the ResourceOwner more
>>> than
>>>  * once, one instance is removed.
>>>  *
>>>  * Note: Forgetting a resource does not guarantee that there is room to
>>>  * remember a new resource.  One exception is when you forget the most
>>>  * recently remembered resource; that does make room for a new
>>> remember call.
>>>  * Some code callers rely on that exception.
>>>  */
>>> void
>>> ResourceOwnerForget(ResourceOwner owner, Datum value, const
>>> ResourceOwnerDesc *kind)
>>
>> Does this break the exception noted in the above comment? I guess it
>> still holds because we don't deduplicate entries in the array. That's
>> very subtle, needs a comment at least.
>>
>
> Right, it doesn't break the exception - as you say, it only applies to
> the hash, not to the array. I'll add a comment.
>

Comment added.

regards

--
Tomas Vondra

Attachment Content-Type Size
v2-0001-Deduplicate-entries-in-ResourceOwner.patch text/x-patch 13.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2025-10-17 22:51:32 Re: abi-compliance-check failure due to recent changes to pg_{clear,restore}_{attribute,relation}_stats()
Previous Message Jeff Davis 2025-10-17 22:23:01 Re: new environment variable INITDB_LOCALE_PROVIDER