Re: Optimizing ResouceOwner to speed up COPY

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Subject: Re: Optimizing ResouceOwner to speed up COPY
Date: 2025-10-17 10:32:43
Message-ID: c8ade815-127b-4a78-a481-a3f47ebbeec0@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/17/25 10:31, Heikki Linnakangas wrote:
> On 17/10/2025 06:13, Chao Li wrote:
>> ```
>> @@ -250,12 +257,21 @@ ResourceOwnerAddToHash(ResourceOwner owner,
>> Datum value, const ResourceOwnerDesc
>>       idx = hash_resource_elem(value, kind) & mask;
>>       for (;;)
>>       {
>> +        /* found an exact match - just increment the counter */
>> +        if ((owner->hash[idx].kind == kind) &&
>> +            (owner->hash[idx].item == value))
>> +        {
>> +            owner->hash[idx].count += count;
>> +            return;
>> +        }
>> +
>>           if (owner->hash[idx].kind == NULL)
>>               break;                /* found a free slot */
>>           idx = (idx + 1) & mask;
>>       }
>> ```
>>
>> I think this is the “trade-off” you mention in your email: if a free
>> slot found earlier, then still gets duplicated entries. I have no
>> concern to this “trade-off”, but please add a comment here, otherwise
>> future readers may be confused, and might potentially consider that
>> were a bug, without reading your original design email.
>
> +1 on such a comment. Here or maybe on the ResourceElem struct itself.
>

Will do. It's probably appropriate to mention this deduplication in the
comment at the beginning of the file.

>>  typedef struct ResourceElem
>>  {
>>      Datum        item;
>> +    uint32        count;                /* number of occurrences */
>>      const ResourceOwnerDesc *kind;    /* NULL indicates a free hash
>> table slot */
>>  } ResourceElem;
>
> Hmm, the 'count' is not used when the entry is stored in the array.
> Perhaps we should have a separate struct for array and hash elements
> now. Keeping the array small helps it to fit in CPU caches.
>

Agreed. I had the same idea yesterday, but I haven't done it yet.

>> /*
>>  * Forget that an object is owned by a ResourceOwner
>>  *
>>  * Note: If same resource ID is associated with the ResourceOwner more
>> than
>>  * once, one instance is removed.
>>  *
>>  * Note: Forgetting a resource does not guarantee that there is room to
>>  * remember a new resource.  One exception is when you forget the most
>>  * recently remembered resource; that does make room for a new
>> remember call.
>>  * Some code callers rely on that exception.
>>  */
>> void
>> ResourceOwnerForget(ResourceOwner owner, Datum value, const
>> ResourceOwnerDesc *kind)
>
> Does this break the exception noted in the above comment? I guess it
> still holds because we don't deduplicate entries in the array. That's
> very subtle, needs a comment at least.
>

Right, it doesn't break the exception - as you say, it only applies to
the hash, not to the array. I'll add a comment.

Thanks

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2025-10-17 10:33:31 Re: Changing shared_buffers without restart
Previous Message Fujii Masao 2025-10-17 10:32:35 Re: pg_restore --no-policies should not restore policies' comment