From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
Subject: | Re: Optimizing ResouceOwner to speed up COPY |
Date: | 2025-10-17 10:32:43 |
Message-ID: | c8ade815-127b-4a78-a481-a3f47ebbeec0@vondra.me |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 10/17/25 10:31, Heikki Linnakangas wrote:
> On 17/10/2025 06:13, Chao Li wrote:
>> ```
>> @@ -250,12 +257,21 @@ ResourceOwnerAddToHash(ResourceOwner owner,
>> Datum value, const ResourceOwnerDesc
>> idx = hash_resource_elem(value, kind) & mask;
>> for (;;)
>> {
>> + /* found an exact match - just increment the counter */
>> + if ((owner->hash[idx].kind == kind) &&
>> + (owner->hash[idx].item == value))
>> + {
>> + owner->hash[idx].count += count;
>> + return;
>> + }
>> +
>> if (owner->hash[idx].kind == NULL)
>> break; /* found a free slot */
>> idx = (idx + 1) & mask;
>> }
>> ```
>>
>> I think this is the “trade-off” you mention in your email: if a free
>> slot found earlier, then still gets duplicated entries. I have no
>> concern to this “trade-off”, but please add a comment here, otherwise
>> future readers may be confused, and might potentially consider that
>> were a bug, without reading your original design email.
>
> +1 on such a comment. Here or maybe on the ResourceElem struct itself.
>
Will do. It's probably appropriate to mention this deduplication in the
comment at the beginning of the file.
>> typedef struct ResourceElem
>> {
>> Datum item;
>> + uint32 count; /* number of occurrences */
>> const ResourceOwnerDesc *kind; /* NULL indicates a free hash
>> table slot */
>> } ResourceElem;
>
> Hmm, the 'count' is not used when the entry is stored in the array.
> Perhaps we should have a separate struct for array and hash elements
> now. Keeping the array small helps it to fit in CPU caches.
>
Agreed. I had the same idea yesterday, but I haven't done it yet.
>> /*
>> * Forget that an object is owned by a ResourceOwner
>> *
>> * Note: If same resource ID is associated with the ResourceOwner more
>> than
>> * once, one instance is removed.
>> *
>> * Note: Forgetting a resource does not guarantee that there is room to
>> * remember a new resource. One exception is when you forget the most
>> * recently remembered resource; that does make room for a new
>> remember call.
>> * Some code callers rely on that exception.
>> */
>> void
>> ResourceOwnerForget(ResourceOwner owner, Datum value, const
>> ResourceOwnerDesc *kind)
>
> Does this break the exception noted in the above comment? I guess it
> still holds because we don't deduplicate entries in the array. That's
> very subtle, needs a comment at least.
>
Right, it doesn't break the exception - as you say, it only applies to
the hash, not to the array. I'll add a comment.
Thanks
--
Tomas Vondra
From | Date | Subject | |
---|---|---|---|
Next Message | Dmitry Dolgov | 2025-10-17 10:33:31 | Re: Changing shared_buffers without restart |
Previous Message | Fujii Masao | 2025-10-17 10:32:35 | Re: pg_restore --no-policies should not restore policies' comment |