Re: Optimizing ResouceOwner to speed up COPY

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimizing ResouceOwner to speed up COPY
Date: 2025-10-16 18:12:47
Message-ID: 1534176.1760638367@sss.pgh.pa.us
Lists: pgsql-hackers

Tomas Vondra <tomas(at)vondra(dot)me> writes:
> The reason is pretty simple - ResourceOwner tracks the resources in a
> very simple hash table, with O(n^2) behavior with duplicates. This
> happens with COPY, because COPY creates an array of 1000 tuple slots,
> and each slot references the same tuple descriptor. And the descriptor
> is added to ResourceOwner for each slot.
> ...
> There's an easy way to improve this by allowing a single hash entry to
> represent multiple references to the same resource. The attached patch
> adds a "count" to the ResourceElem, tracking how many times that
> resource was added. So if you add 1000 tuple slots, the descriptor will
> have just one ResourceElem entry with count=1000.
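
The counting idea quoted above can be illustrated with a small standalone sketch. This is not the attached patch and not PostgreSQL's resowner.c: CountedElem, CountedTable, and the counted_* functions are hypothetical names, the table has a fixed size with no resizing, and slots are never reclaimed once used. It only shows how repeated registrations of one pointer collapse into a single entry whose count goes up and down.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64			/* hypothetical fixed capacity, power of two */

/* Hypothetical entry: one slot per distinct resource, plus a refcount. */
typedef struct CountedElem
{
	const void *item;			/* resource pointer; NULL marks a never-used slot */
	uint32_t	count;			/* outstanding references to this resource */
} CountedElem;

typedef struct CountedTable
{
	CountedElem slots[TABLE_SIZE];
	size_t		nused;			/* entries with count > 0 */
} CountedTable;

/* Map a pointer to its first probe position. */
static size_t
slot_for(const void *item)
{
	return (size_t) (((uintptr_t) item >> 4) * 2654435761u) & (TABLE_SIZE - 1);
}

/* Linear probe for item; returns its slot, the first free slot, or NULL. */
static CountedElem *
counted_find(CountedTable *t, const void *item)
{
	size_t		i = slot_for(item);

	for (size_t probes = 0; probes < TABLE_SIZE; probes++)
	{
		CountedElem *e = &t->slots[i];

		if (e->item == item || e->item == NULL)
			return e;
		i = (i + 1) & (TABLE_SIZE - 1);
	}
	return NULL;				/* table full and item not present */
}

/* Register one reference to item (must be non-NULL). Returns 0 if full. */
static int
counted_remember(CountedTable *t, const void *item)
{
	CountedElem *e = counted_find(t, item);

	if (e == NULL)
		return 0;
	if (e->item == NULL)
		e->item = item;			/* claim a never-used slot */
	if (e->count++ == 0)
		t->nused++;
	return 1;
}

/* Drop one reference. Returns 0 if item has no outstanding reference. */
static int
counted_forget(CountedTable *t, const void *item)
{
	CountedElem *e = counted_find(t, item);

	if (e == NULL || e->item != item || e->count == 0)
		return 0;
	if (--e->count == 0)
		t->nused--;				/* slot stays claimed; a simplification */
	return 1;
}

/* Number of outstanding references to item. */
static uint32_t
counted_refs(CountedTable *t, const void *item)
{
	CountedElem *e = counted_find(t, item);

	return (e != NULL && e->item == item) ? e->count : 0;
}
```

With a zero-initialized CountedTable, calling counted_remember() 1000 times on the same pointer leaves one live entry with a count of 1000, and each counted_forget() is a single lookup rather than a scan past duplicates.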

Hmm. I don't love the 50% increase in sizeof(ResourceElem) ... maybe
that's negligible, or maybe it isn't. Can you find evidence of this
change being helpful for anything except this specific scenario in
COPY? Because we could probably find some way to avoid registering
all the doppelganger slots, if that's the only culprit.
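
For reference, the 50% figure can be reproduced with a standalone sketch. The two structs below are stand-ins, not copies of the real definitions: ElemPlain assumes a ResourceElem made of a Datum plus a pointer to the resource kind, ElemCounted assumes the patch adds a 32-bit count, and Datum is approximated as uintptr_t. The names ElemPlain, ElemCounted, and elem_growth_percent are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Datum;		/* stand-in for PostgreSQL's Datum */
typedef struct ResourceOwnerDesc ResourceOwnerDesc; /* opaque stand-in */

/* Assumed current layout: value plus a pointer describing its kind. */
typedef struct ElemPlain
{
	Datum		item;
	const ResourceOwnerDesc *kind;
} ElemPlain;

/* Assumed patched layout: the same fields plus a reference count. */
typedef struct ElemCounted
{
	Datum		item;
	const ResourceOwnerDesc *kind;
	uint32_t	count;			/* trailing padding rounds the struct up */
} ElemCounted;

/* Size increase of ElemCounted over ElemPlain, in whole percent. */
static size_t
elem_growth_percent(void)
{
	return (sizeof(ElemCounted) - sizeof(ElemPlain)) * 100 / sizeof(ElemPlain);
}
```

On a typical 64-bit platform the 4-byte count is padded out to the struct's 8-byte alignment, so the element grows from 16 to 24 bytes even though only 4 bytes carry data.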

regards, tom lane
