From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Optimizing ResouceOwner to speed up COPY |
Date: | 2025-10-16 21:58:45 |
Message-ID: | 356ba744-05bf-4847-bfcf-ec7d5f09f137@vondra.me |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 10/16/25 21:28, Tom Lane wrote:
> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>> On 10/16/25 20:12, Tom Lane wrote:
>>> Can you find evidence of this change being helpful for anything
>>> except this specific scenario in COPY?
>
>> I went through the ResourceOwnerRemember() calls, looking for other
>> cases that might create a lot of duplicates, similar to the tuple
>> descriptors, but I haven't found anything obvious. Other resources seem
>> to be either naturally unique or limited to very few duplicates.
>
> I was thinking of adding some temporary instrumentation, like
> just elog'ing whenever the count goes above 1, and seeing where
> you get hits during the regression tests. I'm prepared to believe
> this is worth doing, but it'd be nice to have more examples
> in mind.
>
I tried that, and that gives me ~30k log messages with (count > 1). But
that's a bit misleading, because a lot of that are the same "thing"
going from 1 to N, which produces N messages.
If I subtract all the COPY statements, loading data for regressison
tests, that leaves ~7500 cases. There's a lot of cases with count 2 or
3, mostly simple queries. Even a simple "\d t" produces a bunch of such
messages.
test=# \d t
WARNING: RESOURCEOWNER: snapshot reference 0x2e3787b0 resource owner
Portal count 2
WARNING: RESOURCEOWNER: relcache reference 0x79ae1302fba8 resource
owner Portal count 2
WARNING: RESOURCEOWNER: tupdesc reference 0x79ae1302fec8 resource owner
Portal count 2
WARNING: RESOURCEOWNER: relcache reference 0x79ae13034928 resource
owner Portal count 2
WARNING: RESOURCEOWNER: relcache reference 0x79ae13034928 resource
owner Portal count 3
WARNING: RESOURCEOWNER: relcache reference 0x79ae13034d88 resource
owner Portal count 2
WARNING: RESOURCEOWNER: buffer pin 0x4a resource owner Portal count 2
WARNING: RESOURCEOWNER: relcache reference 0x79ae13034928 resource
owner Portal count 4
WARNING: RESOURCEOWNER: buffer pin 0xa resource owner Portal count 2
WARNING: RESOURCEOWNER: relcache reference 0x79ae12dca6d0 resource
owner Portal count 2
WARNING: RESOURCEOWNER: relcache reference 0x79ae1303aff8 resource
owner Portal count 2
There are some more extreme ones too. For example
select infinite_recurse();
produces
WARNING: RESOURCEOWNER: plancache reference 0x34555828 resource owner
Portal count 1340
Another example is CREATE TABLE, which creates a batch of slots when
inserting attributes in InsertPgAttributeTuples, so that'll end up with
the count = number of attributes.
Of course, those are not particularly frequent operations. Most
applications are not doing CREATE TABLE nearly as often as DML.
But I had another idea - see how large the ResourceOwners get, which
would tell us how much "overhead" it really is. So I added logging into
ResourceOwnerDelete (without the patch), and with that regression tests
produce 113916 messages. And 113289 have the initial capacity 32, so
array only. From the remaining ~600, only 72 have capacity over 64.
So I guess the overhead should not be that bad. Actually, it would be
possible to completely eliminate the overhead for the array, because
that does not actually need the count at all.
regards
--
Tomas Vondra
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2025-10-16 22:01:22 | Re: speedup COPY TO for partitioned table. |
Previous Message | Sami Imseih | 2025-10-16 21:04:41 | Re: [BUG] temporary file usage report with extended protocol and unnamed portals |