Re: Optimize planner memory consumption for huge arrays

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Lepikhov Andrei <a(dot)lepikhov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Евгений Бредня <e(dot)brednya(at)postgrespro(dot)ru>
Subject: Re: Optimize planner memory consumption for huge arrays
Date: 2024-02-24 23:07:39
Message-ID: 1367418.1708816059@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>> On 2/19/24 16:45, Tom Lane wrote:
>>> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>>>> For example, I don't think we expect selectivity functions to allocate
>>>> long-lived objects, right? So maybe we could run them in a dedicated
>>>> memory context, and reset it aggressively (after each call).

>>> That could eliminate a whole lot of potential leaks. I'm not sure
>>> though how much it moves the needle in terms of overall planner
>>> memory consumption.

>> It was an ad hoc thought, inspired by the issue at hand. Maybe it would
>> be possible to find similar "boundaries" in other parts of the planner.

> Here's a quick and probably-incomplete implementation of that idea.
> I've not tried to study its effects on memory consumption, just made
> sure it passes check-world.

I spent a bit more time on this patch. One thing I was concerned
about was whether it causes any noticeable slowdown, and it seems that
it does: testing with "pgbench -S" I observe perhaps 1% slowdown.
However, we don't necessarily need to reset the temp context after
every single usage. I experimented with resetting it every tenth
time, and that got me from 1% slower than HEAD to 1% faster. Of
course "every tenth time" is very ad hoc. I wondered if we could
make it somehow conditional on how much memory had been consumed
in the temp context, but there doesn't seem to be any cheap way
to get that. Applying something like MemoryContextMemConsumed
would surely be a loser. I'm not sure if it'd be worth extending
the mcxt.c API to provide something like "MemoryContextResetIfBig",
with some internal rule that would be cheap to apply like "reset
if we have any non-keeper blocks".

I also looked into whether it really does reduce overall memory
consumption noticeably, by collecting stats about planner memory
consumption during the core regression tests. The answer is that
it barely helps. I see the average used space across all planner
invocations drop from 23344 bytes to 23220, and the worst-case
numbers hardly move at all. So that's a little discouraging.
But of course the regression tests prefer not to deal in very
large/expensive test cases, so maybe it's not surprising that
I don't see much win in this test.
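
For reference, the guts of such instrumentation can be as small as
this (again a sketch, not lifted from 0002; the helper name is
invented):

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Bytes actually in use (allocated minus free) across a context tree.
 * MemoryContextMemConsumed() walks the whole tree, which is exactly
 * why it's too expensive for the per-reset test discussed above, but
 * once per planner invocation it's harmless.
 */
static Size
context_used_bytes(MemoryContext cxt)
{
	MemoryContextCounters counters;

	MemoryContextMemConsumed(cxt, &counters);
	return counters.totalspace - counters.freespace;
}

Calling that on CurrentMemoryContext at entry to and exit from
standard_planner() and elog()-ing the difference is enough to scrape
per-invocation numbers out of the regression-test logs.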

Anyway, 0001 attached is a cleaned-up patch with the every-tenth-time
rule, and 0002 (not meant for commit) is the quick and dirty
instrumentation patch I used for collecting usage stats.

Even though this seems of only edge-case value, I'd much prefer
to do this than the sort of ad-hoc patching initially proposed
in this thread.

regards, tom lane

Attachment Content-Type Size
v2-0001-estimate-selectivities-in-temp-context.patch text/x-diff 7.4 KB
v2-0002-ad-hoc-usage-stats-collection.patch text/x-diff 1.1 KB
