Re: Add the ability to limit the amount of memory that can be allocated to backends.

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: reid(dot)thompson(at)crunchydata(dot)com, Arne Roland <A(dot)Roland(at)index(dot)de>, Andres Freund <andres(at)anarazel(dot)de>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, "stephen(dot)frost" <stephen(dot)frost(at)crunchydata(dot)com>
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
Date: 2023-12-26 21:52:06
Message-ID: 4fb99fb7-8a6a-2828-dd77-e2f1d75c7dd0@enterprisedb.com
Lists: pgsql-hackers

Hi,

I wanted to take a look at the patch, and I noticed it's been broken
since 3d51cb5197 renamed a couple of pgstat functions in August. I plan
to do some benchmarks etc., preferably on current master, so here's a
version fixing that minor bitrot.

As for the patch, I only skimmed through the thread so far, to get some
idea of what the approach and goals are, etc. I didn't look at the code
yet, so I can't comment on that.

However, at pgconf.eu a couple of weeks ago I had quite a few
discussions about how such a "backend memory limit" could/should work
in principle, and I've been thinking about ways to implement this. So
let me share some thoughts about how this patch aligns with that ...

(FWIW it's not my intent to hijack or derail this patch in any way, but
there's a couple things I think we should do differently.)

I'm 100% on board with having a memory limit "above" work_mem. It's
really annoying that we have no way to restrict the amount of memory a
backend can allocate for complex queries, etc.

But I find it a bit strange that we aim to introduce a "global" memory
limit for all backends combined first. I'm not against having that too,
but it's not the feature I usually wish to have. I need some protection
against runaway backends that happen to allocate a lot of memory.

Similarly, I'd like to be able to have different limits depending on
what the backend does - a backend doing OLAP may naturally need more
memory, while a backend doing OLTP may have a much tighter limit.

But with a single global limit none of this is possible. It may help
reduce the risk of unexpected OOM issues (not 100%, but useful), but it
can't limit the impact to the offending backend - if memory starts
running out, it will affect all the other backends somewhat randomly
(depending on the order in which the backends happen to allocate
memory). And it does not consider what workloads the backends execute.

Let me propose a slightly different architecture that I imagined while
thinking about this. It's not radically different from what the patch
does, but it focuses on the local accounting first. I believe it's
possible to extend this to enforce the global limit too.

FWIW I haven't tried implementing this - I don't want to "hijack" this
thread and do my own thing. I can take a stab at a PoC if needed.

Firstly, I'm not quite happy with how all the memory contexts have to
do their own version of the accounting and memory checking. I think we
should move that into a new abstraction which I call a "memory pool".
It's very close to a "memory context", but it only deals with
allocating blocks, not the chunks requested by palloc() etc. So when
someone calls palloc(), that may end up in AllocSetAlloc(), and instead
of doing malloc() directly, AllocSetAlloc() would do
MemoryPoolAlloc(blksize), which would do all the accounting and checks
before calling malloc().
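
To illustrate, here's a minimal sketch of what such a pool might look
like - the struct layout, the error handling and the exact API are my
assumptions for illustration, not something from the patch:

    typedef struct MemoryPool
    {
        Size        allocated;  /* bytes currently allocated (blocks) */
        Size        limit;      /* local limit, 0 means "no limit" */
    } MemoryPool;

    static void *
    MemoryPoolAlloc(MemoryPool *pool, Size blksize)
    {
        void       *block;

        /* enforce the local (per-backend) limit */
        if (pool->limit > 0 && pool->allocated + blksize > pool->limit)
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("backend memory limit exceeded")));

        block = malloc(blksize);
        if (block == NULL)
            return NULL;

        pool->allocated += blksize;
        return block;
    }

    static void
    MemoryPoolFree(MemoryPool *pool, void *block, Size blksize)
    {
        pool->allocated -= blksize;
        free(block);
    }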

This may sound like an unnecessary indirection, but the idea is that a
single memory pool would back many memory contexts (perhaps all of them
for a given backend). In principle we might even establish separate
memory pools for different parts of the memory context hierarchy, but
I'm not sure we need that.
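
In AllocSetAlloc() (and the other context implementations) the direct
malloc() call for a new block would then become an allocation from the
backend's pool - again just a sketch, the pool variable is made up:

    /* hypothetical pool shared by all memory contexts in a backend */
    static MemoryPool BackendMemoryPool = {0, 0};

    /* in AllocSetAlloc(), instead of
     *     block = (AllocBlock) malloc(blksize);
     * we'd do: */
    block = (AllocBlock) MemoryPoolAlloc(&BackendMemoryPool, blksize);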

I can imagine the pool could also cache blocks for cases when we create
and destroy contexts very often, but glibc should already do that for
us, I believe.

For me, the local accounting and limit are the primary goal. I wonder
if we considered this context/pool split while working on the memory
accounting for hash aggregate, but I think we were too attached to
doing all of it in the memory context hierarchy.

Of course, this memory pool is per backend, and so would be the memory
accounting and limit enforced by it. But I can imagine extending it to
enforce a global limit similar to what the current patch does - using a
counter in shared memory, or something like that. I haven't reviewed
what the overhead is or how it handles cases when a backend terminates
in some unexpected way. But whatever the current patch does, the memory
pool could do too.
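
For instance, the pool might reserve memory against a shared counter
before each malloc() - a sketch, assuming a pg_atomic_uint64 set up at
shmem init (the struct and function names are invented):

    typedef struct MemoryPoolShmem
    {
        pg_atomic_uint64 total_allocated;   /* across all backends */
    } MemoryPoolShmem;

    static MemoryPoolShmem *poolShmem;  /* set up at shmem startup */

    static bool
    MemoryPoolReserveGlobal(Size blksize, Size global_limit)
    {
        uint64      newval;

        newval = pg_atomic_add_fetch_u64(&poolShmem->total_allocated,
                                         blksize);

        if (global_limit > 0 && newval > global_limit)
        {
            /* back out the reservation and report failure */
            pg_atomic_sub_fetch_u64(&poolShmem->total_allocated,
                                    blksize);
            return false;
        }

        return true;
    }

On backend exit the pool would subtract whatever it still has reserved
(e.g. from a before_shmem_exit callback), which also covers backends
terminating in unexpected ways, as long as they go through the regular
cleanup path.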

Secondly, I think there's an annoying issue with the accounting at the
block level - it makes it problematic to use low limit values. We
double the block size, so we may quickly end up with a block size of a
couple MBs, which means the accounting granularity gets very coarse.
(With the default allocation set parameters the blocks grow from 8kB up
to 8MB, so a single new block may be a sizeable fraction of a small
limit.)

I think it'd be useful to introduce a "backpressure" between the memory
pool and the memory contexts, depending on how close we are to the
limit. For example, if the limit is 128MB and the backend has allocated
16MB so far, we're pretty far away from the limit, so if the backend
requests an 8MB block, that's fine and the memory pool should malloc()
it. But if we've already allocated 100MB, maybe we should be careful
and not allow 8MB blocks - the memory pool should be allowed to
override the request and return just a 1MB block. Sure, this would have
to be optional, and not all places can accept a smaller block than
requested (e.g. when the chunk would not fit into the smaller block).
It would require a suitable memory pool API and more work in the memory
contexts, but it seems pretty useful. Certainly not something for v1.
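
Roughly like this, say - the shrinking rule is arbitrary, just to show
the opt-in API shape I have in mind:

    static void *
    MemoryPoolAllocFlexible(MemoryPool *pool, Size request,
                            Size minsize, Size *actual)
    {
        Size        blksize = request;

        if (pool->limit > 0)
        {
            Size        remaining = pool->limit - pool->allocated;

            /*
             * The closer we get to the limit, the smaller the blocks
             * we hand out - but never below the caller's minimum
             * (e.g. the chunk that has to fit into the block).
             */
            while (blksize > minsize && blksize > remaining / 4)
                blksize /= 2;
        }

        *actual = blksize;
        return MemoryPoolAlloc(pool, blksize);
    }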

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v20231226-0001-Add-tracking-of-backend-memory-allocated.patch text/x-patch 49.6 KB
v20231226-0002-fixup-pgstat_get_local_beentry_by_index.patch text/x-patch 921 bytes
v20231226-0003-Add-the-ability-to-limit-the-amount-of-mem.patch text/x-patch 37.0 KB
