Re: scalability bottlenecks with (many) partitions (and more)

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Date: 2024-01-29 14:15:25
Message-ID: 13440175.uLZWGnKmhe@aivenlaptop
Lists: pgsql-hackers

On Monday, 29 January 2024 at 13:17:07 CET, Tomas Vondra wrote:
> > Did you try running strace on the process? That may give you some
> > insight into what malloc is doing. A more sophisticated approach would
> > be to use stap and plug it into the malloc probes, for example
> > memory_sbrk_more and memory_sbrk_less.
>
> No, I haven't tried that. In my experience strace is pretty expensive,
> and if the issue is in glibc itself (before it does the syscalls),
> strace won't really tell us much. Not sure, ofc.

It would tell you how malloc actually performs your allocations, and how
often they end up translated into syscalls. The main issue with glibc, IMO,
is that it releases memory back to the OS too aggressively.

>
> > An important part of glibc's malloc behaviour in that regard comes from
> > the adjustment of the mmap and trim thresholds. By default, malloc
> > adjusts them dynamically, and you can poke into that using the
> > memory_mallopt_free_dyn_thresholds probe.
>
> Thanks, I'll take a look at that.
>
> >> FWIW I was wondering if this is a glibc-specific malloc bottleneck, so I
> >> tried running the benchmarks with LD_PRELOAD=jemalloc, and that improves
> >> the behavior a lot - it gets us maybe ~80% of the mempool benefits.
> >> Which is nice, it confirms it's glibc-specific (I wonder if there's a
> >> way to tweak glibc to address this), and it also means systems using
> >> jemalloc (e.g. FreeBSD, right?) don't have this problem. But it also
> >> says the mempool has ~20% benefit on top of jemalloc.
> >
> > GLIBC's malloc offers some tuning for this. In particular, setting either
> > M_MMAP_THRESHOLD or M_TRIM_THRESHOLD will disable the unpredictable "auto
> > adjustment" behaviour and allow you to control what it's doing.
> >
> > By setting a bigger M_TRIM_THRESHOLD, one can make sure memory allocated
> > using sbrk isn't freed as easily, and you don't run into a pattern of
> > moving the sbrk pointer up and down repeatedly. The automatic trade-off
> > between the mmap and trim thresholds is supposed to prevent that, but the
> > way it is incremented means you can end up in a bad place depending on
> > your particular allocation pattern.
>
> So, what values would you recommend for these parameters?
>
> My concern is that increasing those values would lead to (much) higher
> memory usage, with little control over it. With the mempool we keep more
> blocks, ofc, but we have control over freeing the memory.

Right now, depending on your workload (especially if you use connection
pooling), you can end up with a dynamically adjusted trim threshold of
something like 32 or 64MB, below which free memory will never be released
back to the OS.
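
For reference, pinning the thresholds explicitly is just a couple of
mallopt() calls; per the glibc docs, setting either parameter disables the
dynamic adjustment. Something like the sketch below (the values are
placeholders, not recommendations):

    /* Sketch only: pin glibc's thresholds instead of letting them
     * auto-adjust. */
    #include <malloc.h>

    static void
    pin_glibc_thresholds(void)
    {
        /* keep up to 64MB of free memory at the top of the heap */
        mallopt(M_TRIM_THRESHOLD, 64 * 1024 * 1024);
        /* serve requests smaller than 32MB from the sbrk heap */
        mallopt(M_MMAP_THRESHOLD, 32 * 1024 * 1024);
    }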

The first heuristic I had in mind was to set it to work_mem, up to a
"reasonable" limit I guess. One can argue that a backend is expected to use
up to work_mem frequently, and as such that memory shouldn't be released
back. When work_mem is set to a lower value, we could at the same time ask
glibc to trim the excess retained memory. That could be useful when a
long-lived pooled connection sees a spike in memory usage only once:
currently that can easily end up with 32MB "wasted" permanently, but tuning
the threshold ourselves would allow us to release it back.
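
Very roughly, I'm thinking of something along these lines (the function and
variable names are purely illustrative, and work_mem_bytes would come from
the GUC, capped at that "reasonable" limit so it fits in an int):

    /* Illustrative sketch: tie glibc's trim threshold to work_mem and,
     * when work_mem shrinks, hand the now-excess retained memory back to
     * the kernel. */
    #include <malloc.h>

    static size_t current_trim_threshold = 0;

    static void
    adjust_glibc_trim_threshold(size_t work_mem_bytes)
    {
        mallopt(M_TRIM_THRESHOLD, (int) work_mem_bytes);

        /* If the threshold went down, the arena may hold more free memory
         * than we now want to keep; trim it down to the new slack. */
        if (work_mem_bytes < current_trim_threshold)
            malloc_trim(work_mem_bytes);

        current_trim_threshold = work_mem_bytes;
    }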

Since it was last year that I worked on this, I'm a bit fuzzy on the
details, but I hope this helps.
