Re: scalability bottlenecks with (many) partitions (and more)

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Date: 2024-01-29 08:53:23
Message-ID: 4541483.LvFx2qVVIh@aivenlaptop
Lists: pgsql-hackers

On Sunday, 28 January 2024 at 22:57:02 CET, Tomas Vondra wrote:

Hi Tomas!

I'll comment on the glibc-malloc part, as I studied it last year and
proposed some things here: https://www.postgresql.org/message-id/3424675.QJadu78ljV%40aivenlaptop

> FWIW where does the malloc overhead come from? For one, while we do have
> some caching of malloc-ed memory in memory contexts, that doesn't quite
> work cross-query, because we destroy the contexts at the end of the
> query. We attempt to cache the memory contexts too, but in this case
> that can't help because the allocations come from btbeginscan() where we
> do this:
>
> so = (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));
>
> and BTScanOpaqueData is ~27kB, which means it's an oversized chunk and
> thus always allocated using a separate malloc() call. Maybe we could
> break it into smaller/cacheable parts, but I haven't tried, and I doubt
> it's the only such allocation.

Did you try running strace on the process? That may give you some insight
into what malloc is doing. A more sophisticated approach would be to use
stap and plug into the malloc probes, for example memory_sbrk_more and
memory_sbrk_less.

An important part of glibc's malloc behaviour in this regard comes from the
adjustment of the mmap and trim thresholds. By default, malloc adjusts them
dynamically, and you can poke into that using the
memory_mallopt_free_dyn_thresholds probe.

>
> FWIW I was wondering if this is a glibc-specific malloc bottleneck, so I
> tried running the benchmarks with LD_PRELOAD=jemalloc, and that improves
> the behavior a lot - it gets us maybe ~80% of the mempool benefits.
> Which is nice, it confirms it's glibc-specific (I wonder if there's a
> way to tweak glibc to address this), and it also means systems using
> jemalloc (e.g. FreeBSD, right?) don't have this problem. But it also
> says the mempool has ~20% benefit on top of jemalloc.

glibc's malloc offers some tuning for this. In particular, setting either
M_MMAP_THRESHOLD or M_TRIM_THRESHOLD disables the unpredictable "auto
adjustment" behaviour and lets you control what it's doing.

By setting a bigger M_TRIM_THRESHOLD, one can make sure memory allocated using
sbrk isn't freed as eagerly, so you don't run into a pattern of moving the
sbrk pointer up and down repeatedly. The automatic trade-off between the mmap
and trim thresholds is supposed to prevent that, but the way it is incremented
means you can end up in a bad place depending on your particular allocation
pattern.

Best regards,

--
Ronan Dunklau
