Re: scalability bottlenecks with (many) partitions (and more)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Date: 2024-01-29 12:17:07
Message-ID: c3cddb9d-283e-4caf-b558-5c9196320650@enterprisedb.com
Lists: pgsql-hackers

On 1/29/24 09:53, Ronan Dunklau wrote:
> On Sunday, January 28, 2024 at 22:57:02 CET, Tomas Vondra wrote:
>
> Hi Tomas!
>
> I'll comment on the glibc malloc part, as I studied it last year and
> proposed some things here:
> https://www.postgresql.org/message-id/3424675.QJadu78ljV%40aivenlaptop
>

Thanks for reminding me. I'll re-read that thread.

>
>> FWIW where does the malloc overhead come from? For one, while we do have
>> some caching of malloc-ed memory in memory contexts, that doesn't quite
>> work cross-query, because we destroy the contexts at the end of the
>> query. We attempt to cache the memory contexts too, but in this case
>> that can't help because the allocations come from btbeginscan() where we
>> do this:
>>
>> so = (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));
>>
>> and BTScanOpaqueData is ~27kB, which means it's an oversized chunk and
>> thus always allocated using a separate malloc() call. Maybe we could
>> break it into smaller/cacheable parts, but I haven't tried, and I doubt
>> it's the only such allocation.
>
> Did you try running strace on the process? That may give you some
> insight into what malloc is doing. A more sophisticated approach would
> be using stap and hooking into the malloc probes, for example
> memory_sbrk_more and memory_sbrk_less.
>

No, I haven't tried that. In my experience strace is pretty expensive,
and if the issue is in glibc itself (before it does the syscalls),
strace won't really tell us much. Not sure, ofc.
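
If I did try it, I'd probably start with a syscall summary rather than a
full trace, to keep the overhead down. Something like this (untested, and
$BACKEND_PID is just a placeholder for the backend to attach to):

    # count memory-management syscalls (brk/mmap/munmap/...) for 30 seconds,
    # then detach and print the per-syscall summary
    timeout -s INT 30s strace -c -f -e trace=%memory -p "$BACKEND_PID"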

> An important part of glibc malloc's behaviour in that regard comes from
> the adjustment of the mmap and trim thresholds. By default, malloc
> adjusts them dynamically, and you can poke into that using the
> memory_mallopt_free_dyn_thresholds probe.
>

Thanks, I'll take a look at that.
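
If I'm reading the glibc docs right, watching that probe (and the sbrk
ones mentioned earlier) might look roughly like this (untested sketch;
the libc path varies by distro, and it requires a glibc built with the
SystemTap probe points):

    # print sbrk growth/shrinkage and dynamic threshold adjustments
    sudo stap -x "$BACKEND_PID" -e '
      probe process("/lib/x86_64-linux-gnu/libc.so.6").mark("memory_sbrk_more")
        { printf("sbrk grew by %d bytes\n", $arg2) }
      probe process("/lib/x86_64-linux-gnu/libc.so.6").mark("memory_sbrk_less")
        { printf("sbrk shrank by %d bytes\n", $arg2) }
      probe process("/lib/x86_64-linux-gnu/libc.so.6")
            .mark("memory_mallopt_free_dyn_thresholds")
        { printf("new mmap threshold %d, new trim threshold %d\n",
                 $arg1, $arg2) }
    '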

>>
>> FWIW I was wondering if this is a glibc-specific malloc bottleneck, so I
>> tried running the benchmarks with LD_PRELOAD=jemalloc, and that improves
>> the behavior a lot - it gets us maybe ~80% of the mempool benefits.
>> Which is nice, it confirms it's glibc-specific (I wonder if there's a
>> way to tweak glibc to address this), and it also means systems using
>> jemalloc (e.g. FreeBSD, right?) don't have this problem. But it also
>> says the mempool has ~20% benefit on top of jemalloc.
>
> glibc's malloc offers some tuning for this. In particular, setting either
> M_MMAP_THRESHOLD or M_TRIM_THRESHOLD will disable the unpredictable "auto
> adjustment" behaviour and allow you to control what it's doing.
>
> By setting a bigger M_TRIM_THRESHOLD, one can make sure memory allocated
> using sbrk isn't freed as easily, and you don't run into a pattern of
> moving the sbrk pointer up and down repeatedly. The automatic trade-off
> between the mmap and trim thresholds is supposed to prevent that, but the
> way it is incremented means you can end up in a bad place depending on
> your particular allocation pattern.
>
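
IIUC pinning the thresholds could be done without patching anything, by
setting glibc's environment variables before starting the server. A sketch
(the values here are made up, which is exactly my question below):

    # illustrative values only: fix both thresholds at 32MB, which also
    # disables glibc's dynamic threshold adjustment
    MALLOC_TRIM_THRESHOLD_=$((32*1024*1024)) \
    MALLOC_MMAP_THRESHOLD_=$((32*1024*1024)) \
    pg_ctl -D "$PGDATA" start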

So, what values would you recommend for these parameters?

My concern is that increasing those values would lead to (much) higher memory
usage, with little control over it. With the mempool we keep more
blocks, ofc, but we have control over freeing the memory.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
