Re: Use generation context to speed up tuplesorts

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tv(at)fuzzy(dot)cz>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Use generation context to speed up tuplesorts
Date: 2022-01-07 12:03:28
Message-ID: 94e34870-9e3b-8c49-6617-016253129f06@enterprisedb.com
Lists: pgsql-hackers

On 1/7/22 12:03, Ronan Dunklau wrote:
> On Friday, 31 December 2021, 22:26:37 CET, David Rowley wrote:
>> I've attached some benchmark results that I took recently. The
>> spreadsheet contains results from 3 versions. master, master + 0001 -
>> 0002, then master + 0001 - 0003. The 0003 patch makes the code a bit
>> more conservative about the chunk sizes it allocates and also tries to
>> allocate the tuple array according to the number of tuples we expect
>> to be able to sort in a single batch for when the sort is not
>> estimated to fit inside work_mem.
>
> (Sorry for trying to merge back the discussion on the two sides of the thread)
>
> In https://www.postgresql.org/message-id/4776839.iZASKD2KPV%40aivenronan, I
> expressed the idea of being able to tune glibc's malloc behaviour.
>
> I implemented that (patch 0001) to provide a new hook which is called on
> backend startup, and anytime we set work_mem. This hook is #defined depending
> on the malloc implementation: currently a default, no-op implementation is
> provided, as well as a glibc malloc implementation.
>

Not sure I'd call this a hook - that usually means a way to plug in
custom code through a callback, and this is simply ifdefing a block of
code to pick the right implementation. Which may be a good way to do
that, just let's not call that a hook.

There's a commented-out MallocTuneHook() call, probably not needed.

I wonder if #ifdefing is a sufficient solution, because it happens at
compile time, so what if someone overrides the allocator in LD_PRELOAD?
That was a fairly common way to use a custom allocator in an existing
application. But I don't know how many people do that with Postgres (I'm
not aware of anyone doing that) or if we support that (it'd probably
apply to other stuff too, not just malloc). So maybe it's OK, and I
can't think of a better way anyway.
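
To illustrate the distinction, here is a minimal sketch (not code from
the patch; every name in it is made up for illustration). The first part
picks an implementation at compile time with #ifdef, the second is a hook
in the usual sense, i.e. a function pointer that other code can set at
run time.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time selection: the choice is fixed when the binary is built. */
#ifdef __GLIBC__
const char *
sketch_malloc_tuning_impl(void)
{
    return "glibc";
}
#else
const char *
sketch_malloc_tuning_impl(void)
{
    return "noop";
}
#endif

/* A hook proper: NULL by default, can be set by other code at run time. */
typedef void (*sketch_tune_hook_type) (long work_mem_bytes);
sketch_tune_hook_type sketch_tune_hook = NULL;

/* Calls the hook if one is installed; returns 1 if it was called, else 0. */
int
sketch_call_tune_hook(long work_mem_bytes)
{
    if (sketch_tune_hook == NULL)
        return 0;
    sketch_tune_hook(work_mem_bytes);
    return 1;
}

/* Example callback (hypothetical) that just records what it was given. */
long sketch_last_seen = -1;

void
sketch_recording_callback(long work_mem_bytes)
{
    assert(work_mem_bytes >= 0);
    sketch_last_seen = work_mem_bytes;
}
```

Note that __GLIBC__ only says which libc headers were used at build time,
so the #ifdef variant cannot notice an allocator swapped in through
LD_PRELOAD.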

> The glibc malloc implementation relies on a new GUC,
> glibc_malloc_max_trim_threshold. When set to its default value of -1, we
> don't tune malloc at all, exactly as in HEAD. If a different value is provided,
> we set M_MMAP_THRESHOLD to half this value, and M_TRIM_THRESHOLD to this value,
> capped by work_mem / 2 and work_mem respectively.
>
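
For readers following along, the rule described above could look roughly
like the sketch below. This is not the code from patch 0001: the function
names are hypothetical, and it assumes both the GUC value and work_mem
have already been converted to bytes. Only mallopt(), M_TRIM_THRESHOLD and
M_MMAP_THRESHOLD are real glibc names.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#ifdef __GLIBC__
#include <malloc.h>             /* mallopt(), M_TRIM_THRESHOLD, M_MMAP_THRESHOLD */
#endif

/* Hypothetical helper, not from the patch. */
long
sketch_min(long a, long b)
{
    return (a < b) ? a : b;
}

/* Trim threshold: the GUC value, capped by work_mem (both in bytes here). */
long
sketch_trim_threshold(long guc_bytes, long work_mem_bytes)
{
    assert(guc_bytes >= 0 && work_mem_bytes >= 0);
    return sketch_min(guc_bytes, work_mem_bytes);
}

/* mmap threshold: half the GUC value, capped by work_mem / 2. */
long
sketch_mmap_threshold(long guc_bytes, long work_mem_bytes)
{
    assert(guc_bytes >= 0 && work_mem_bytes >= 0);
    return sketch_min(guc_bytes / 2, work_mem_bytes / 2);
}

/*
 * Returns 0 if malloc was left alone, 1 if both thresholds were accepted,
 * and -1 if mallopt() rejected at least one of them.
 */
int
sketch_tune_malloc(long guc_bytes, long work_mem_bytes)
{
    if (guc_bytes < 0)
        return 0;               /* -1 means "do not tune", as in HEAD */
#ifdef __GLIBC__
    {
        /* mallopt() takes an int, so clamp before casting */
        long        trim = sketch_min(sketch_trim_threshold(guc_bytes, work_mem_bytes), INT_MAX);
        long        mmap_thr = sketch_min(sketch_mmap_threshold(guc_bytes, work_mem_bytes), INT_MAX);
        int         ok_trim = mallopt(M_TRIM_THRESHOLD, (int) trim);
        int         ok_mmap = mallopt(M_MMAP_THRESHOLD, (int) mmap_thr);

        return (ok_trim && ok_mmap) ? 1 : -1;
    }
#else
    (void) work_mem_bytes;
    return 0;                   /* no-op on other malloc implementations */
#endif
}
```

As far as I know glibc refuses M_MMAP_THRESHOLD values above its internal
maximum, so checking the return value of mallopt() seems worthwhile.
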
> The net result is that we can then keep more unused memory at the top
> of the heap, and use mmap less frequently, if the DBA chooses to. Another
> possible use case would be, on the contrary, to limit the memory allocated
> in idle backends to a minimum.
>
> The reasoning behind this is that glibc's malloc default way of handling those
> two thresholds is to adapt to the size of the last freed mmaped block.
>
> I've run the same "up to 32 columns" benchmark as you did, with this new patch
> applied on top of both HEAD and your v2 patchset incorporating planner
> estimates for the block sizes. Those are called "aset" and "generation" in the
> attached spreadsheet. For each, I've run it with
> glibc_malloc_max_trim_threshold set to -1, 1MB, 4MB and 64MB. In each case
> I've measured two things:
> - query latency, as reported by pgbench
> - total memory allocated by malloc at backend exit after running each query
> three times. This represents the "idle" memory consumption, and thus what we
> waste in malloc instead of releasing back to the system. This measurement has
> been performed using the very small module presented in patch 0002. Please
> note that I in no way propose that we include this module, it was just a
> convenient way for me to measure memory footprint.
>
> My conclusion is that the impressive gains you see from using the generation
> context with bigger blocks mostly come from the fact that we allocate bigger
> blocks, and that this moves the mmap thresholds accordingly. I wonder how much
> of a difference it would make on other malloc implementations: I'm afraid the
> optimisation presented here would in fact be specific to glibc's malloc, since
> we have almost the same gains with both allocators when tuning malloc to keep
> more memory. I still think both approaches are useful, and would be necessary.
>

Interesting measurements. It's intriguing that for generation contexts,
the default "-1" often outperforms "1MB" (but not the other options),
while for aset it's pretty much "the higher the value, the better".

> Since this affects all memory allocations, I need to come up with other
> meaningful scenarios to benchmark.
>

OK. Are you thinking about a different microbenchmark, or something
closer to a real workload?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
