Time to increase hash_mem_multiplier default?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Time to increase hash_mem_multiplier default?
Date: 2022-01-17 00:28:03
Message-ID: CAH2-Wzndc_ROk6CY-bC6p9O53q974Y0Ey4WX8jcPbuTZYM4Q3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The current hash_mem_multiplier default is 1.0, which is a fairly
conservative default: it preserves the historic behavior, which is
that hash-based executor nodes receive the same work_mem budget as
sort-based nodes. I propose that the default be increased to 2.0 for
Postgres 15.

Arguments in favor of artificially favoring hash-based nodes like this
were made when hash_mem_mutiplier went in. The short version goes like
this:

The relationship between memory availability and overall
performance/throughput has very significant differences when we
compare sort-based nodes with hash-based nodes. It's hard to make
reliable generalizations about how the performance/throughput of
hash-based nodes will be affected as memory is subtracted, even if we
optimistically assume that requirements are fairly fixed. Data
cardinality tends to make the picture complicated, just for starters.
But overall, as a general rule, more memory tends to make everything
go faster.

On the other hand, sort-based nodes (e.g., GroupAggregate) have very
predictable performance characteristics, and the possible upside of
allowing a sort node to use more memory is quite bounded. There is a
relatively large drop-off when we go from not being able to fit
everything in memory to needing to do an external sort. But even that
drop-off isn't very big -- not in absolute terms. More importantly,
there is hardly any impact as we continue to subtract memory (or add
more data). We'll still be able to do a single pass external sort with
only a small fraction of the memory needed to sort everything in
memory, which (perhaps surprisingly) is mostly all that matters.

The choice of 2.0 is still pretty conservative. I'm not concerned
about making hash nodes go faster (or used more frequently) -- at
least not primarily. I'm more worried about avoiding occasional OOMs
from sort nodes that use much more memory than could ever really make
sense. It's easy to demonstrate that making more memory available to
an external sort makes just about no difference, until you give it all
the memory it can make use of. This effect is reliable (data
cardinality won't matter, for example). And so the improvement that is
possible from giving a sort more memory is far smaller than (say) the
improvement in performance we typically see when the optimizer
switches from a hash aggregate to a group aggregate.

--
Peter Geoghegan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-01-17 00:29:36 Re: [PATCH] rename column if exists
Previous Message Julien Rouhaud 2022-01-17 00:10:43 Re: default to to ON_ERROR_STOP=on (Re: psql: exit status with multiple -c and -f)