pgsql: Disk-based Hash Aggregation.

From: Jeff Davis <jdavis(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Disk-based Hash Aggregation.
Date: 2020-03-18 22:54:49
Message-ID: E1jEhaL-0002Ur-2L@gemulon.postgresql.org
Lists: pgsql-committers pgsql-hackers

Disk-based Hash Aggregation.

While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".

In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.
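
The spill-mode rule above can be illustrated with a minimal sketch. This is not the nodeAgg.c code: it assumes a count(*)-style aggregate, uses a hypothetical `max_groups` limit as a stand-in for work_mem, and uses a plain Python list in place of a logical tape. The name `aggregate_pass` is invented for the illustration.

```python
def aggregate_pass(keys, max_groups):
    """One pass of a count(*) GROUP BY over `keys` with a bounded hash table.

    `max_groups` is a hypothetical stand-in for work_mem (an assumption of
    this sketch). Returns (groups, spilled).
    """
    groups = {}        # the in-memory hash table: key -> running count
    spilled = []       # stand-in for the logical tape
    spill_mode = False

    for key in keys:
        if key in groups:
            # Existing groups keep advancing, even in spill mode.
            groups[key] += 1
        elif not spill_mode:
            # Still under the limit: create a new group.
            groups[key] = 1
            if len(groups) >= max_groups:
                spill_mode = True
        else:
            # Would create a new group: defer the tuple instead.
            spilled.append(key)

    return groups, spilled


groups, spilled = aggregate_pass(["a", "b", "a", "c", "b", "a"], max_groups=2)
print(groups, spilled)
```

Because the table never shrinks during a pass, every occurrence of a given key ends up either in `groups` or in `spilled`, never split between the two, which is what makes it safe to emit the in-memory groups before the spilled tuples are processed.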

The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).
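
The batch loop described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the committed implementation: `hash_aggregate`, `max_groups`, and `num_partitions` are hypothetical names, the aggregate is a simple count, in-memory lists stand in for tapes, and the way the partition is chosen from the hash value at each depth is an assumption of this sketch.

```python
from collections import deque


def hash_aggregate(keys, max_groups, num_partitions=4):
    """count(*) GROUP BY with a bounded hash table and partitioned spilling.

    `max_groups` is a hypothetical stand-in for work_mem; lists stand in
    for logical tapes. Both are assumptions of this sketch.
    """
    results = {}
    # Each batch is (input keys, recursion depth); start with the outer plan.
    batches = deque([(list(keys), 0)])

    while batches:
        batch, depth = batches.popleft()
        groups = {}
        partitions = [[] for _ in range(num_partitions)]

        for key in batch:
            if key in groups:
                groups[key] += 1          # advance an existing group
            elif len(groups) < max_groups:
                groups[key] = 1           # room for a new group
            else:
                # Spill mode: route the tuple to a partition. A different
                # slice of the hash is used at each depth so that a
                # re-spilled batch is split differently than before.
                part = (hash(key) // num_partitions ** depth) % num_partitions
                partitions[part].append(key)

        # Input exhausted: finalize and emit the groups from the hash table.
        results.update(groups)

        # Turn each non-empty spilled partition into a new batch of work.
        for part in partitions:
            if part:
                batches.append((part, depth + 1))

    return results


data = ["a", "b", "a", "c", "d", "c", "e", "a"]
print(hash_aggregate(data, max_groups=2))
```

Each pass finalizes at least one group that does not reappear in its spilled partitions, so the loop terminates even when a partition has to be spilled again.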

Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1f39bce021540fde00990af55b4432c55ef4b3c7

Modified Files
--------------
doc/src/sgml/config.sgml                      |   32 +
src/backend/commands/explain.c                |   37 +
src/backend/executor/nodeAgg.c                | 1092 ++++++++++++++++++++++++-
src/backend/optimizer/path/costsize.c         |   70 +-
src/backend/optimizer/plan/planner.c          |   19 +-
src/backend/optimizer/prep/prepunion.c        |    2 +-
src/backend/optimizer/util/pathnode.c         |   14 +-
src/backend/utils/misc/guc.c                  |   20 +
src/include/executor/nodeAgg.h                |    8 +
src/include/nodes/execnodes.h                 |   22 +-
src/include/optimizer/cost.h                  |    4 +-
src/test/regress/expected/aggregates.out      |  184 +++++
src/test/regress/expected/groupingsets.out    |  122 +++
src/test/regress/expected/select_distinct.out |   62 ++
src/test/regress/expected/sysviews.out        |    4 +-
src/test/regress/sql/aggregates.sql           |  131 +++
src/test/regress/sql/groupingsets.sql         |  103 +++
src/test/regress/sql/select_distinct.sql      |   62 ++
18 files changed, 1950 insertions(+), 38 deletions(-)
