Re: Memory-Bounded Hash Aggregation

From: Pengzhou Tang <ptang(at)pivotal(dot)io>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2020-03-20 05:20:33
Message-ID: CAG4reARM9L-cMPKfhcK46K=OmdAYH8h3YDA_uhpXBEoiXTRLUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I happen to notice that "set enable_sort to false" cannot guarantee the
planner to use hashagg in test groupingsets.sql,
the following comparing results of sortagg and hashagg seems to have no
meaning.

Thanks,
Pengzhou

On Thu, Mar 19, 2020 at 7:36 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

>
> Committed.
>
> There's some future work that would be nice (some of these are just
> ideas and may not be worth it):
>
> * Refactor MemoryContextMemAllocated() to be a part of
> MemoryContextStats(), but allow it to avoid walking through the blocks
> and freelists.
>
> * Improve the choice of the initial number of buckets in the hash
> table. For this patch, I tried to preserve the existing behavior of
> estimating the number of groups and trying to initialize with that many
> buckets. But my performance tests seem to indicate this is not the best
> approach. More work is needed to find what we should really do here.
>
> * For workloads that are not in work_mem *or* system memory, and need
> to actually go to storage, I see poor CPU utilization because it's not
> effectively overlapping CPU and IO work. Perhaps buffering or readahead
> changes can improve this, or async IO (even better).
>
> * Project unnecessary attributes away before spilling tuples to disk.
>
> * Improve logtape.c API so that the caller doesn't need to manage a
> bunch of tape numbers.
>
> * Improve estimate of the hash entry size. This patch doesn't change
> the way the planner estimates it, but I observe that actual size as
> seen at runtime is significantly different. This is connected to the
> initial number of buckets for the hash table.
>
> * In recursive steps, I don't have a good estimate for the number of
> groups, so I just estimate it as the number of tuples in that spill
> tape (which is pessimistic). That could be improved by doing a real
> cardinality estimate as the tuples are spilling (perhaps with HLL?).
>
> * Many aggregates with pass-by-ref transition states don't provide a
> great aggtransspace. We should consider doing something smarter, like
> having negative numbers represent a number that should be multiplied by
> the size of the group (e.g. ARRAY_AGG would have a size dependent on
> the group size, not a constant).
>
> * If we want to handle ARRAY_AGG (and the like) well, we can consider
> spilling the partial states in the hash table whem the memory is full.
> That would add a fair amount of complexity because there would be two
> types of spilled data (tuples and partial states), but it could be
> useful in some cases.
>
> Regards,
> Jeff Davis
>
>
>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-03-20 05:23:03 Re: Improving connection scalability: GetSnapshotData()
Previous Message Andres Freund 2020-03-20 04:56:37 Re: Missing errcode() in ereport