Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-24 19:55:01
Message-ID: CAH2-Wzkv7bnoRTkniW4h4RVWbn6mGVySjik7opUiYsc5RGrfBA@mail.gmail.com
Lists: pgsql-docs pgsql-hackers

On Fri, Jul 24, 2020 at 12:16 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Maybe, but we're nowhere close to these limits. See this table which I
> posted earlier:
>
> 2MB      Planned Partitions:  64    HashAgg Batches:  4160
> 4MB      Planned Partitions: 128    HashAgg Batches: 16512
> 8MB      Planned Partitions: 256    HashAgg Batches: 21488
> 64MB     Planned Partitions:  32    HashAgg Batches:  2720
> 256MB    Planned Partitions:   8    HashAgg Batches:     8
>
> This is from the non-parallel runs on the i5 machine with the 32GB data
> set; the first column is work_mem. We're nowhere near the 1024 limit, and
> the cardinality estimates are pretty good.
>
> OTOH the number of batches is much higher, so clearly there was some
> recursive spilling happening. What I find strange is that this grows
> with work_mem and only starts dropping after 64MB.

Could that be caused by clustering in the data?

If the input data is in totally random order then we have a good
chance of never having to spill skewed "common" values. That is, we're
bound to encounter common values before entering spill mode, and so
those common values will continue to be usefully aggregated until
we're done with the initial groups (i.e. until the in-memory hash
table is cleared in order to process spilled input tuples). This is
great because the common values get aggregated without ever spilling,
and most of the work is done before we even start processing spilled
tuples.

If, on the other hand, the common values are concentrated together in
the input...
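
To make that concrete, here's a toy sketch in Python (it has nothing to
do with the actual nodeAgg.c spill code; the helper, the capacity, and
the skewed group sizes are all invented for the illustration). It just
counts how many tuples would have to be written out on the first pass
for a random-order versus a clustered input:

# Toy model, *not* the real nodeAgg.c spill logic: an in-memory hash
# table capped at CAPACITY distinct groups.  Once the table is full,
# any tuple whose group isn't already in it counts as written out on
# the first pass.
import random

CAPACITY = 1_000       # groups that fit in memory (stand-in for work_mem)
N_GROUPS = 100_000     # distinct groups in the input

def first_pass_spills(tuples, capacity=CAPACITY):
    table, spilled = set(), 0
    for key in tuples:
        if key in table:
            continue                # aggregated in memory, no I/O
        elif len(table) < capacity:
            table.add(key)          # group gets "nailed" before spill mode
        else:
            spilled += 1            # would be written to a spill partition
    return spilled

# Invented, heavily skewed group sizes: a few very common values plus a
# long tail of rare ones.
sizes = [max(1, 200_000 // (g + 1) ** 2) for g in range(N_GROUPS)]

random_order = [g for g, sz in enumerate(sizes) for _ in range(sz)]
random.shuffle(random_order)

group_order = list(range(N_GROUPS))
random.shuffle(group_order)         # runs of equal keys, in arbitrary order
clustered = [g for g in group_order for _ in range(sizes[g])]

print("random order, tuples spilled:", first_pass_spills(random_order))
print("clustered,    tuples spilled:", first_pass_spills(clustered))

With the skew baked into those made-up group sizes, the clustered
ordering writes out a large multiple of what the shuffled ordering does
on the first pass, simply because a common value's entire run can show
up after the hash table is already full.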

Assuming that I have this right, I would also expect simply having
more memory to ameliorate the problem. If you only have/need 4
or 8 partitions then you can fit a higher proportion of the total
number of groups for the whole dataset in the hash table (at the point
when you first enter spill mode). I think it follows that the "nailed"
hash table entries/groupings will "better characterize" the dataset as
a whole.
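
Sweeping the capacity in the same toy model (repeated here so that the
snippet runs on its own; still invented numbers, still only an
illustration) shows the amelioration for the fully clustered ordering:

# Same toy model, repeated so this snippet is standalone, with the
# in-memory capacity (stand-in for work_mem) varied for the clustered
# input ordering.  Purely illustrative.
import random

N_GROUPS = 100_000
sizes = [max(1, 200_000 // (g + 1) ** 2) for g in range(N_GROUPS)]
group_order = list(range(N_GROUPS))
random.shuffle(group_order)
clustered = [g for g in group_order for _ in range(sizes[g])]

def first_pass_spills(tuples, capacity):
    table, spilled = set(), 0
    for key in tuples:
        if key in table:
            continue
        elif len(table) < capacity:
            table.add(key)
        else:
            spilled += 1
    return spilled

for capacity in (1_000, 10_000, 50_000):
    pct = 100 * first_pass_spills(clustered, capacity) / len(clustered)
    print(f"capacity {capacity:>6}: {pct:4.0f}% spilled on the first pass")

The larger the fraction of the group space that fits in memory before
the first spill, the fewer tuples ever have to take a trip through the
spill partitions.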

> Also, how could the amount of I/O be almost constant in all these cases?
> Surely more recursive spilling should do more I/O, but the Disk Usage
> reported by explain analyze does not show anything like ...

Not sure, but might that just be because logtape.c can recycle disk
space?

As I said in my last e-mail, it's pretty reasonable to assume that the
vast majority of external sorts are one-pass. It follows that disk
usage can be thought of as almost the same thing as total I/O for
tuplesort. But the same heuristic isn't reasonable when thinking about
hash agg. Hash agg might write out much less data than the total
memory used for the equivalent "peak optimal nospill" hash agg case --
or much more. (Again, reiterating what I said in my last e-mail.)
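
As a crude illustration of how recycling could keep the reported number
flat (the pass sizes are made up, and this isn't how logtape.c manages
blocks in any real detail):

# Crude model of spill-file block recycling across recursive hash agg
# passes.  The pass sizes are invented; the point is only that peak
# file size tracks the largest pass while write I/O tracks the sum.

def simulate(pass_blocks):
    """pass_blocks[i] = 8kB blocks spilled at recursion level i."""
    total_written = pass_blocks[0]
    peak_file_size = pass_blocks[0]   # the first pass has to extend the file
    for prev, cur in zip(pass_blocks, pass_blocks[1:]):
        total_written += cur
        # While the prev blocks are consumed (and freed) we write only
        # cur <= prev new blocks, so recycled space keeps up and the
        # file doesn't need to grow.
        peak_file_size = max(peak_file_size, prev, cur)
    return total_written, peak_file_size

total, peak = simulate([128_000, 50_000, 15_000, 4_000])
print("blocks written in total:", total)  # rough proxy for write I/O
print("peak temp file size:    ", peak)   # ~ what Disk Usage shows

Two runs that recurse to quite different depths can therefore report
nearly the same Disk Usage even though one of them did considerably
more writing.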

--
Peter Geoghegan
