Re: Default setting for enable_hashagg_disk

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-24 16:01:47
Message-ID: 20200724160147.l673k4yflz7flkh5@development
Lists: pgsql-docs pgsql-hackers

On Fri, Jul 24, 2020 at 11:18:48AM -0400, Robert Haas wrote:
>On Thu, Jul 23, 2020 at 9:22 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>                          2MB     4MB     8MB    64MB   256MB
>> -----------------------------------------------------------
>> hash                    6.71    6.70    6.73    6.44    5.81
>> hash CP_SMALL_TLIST     5.28    5.26    5.24    5.04    4.54
>> sort                    3.41    3.41    3.41    3.57    3.45
>>
>> So sort writes ~3.4GB of data, give or take. But hashagg/master writes
>> almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
>> original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
>> much more than the 3.4GB of data written by sort (which has to spill
>> everything, while hashagg only spills rows not covered by the groups
>> that fit into work_mem).
>>
>> I initially assumed this is due to writing the hash value to the tapes,
>> and the rows are fairly narrow (only about 40B per row), so a 4B hash
>> could make a difference - but certainly not this much. Moreover, that
>> does not explain the difference between master and the now-reverted
>> CP_SMALL_TLIST, I think.
>
>This is all really good analysis, I think, but this seems like the key
>finding. It seems like we don't really understand what's actually
>getting written. Whether we use hash or sort doesn't seem like it
>should have this kind of impact on how much data gets written, and
>whether we use CP_SMALL_TLIST or project when needed doesn't seem like
>it should matter like this either.
>

I think for CP_SMALL_TLIST at least some of the extra data can be
attributed to writing the hash value along with the tuple, which sort
obviously does not do. With the 32GB data set (the i5 machine), there
are ~192M rows in the lineitem table, and with 4B hash values that's
about 732MB of extra data. That's about 50% of the difference between
sort and CP_SMALL_TLIST, and I'd dare to speculate the other 50% is due
to LogicalTape internals (pointers to the next block, etc.).
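
To spell that arithmetic out (just a back-of-the-envelope sketch; the 4B
prefix is my reading of the spill path writing the uint32 hash in front of
each spilled MinimalTuple, so treat the exact per-tuple overhead as an
assumption):

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        /* ~192M lineitem rows, 4B hash written in front of each spilled tuple */
        uint64_t    nrows = 192000000;
        uint64_t    hash_bytes = nrows * sizeof(uint32_t);

        /* 768,000,000 bytes, i.e. roughly 732MB */
        printf("%.0f MB of hash values\n", (double) hash_bytes / (1024 * 1024));
        return 0;
    }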

The question is why master has 2x the overhead of CP_SMALL_TLIST, if
it's meant to write the same set of columns etc.
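
Just to spell out the "2x" (reading "overhead" as the data written on top of
what sort writes), plugging in the 2MB and 256MB columns from the table above:

    #include <stdio.h>

    int
    main(void)
    {
        /* GB written at work_mem = 2MB and 256MB, from the table above */
        double  sort_gb[2]   = {3.41, 3.45};
        double  small_gb[2]  = {5.28, 4.54};    /* hash CP_SMALL_TLIST */
        double  master_gb[2] = {6.71, 5.81};    /* hash (master) */

        for (int i = 0; i < 2; i++)
        {
            double  master_extra = master_gb[i] - sort_gb[i];
            double  small_extra  = small_gb[i] - sort_gb[i];

            /* prints roughly 1.8x for 2MB and 2.2x for 256MB */
            printf("%.2f GB vs %.2f GB extra, i.e. %.1fx\n",
                   master_extra, small_extra, master_extra / small_extra);
        }
        return 0;
    }

So "2x" is a round number, but the ratio is in that ballpark at both ends of
the work_mem range.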

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
