Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-07-24 02:33:45
Message-ID: CAH2-Wz=QFBj=RhsqqC0-1a3f8BxLXP-x39vZ0dLjMyFmDXpDtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> So let me share some fresh I/O statistics collected on the current code
> using iosnoop. I've done the tests on two different machines using the
> "aggregate part" of TPC-H Q17, i.e. essentially this:
>
> SELECT * FROM (
> SELECT
> l_partkey AS agg_partkey,
> 0.2 * avg(l_quantity) AS avg_quantity
> FROM lineitem GROUP BY l_partkey OFFSET 1000000000
> ) part_agg;
>
> The OFFSET is there just to ensure we don't need to send anything to
> the client, etc.

Thanks for testing this.

> So sort writes ~3.4GB of data, give or take. But hashagg/master writes
> almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
> original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
> much more than the 3.4GB of data written by sort (which has to spill
> everything, while hashagg only spills rows not covered by the groups
> that fit into work_mem).

What I find when I run your query (with my own TPC-H DB that is
smaller than what you used here -- 59,986,052 lineitem tuples) is that
the sort required about 7x more memory than the hash agg to do
everything in memory: 4,384,711KB for the quicksort vs 630,801KB peak
hash agg memory usage. I'd be surprised if the ratio was very
different for you -- but can you check?

I think that there is something pathological about this spill
behavior, because it sounds like the precise opposite of what you
might expect when you make a rough extrapolation of what disk I/O will
be based on the memory used in no-spill cases (as reported by EXPLAIN
ANALYZE).

> What I find really surprising is the costing - despite writing about
> twice as much data, the hashagg cost is estimated to be much lower than
> the sort. For example on the i5 machine, the hashagg cost is ~10M, while
> sort cost is almost 42M. Despite using almost twice as much disk. And
> the costing is exactly the same for master and the CP_SMALL_TLIST.

That does make it sound like the costs of the hash agg aren't being
represented. I suppose it isn't clear if this is a costing issue
because it isn't clear if the execution time performance itself is
pathological or is instead something that must be accepted as the cost
of spilling the hash agg in a general kind of way.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message PG Doc comments form 2020-07-24 08:07:32 typo in sentence
Previous Message Tomas Vondra 2020-07-24 01:22:48 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-24 02:34:53 Re: Making CASE error handling less surprising
Previous Message Andres Freund 2020-07-24 02:04:02 Re: [Patch] ALTER SYSTEM READ ONLY