Re: Memory-Bounded Hash Aggregation

From: Adam Lee <ali(at)pivotal(dot)io>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2019-12-10 21:34:22
Message-ID: 20191210213422.GD1345@mars.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 04, 2019 at 10:57:51PM -0800, Jeff Davis wrote:
> > About the `TODO: project needed attributes only` in your patch, when
> > would the input tuple contain columns not needed? It seems like
> > anything
> > you can project has to be in the group or aggregates.
>
> If you have a table like:
>
> CREATE TABLE foo(i int, j int, x int, y int, z int);
>
> And do:
>
> SELECT i, SUM(j) FROM foo GROUP BY i;
>
> At least from a logical standpoint, you might expect that we project
> only the attributes we need from foo before feeding them into the
> HashAgg. But that's not quite how postgres works. Instead, it leaves
> the tuples intact (which, in this case, means they have 5 attributes)
> until after aggregation and lazily fetches whatever attributes are
> referenced. Tuples are spilled from the input, at which time they still
> have 5 attributes; so naively copying them is wasteful.
>
> I'm not sure how often this laziness is really a win in practice,
> especially after the expression evaluation has changed so much in
> recent releases. So it might be better to just project all the
> attributes eagerly, and then none of this would be a problem. If we
> still wanted to be lazy about attribute fetching, that should still be
> possible even if we did a kind of "logical" projection of the tuple so
> that the useless attributes would not be relevant. Regardless, that's
> outside the scope of the patch I'm currently working on.
>
> What I'd like to do is copy just the attributes needed into a new
> virtual slot, leave the unneeded ones NULL, and then write it out to
> the tuplestore as a MinimalTuple. I just need to be sure to get the
> right attributes.
>
> Regards,
> Jeff Davis

Melanie and I tried this, had a installcheck passed patch. The way how
we verify it is composing a wide table with long unnecessary text
columns, then check the size it writes on every iteration.

Please check out the attachment, it's based on your 1204 version.

--
Adam Lee

Attachment Content-Type Size
spill_fewer_cols.patch text/plain 5.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jens-Wolfhard Schicke-Uffmann 2019-12-10 21:44:17 Re: Contention on LWLock buffer_content, due to SHARED lock(?)
Previous Message Alvaro Herrera 2019-12-10 21:12:47 Re: log bind parameter values on error