Re: Memory-Bounded Hash Aggregation

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Adam Lee <ali(at)pivotal(dot)io>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2019-12-05 06:57:51
Message-ID: 1b5c79a14f85e20899b100381c1e8d41e3148c5b.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2019-12-04 at 19:50 -0800, Adam Lee wrote:
> On Wed, Dec 04, 2019 at 06:55:43PM -0800, Jeff Davis wrote:
> >
> > Thanks very much for a great review! I've attached a new patch.
>
> Hi,
>
> About the `TODO: project needed attributes only` in your patch, when
> would the input tuple contain columns not needed? It seems like
> anything
> you can project has to be in the group or aggregates.

If you have a table like:

CREATE TABLE foo(i int, j int, x int, y int, z int);

And do:

SELECT i, SUM(j) FROM foo GROUP BY i;

At least from a logical standpoint, you might expect that we project
only the attributes we need from foo before feeding them into the
HashAgg. But that's not quite how postgres works. Instead, it leaves
the tuples intact (which, in this case, means they have 5 attributes)
until after aggregation and lazily fetches whatever attributes are
referenced. Tuples are spilled from the input, at which time they still
have 5 attributes; so naively copying them is wasteful.

I'm not sure how often this laziness is really a win in practice,
especially after the expression evaluation has changed so much in
recent releases. So it might be better to just project all the
attributes eagerly, and then none of this would be a problem. If we
still wanted to be lazy about attribute fetching, that should still be
possible even if we did a kind of "logical" projection of the tuple so
that the useless attributes would not be relevant. Regardless, that's
outside the scope of the patch I'm currently working on.

What I'd like to do is copy just the attributes needed into a new
virtual slot, leave the unneeded ones NULL, and then write it out to
the tuplestore as a MinimalTuple. I just need to be sure to get the
right attributes.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2019-12-05 07:28:04 Re: Memory-Bounded Hash Aggregation
Previous Message Amit Kapila 2019-12-05 06:25:55 Re: [HACKERS] Block level parallel vacuum