Re: AllocSetEstimateChunkSpace()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: AllocSetEstimateChunkSpace()
Date: 2020-03-25 19:42:35
Message-ID: 20200325194235.3fciqzxvbyg7q2c3@alap3.anarazel.de
Thread:
Lists: pgsql-hackers

Hi,

On 2020-03-24 18:12:03 -0700, Jeff Davis wrote:
> Attached is a small patch that introduces a simple function,
> AllocSetEstimateChunkSpace(), and uses it to improve the estimate for
> the size of a hash entry for hash aggregation.
>
> Getting reasonable estimates for the hash entry size (including the
> TupleHashEntryData, the group key tuple, the per-group state, and by-
> ref transition values) is important for multiple reasons:
>
> * It helps the planner know when hash aggregation is likely to spill,
> and how to cost it.
>
> * It helps hash aggregation regulate itself by setting a group limit
> (separate from the memory limit) to account for growing by-ref
> transition values.
>
> * It helps choose a good initial size for the hash table. Too large of
> a hash table will crowd out space that could be used for the group keys
> or the per-group state.
>
> Each group requires up to three palloc chunks: one for the group key
> tuple, one for the per-group states, and one for a by-ref transition
> value. Each chunk can have a lot of overhead: in addition to the chunk
> header (16 bytes overhead), it also rounds the value up to a power of
> two (~25% overhead). So, it's important to account for this chunk
> overhead.

As mentioned on IM/call: I think this is mainly an argument for combining
the group tuple & per-group state allocations. I'm kind of fine with
taking the allocator overhead into account after that.

I still don't buy that it's useful to estimate the by-ref transition
value overhead. We just don't have anything even close to a meaningful
value to base this on. Even if we want to consider the initial
transition value or something, we'd be better off initially
over-estimating the size of the transition state by a lot more than 25%
(I am thinking more like 4x or so, with a minimum of 128 bytes or
so). Since this is about the initial size of the hashtable, we're better
off with a too-small table, imo. A "too large" table is more likely to
end up needing to spill when filled to only a small degree.

I am kind of wondering if there's actually much point in trying to be
accurate here at all, especially when doing this from the
planner. Since, for a large fraction of aggregates, we're going to very
roughly guess at transition space anyway, it seems like a bunch of
"overhead factors" could turn out to be better than trying to be
accurate on some parts while still wildly guessing at transition space.
But I'm not sure.

> I considered making it a method of a memory context, but the planner
> will call this function before the hash agg memory context is created.
> It seems to make more sense for this to just be an AllocSet-specific
> function for now.

-1 to this approach. I think it's architecturally the wrong direction to
add more direct calls to functions within specific contexts.

On 2020-03-25 11:46:31 -0700, Jeff Davis wrote:
> On Tue, 2020-03-24 at 18:12 -0700, Jeff Davis wrote:
> > I considered making it a method of a memory context, but the planner
> > will call this function before the hash agg memory context is
> > created.
>
> I implemented this approach also; attached.
>
> It works a little better by having access to the memory context, so it
> knows set->allocChunkLimit. It also allows it to work with the slab
> allocator, which needs the memory context to return a useful number.
> However, this introduces more code and awkwardly needs to use
> CurrentMemoryContext when called from the planner (because the actual
> memory context we're trying to estimate for doesn't exist yet).

Yea, the "context needs to exist" part sucks. I really don't want to add
calls directly into AllocSet from more places though. And just ignoring
the parameters to the context seems wrong too.

Greetings,

Andres Freund
