Re: Memory Accounting v11

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory Accounting v11
Date: 2015-06-14 22:51:12
Message-ID: 557E0560.7020306@2ndquadrant.com

Hi,

On 06/14/15 22:21, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
>> This patch tracks memory usage (at the block level) for all memory
>> contexts. Individual palloc()s aren't tracked; only the new blocks
>> allocated to the memory context with malloc().
>> ...
>> My general answer to the performance concerns is that they aren't a
>> reason to block this patch, unless someone has a suggestion about how to
>> fix them. Adding one field to a struct and a few arithmetic operations
>> for each malloc() or realloc() seems reasonable to me.
>
> Maybe, but this here is a pretty weak argument:
>
>> The current state, where HashAgg just blows up the memory, is just not
>> reasonable, and we need to track the memory to fix that problem.
>
> Meh. HashAgg could track its memory usage without loading the entire
> system with a penalty.

+1 to a solution like that, although I don't think that's doable without
digging the info out of the memory contexts somehow.

> Moreover, this is about fourth or fifth down the
> list of the implementation problems with spilling hash aggregation to
> disk. It would be good to see credible solutions for the bigger issues
> before we buy into adding overhead for a mechanism with no uses in core.

I don't think so. If we don't have an acceptable solution for tracking
memory in hashagg, there's not really much point in doing any of the
following steps. That's why Jeff is tackling this problem first.

Also, Jeff already posted a PoC of the memory-bounded hashagg, although
it only worked for aggregates with fixed-size state, and wasn't all that
great for cases like array_agg where the state grows. Maybe the
improvements to aggregate functions proposed by David Rowley last week
would help in those cases, as they'd allow spilling those states to disk.

So while the approach proposed by Jeff may turn out not to be the right
one, I think memory accounting needs to be solved first (even if it's
not committed until the whole feature is ready).

>> Others have also mentioned that we might want to use this mechanism
>> to track memory for other operators, like Sort or HashJoin, which
>> might be simpler and more accurate.
>
> That would imply redefining the meaning of work_mem for those
> operations; it would suddenly include a lot more overhead space than
> it used to, causing them to spill to disk much more quickly than
> before, unless one changes the work_mem setting to compensate. Not
> sure that people would like such a change.

Don't we tweak the meaning of work_mem over time anyway? Maybe not to
such an extent, but for example the hashjoin improvements in 9.5
certainly change it by packing the tuples more densely, sizing the
buckets differently, etc. It's true those changes go in the other
direction (i.e. batching less frequently), though.

OTOH this would make the accounting more accurate (e.g. eliminating
differences between cases that have different amounts of overhead
because of different tuple widths). Wouldn't that be a good thing, even
if people need to tweak work_mem? Isn't that kind of acceptable when
upgrading to the next major version?

> An even bigger problem is that it would pretty much break the logic
> around LACKMEM() in tuplesort.c, which supposes that spilling
> individual tuples to disk is a reliable and linear way to decrease
> the accounted-for memory. Individual pfree's would typically not
> decrease the accounted total in this implementation. Depending on
> what you have in mind for the spill-to-disk behavior, this issue
> could be a fail for HashAgg as well, which is another reason not to
> commit this patch in advance of seeing the use-case.

That's true, but I think the plan was always to wait for the whole
feature, and only then commit all the pieces.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
