Re: 9.5: Better memory accounting, towards memory-bounded HashAgg

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date: 2014-12-21 19:19:46
Message-ID: 54971D52.7070903@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2.12.2014 06:14, Jeff Davis wrote:
> On Sun, 2014-11-30 at 17:49 -0800, Peter Geoghegan wrote:
>> On Mon, Nov 17, 2014 at 11:39 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>> I can also just move isReset there, and keep mem_allocated as a uint64.
>>> That way, if I find later that I want to track the aggregated value for
>>> the child contexts as well, I can split it into two uint32s. I'll hold
>>> off any any such optimizations until I see some numbers from HashAgg
>>> though.
>>
>> I took a quick look at memory-accounting-v8.patch.
>>
>> Is there some reason why mem_allocated is a uint64? All other things
>> being equal, I'd follow the example of tuplesort.c's
>> MemoryContextAllocHuge() API, which (following bugfix commit
>> 79e0f87a1) uses int64 variables to track available memory and so on.
>
> No reason. New version attached; that's the only change.

I've started reviewing this today. It does not apply cleanly on current
head, because of 4a14f13a committed a few days ago. That commit changed
the constants in src/include/utils/hsearch.h, so the patch needs to use
this:

#define HASH_NOCHILDCXT 0x4000 /* ... */

That's the only conflict, and after fixing it it compiles OK. However, I
got a segfault on the very first query I tried :-(

create table test_hash_agg as select i AS a, i AS b, i AS c, i AS d
from generate_series(1,10000000) s(i);

analyze test_hash_agg;

select a, count(*) from test_hash_agg where a < 100000 group by a;

With a fresh cluster (default config), I get this:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The backtrace looks like this (full attached):

Program received signal SIGSEGV, Segmentation fault.
advance_transition_function (aggstate=aggstate(at)entry=0x2b5c5f0,
peraggstate=peraggstate(at)entry=0x2b5efd0,
pergroupstate=pergroupstate(at)entry=0x8) at nodeAgg.c:468
468 if (pergroupstate->noTransValue)
(gdb) bt
#0 advance_transition_function at nodeAgg.c:468
#1 0x00000000005c3494 in advance_aggregates at nodeAgg.c:624
#2 0x00000000005c3dc2 in agg_fill_hash_table at nodeAgg.c:1640
#3 ExecAgg (node=node(at)entry=0x2b5c5f0) at nodeAgg.c:1338

(gdb) print pergroupstate->noTransValue
Cannot access memory at address 0x11
(gdb) print pergroupstate
$1 = (AggStatePerGroup) 0x8

My guess is something is scribbling over the pergroup memory, or maybe
the memory context gets released, or something. In any case, it's easily
reproducible, and apparently deterministic (always exactly the same
values, no randomness).

regards
Tomas

Attachment Content-Type Size
segfault.txt text/plain 3.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2014-12-21 19:22:10 Re: controlling psql's use of the pager a bit more
Previous Message Tom Lane 2014-12-21 19:18:33 Re: Proposal "VACUUM SCHEMA"