Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-03-14 00:12:54
Message-ID: 20200314001254.gdusqeanl7wrxmr7@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 13, 2020 at 04:31:16PM -0400, James Coleman wrote:
>On Fri, Mar 13, 2020 at 2:23 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>>
>> On Tue, Mar 10, 2020 at 10:44 PM Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> > 3) Most of the execution plans look reasonable, except that some of the
>> > plans look like this:
>> >
>> >
>> > QUERY PLAN
>> > ---------------------------------------------------------
>> > Limit
>> > -> GroupAggregate
>> > Group Key: t.a, t.b, t.c, t.d
>> > -> Incremental Sort
>> > Sort Key: t.a, t.b, t.c, t.d
>> > Presorted Key: t.a, t.b, t.c
>> > -> Incremental Sort
>> > Sort Key: t.a, t.b, t.c
>> > Presorted Key: t.a, t.b
>> > -> Index Scan using t_a_b_idx on t
>> > (10 rows)
>> >
>> > i.e. there are two incremental sorts on top of each other, with
>> > different prefixes. But this this is not a new issue - it happens with
>> > queries like this:
>> >
>> > SELECT a, b, c, d, count(*) FROM (
>> > SELECT * FROM t ORDER BY a, b, c
>> > ) foo GROUP BY a, b, c, d limit 1000;
>> >
>> > i.e. there's a subquery with a subset of pathkeys. Without incremental
>> > sort the plan looks like this:
>> >
>> > QUERY PLAN
>> > ---------------------------------------------
>> > Limit
>> > -> GroupAggregate
>> > Group Key: t.a, t.b, t.c, t.d
>> > -> Sort
>> > Sort Key: t.a, t.b, t.c, t.d
>> > -> Sort
>> > Sort Key: t.a, t.b, t.c
>> > -> Seq Scan on t
>> > (8 rows)
>> >
>> > so essentially the same plan shape. What bugs me though is that there
>> > seems to be some sort of memory leak, so that this query consumes
>> > gigabytes os RAM before it gets killed by OOM. But the memory seems not
>> > to be allocated in any memory context (at least MemoryContextStats don't
>> > show anything like that), so I'm not sure what's going on.
>> >
>> > Reproducing it is fairly simple:
>> >
>> > CREATE TABLE t (a bigint, b bigint, c bigint, d bigint);
>> > INSERT INTO t SELECT
>> > 1000*random(), 1000*random(), 1000*random(), 1000*random()
>> > FROM generate_series(1,10000000) s(i);
>> > CREATE INDEX idx ON t(a,b);
>> > ANALYZE t;
>> >
>> > EXPLAIN ANALYZE SELECT a, b, c, d, count(*)
>> > FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d
>> > LIMIT 100;
>>
>> While trying to reproduce this, instead of lots of memory usage, I got
>> the attached assertion failure instead.
>
>And, without the EXPLAIN ANALYZE was able to get this one, which will
>probably be a lot more helpful.
>

Hmmm, I'll try reproducing it, but can you investigate the values in the
Assert? I mean, it fails on this:

Assert(total_allocated == context->mem_allocated);

so can you get a core or attach to the process using gdb, and see what's
the expected / total value?

BTW, I might have copied the wrong query - can you try with a higher
value in the LIMIT clause? For example:

EXPLAIN ANALYZE SELECT a, b, c, d, count(*)
FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d
LIMIT 1000000;

I think this might be the differenc ewhy you don't see the memory leak.
Or maybe it was because of asserts? I'm not sure if I had enabled them
in the build ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-03-14 00:23:07 Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Previous Message Justin Pryzby 2020-03-14 00:10:00 Re: Berserk Autovacuum (let's save next Mandrill)