Re: Memory-Bounded Hash Aggregation

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2020-02-05 18:37:15
Message-ID: 66a1a669dfbb41259c60cbf70d2239fef8a2ce31.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2020-02-04 at 15:08 -0800, Peter Geoghegan wrote:
> Have you tested this against tuplesort.c, particularly parallel
> CREATE
> INDEX? It would be worth trying to measure any performance impact.
> Note that most parallel CREATE INDEX tuplesorts will do a merge
> within
> each worker, and one big merge in the leader. It's much more likely
> to
> have multiple passes than a regular serial external sort.

I did not observe any performance regression when creating an index in
parallel over 20M ints (random ints in random order). I tried 2
parallel workers with work_mem=4MB and also 4 parallel workers with
work_mem=256kB.

> Have you thought about integer overflow in your heap related
> routines?
> This isn't as unlikely as you might think. See commit 512f67c8, for
> example.

It's dealing with blocks rather than tuples, so it's a bit less likely.
But changed it to use "unsigned long" instead.

> Have you thought about the MaxAllocSize restriction as it concerns
> lts->freeBlocks? Will that be okay when you have many more tapes than
> before?

I added a check. If it exceeds MaxAllocSize, before trying to perform
the allocation, just leak the block rather than adding it to the
freelist. Perhaps there's a usecase for an extraordinarily-long
freelist, but it's outside the scope of this patch.

> LogicalTapeSetExtend() seems to work in a way that assumes that the
> tape is frozen. It would be good to document that assumption, and
> possible enforce it by way of an assertion. The same remark applies
> to
> any other assumptions you're making there.

Can you explain? I am not freezing any tapes in Hash Aggregation, so
what about LogicalTapeSetExtend() assumes the tape is frozen?

Attached new logtape.c patches.

Regards,
Jeff Davis

Attachment Content-Type Size
0001-Logical-Tape-Set-lazily-allocate-read-buffer.patch text/x-patch 1.8 KB
0002-Logical-Tape-Set-change-freelist-to-min-heap.patch text/x-patch 7.3 KB
0003-Logical-Tape-Set-add-API-to-extend-with-additional-t.patch text/x-patch 4.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-02-05 18:40:30 Re: Memory-Bounded Hash Aggregation
Previous Message Jeff Davis 2020-02-05 18:26:17 Re: Memory-Bounded Hash Aggregation