Quick Links

Re: hashagg slowdown due to spill changes

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: hashagg slowdown due to spill changes
Date:	2020-06-08 20:41:29
Message-ID:	07d6cae97a18c603bed9cf609d16448eb2b438a5.camel@j-davis.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, 2020-06-05 at 21:11 -0700, Andres Freund wrote:
> Before there was basically one call from nodeAgg.c to execGrouping.c
> for
> each tuple and hash table. Now it's a lot more complicated:
> 1) nodeAgg.c: prepare_hash_slot()
> 2) execGrouping.c: TupleHashTableHash()
> 3) nodeAgg.c: lookup_hash_entry()
> 4) execGrouping.c: LookupTupleHashEntryHash()

The reason that I did it that way was to be able to store the hash
along with the saved tuple (similar to what HashJoin does), which
avoids recalculation.

That could be a nice savings for some cases, like when work_mem is
small but the data still fits in system memory, which I expect to be
fairly common. But based on your numbers, it might be a bad trade-off
overall.

> Why isn't the flow more like this:
> 1) prepare_hash_slot()
> 2) if (aggstate->hash_spill_mode) goto 3; else goto 4
> 3) entry = LookupTupleHashEntry(&hash); if (!entry)
> hashagg_spill_tuple();
> 4) InsertTupleHashEntry(&hash, &isnew); if (isnew) initialize(entry)

I'll work up a patch to refactor this. I'd still like to see if we can
preserve the calculate-hash-once behavior somehow.

Regards,
Jeff Davis

In response to

hashagg slowdown due to spill changes at 2020-06-06 04:11:46 from Andres Freund

Responses

Re: hashagg slowdown due to spill changes at 2020-06-08 21:08:19 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2020-06-08 20:48:01	Re: snowball release
Previous Message	Peter Geoghegan	2020-06-08 20:20:53	Re: Bump default wal_level to logical