
From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: hashagg slowdown due to spill changes
Date: 2020-06-06 04:11:46
Message-ID: 20200606041146.slqfg7cuptx27tuy@alap3.anarazel.de

Hi,

While preparing my pgcon talk I noticed that our hash-agg performance
degraded noticeably. Looks to me like it's due to the spilling-hashagg
changes.

Sample benchmark:

config:
-c huge_pages=on -c shared_buffers=32GB -c jit=0 -c max_parallel_workers_per_gather=0
(largely just to reduce variance)

data prep:
CREATE TABLE fewgroups_many_rows AS SELECT (random() * 4)::int cat, (random()*10000)::int val FROM generate_series(1, 100000000);
VACUUM (FREEZE, ANALYZE) fewgroups_many_rows;

test prep:
CREATE EXTENSION IF NOT EXISTS pg_prewarm; SELECT pg_prewarm('fewgroups_many_rows', 'buffer');

test:
SELECT cat, count(*) FROM fewgroups_many_rows GROUP BY 1;

Using best-of-three timing:

v12      12221.031 ms
master   13855.129 ms

While not the end of the world, that's definitely a noticeable and
reproducible slowdown (~12%).

I don't think this is actually an inherent cost, but a question of how
the code ended up being organized. Here's a perf diff of profiles for
both versions:

# Baseline Delta Abs Shared Object Symbol
# ........ ......... ................ .........................................
#
+6.70% postgres [.] LookupTupleHashEntryHash
+6.37% postgres [.] prepare_hash_slot
+4.74% postgres [.] TupleHashTableHash_internal.isra.0
20.36% -2.89% postgres [.] ExecInterpExpr
6.31% -2.73% postgres [.] lookup_hash_entries
+2.36% postgres [.] lookup_hash_entry
+2.14% postgres [.] ExecJustAssignScanVar
2.28% +1.97% postgres [.] ExecScan
2.54% +1.93% postgres [.] MemoryContextReset
3.84% -1.86% postgres [.] SeqNext
10.19% -1.50% postgres [.] tts_buffer_heap_getsomeattrs
+1.42% postgres [.] hash_bytes_uint32
+1.39% postgres [.] TupleHashTableHash
+1.10% postgres [.] tts_virtual_clear
3.36% -0.74% postgres [.] ExecAgg
+0.45% postgres [.] CheckForSerializableConflictOutNeeded
0.25% +0.44% postgres [.] hashint4
5.80% -0.35% postgres [.] tts_minimal_getsomeattrs
1.91% -0.33% postgres [.] heap_getnextslot
4.86% -0.32% postgres [.] heapgettup_pagemode
1.46% -0.32% postgres [.] tts_minimal_clear

While some of this is likely just noise, it's pretty clear that we
spend a substantial amount of additional time below
lookup_hash_entries().

And looking at the code, I'm not too surprised:

Before, there was basically one call from nodeAgg.c to execGrouping.c
for each tuple and hash table. Now it's a lot more complicated:
1) nodeAgg.c: prepare_hash_slot()
2) execGrouping.c: TupleHashTableHash()
3) nodeAgg.c: lookup_hash_entry()
4) execGrouping.c: LookupTupleHashEntryHash()

For each of these, data needs to be peeled out of one or more of
AggState / AggStatePerHashData / TupleHashTable. There's no way the
compiler can know that nothing inside those structs changes across the
calls, so it has to reload their contents repeatedly. From my reading
of the profiles, that's where most of the time is going.
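
The underlying effect can be illustrated outside the executor. A
minimal, hypothetical C sketch (invented names, not the actual
nodeAgg.c / execGrouping.c code):

/*
 * Hypothetical sketch: the struct stands in for AggState /
 * AggStatePerHashData / TupleHashTable style state.
 */
typedef struct HashLoopState
{
    int         numCols;
    const int  *keyColIdx;
} HashLoopState;

/* Defined in another file; the compiler can't see whether it modifies
 * *state, so it has to assume it might. */
extern void opaque_call(HashLoopState *state);

int
sum_with_reloads(HashLoopState *state, const int *values)
{
    int sum = 0;

    for (int i = 0; i < state->numCols; i++)
    {
        opaque_call(state);

        /* numCols and keyColIdx must be re-read from memory on every
         * iteration, because opaque_call() might have changed them. */
        sum += values[state->keyColIdx[i]];
    }
    return sum;
}

int
sum_with_locals(HashLoopState *state, const int *values)
{
    /* We know opaque_call() doesn't actually change these fields, so
     * copying them into locals lets the compiler keep them in
     * registers across the calls. */
    int         numCols = state->numCols;
    const int  *keyColIdx = state->keyColIdx;
    int         sum = 0;

    for (int i = 0; i < numCols; i++)
    {
        opaque_call(state);
        sum += values[keyColIdx[i]];
    }
    return sum;
}

In the second variant the field loads can be hoisted out of the loop,
which is essentially what collapsing the per-tuple work back into one
execGrouping.c call would buy us.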

There's also the issue that the signalling of whether or not to insert
got unnecessarily complicated. There are several checks (condensed
into a toy sketch after this list):
1) lookup_hash_entry() (p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;)
2) LookupTupleHashEntry_internal() (if (isnew))
3) lookup_hash_entry() (if (entry == NULL) and if (isnew))
4) lookup_hash_entries() if (!in_hash_table)
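
A hypothetical, self-contained toy in C showing that layering (invented
names, not the PostgreSQL code; the "insert or not?" decision is
effectively re-checked at every layer):

#include <stdbool.h>
#include <stddef.h>

typedef struct ToyTable
{
    bool    present[16];
    int     values[16];
} ToyTable;

/* innermost layer: inserts only when isnew is non-NULL (check #2) */
int *
toy_lookup(ToyTable *tbl, int key, bool *isnew)
{
    if (tbl->present[key])
    {
        if (isnew)
            *isnew = false;
        return &tbl->values[key];
    }
    if (isnew)
    {
        *isnew = true;
        tbl->present[key] = true;
        return &tbl->values[key];
    }
    return NULL;                /* lookup-only mode, key not present */
}

/* middle layer: translates the mode flag into a nullable pointer (#1),
 * then checks the result again (#3) */
int *
toy_lookup_entry(ToyTable *tbl, int key, bool spill_mode)
{
    bool    isnew = false;
    bool   *p_isnew = spill_mode ? NULL : &isnew;
    int    *entry = toy_lookup(tbl, key, p_isnew);

    if (entry == NULL)
        return NULL;
    if (isnew)
        *entry = 0;             /* "initialize the new group" */
    return entry;
}

/* outermost layer: yet another NULL check (#4, "!in_hash_table") */
void
toy_process(ToyTable *tbl, int key, bool spill_mode)
{
    int    *entry = toy_lookup_entry(tbl, key, spill_mode);

    if (entry == NULL)
        ;                       /* would spill the tuple here */
    else
        (*entry)++;
}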

Not performance related: I am a bit confused as to why the new per-hash
stuff in lookup_hash_entries() isn't in lookup_hash_entry(). I assume
that's because of agg_refill_hash_table()?

Why isn't the flow more like this:
1) prepare_hash_slot()
2) if (aggstate->hash_spill_mode) goto 3; else goto 4
3) entry = LookupTupleHashEntry(&hash); if (!entry) hashagg_spill_tuple();
4) InsertTupleHashEntry(&hash, &isnew); if (isnew) initialize(entry)

That way there's again exactly one call into execGrouping.c per tuple
and hash table, there's no need for nodeAgg.c to separately compute the
hash, and there are far fewer branches...
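
Very roughly, in C-like pseudocode (argument lists elided, and the
exact signatures are made up here, following the numbered steps above
rather than any real patch):

prepare_hash_slot(...);

if (aggstate->hash_spill_mode)
{
    /* lookup only; execGrouping.c computes the hash and hands it back */
    entry = LookupTupleHashEntry(hashtable, slot, &hash);
    if (entry == NULL)
        hashagg_spill_tuple(...);   /* can reuse the returned hash */
}
else
{
    /* lookup-or-insert, still a single call into execGrouping.c */
    entry = InsertTupleHashEntry(hashtable, slot, &hash, &isnew);
    if (isnew)
        initialize(entry);          /* set up the per-group states */
}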

Doing things this way might perhaps make agg_refill_hash_table() a tiny
bit more complicated, but it'll also avoid the slowdown for the vast
majority of cases where we're not spilling.

Greetings,

Andres Freund
