Re: Bug in huge simplehash

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Bug in huge simplehash
Date: 2021-08-13 04:50:50
Message-ID: CAApHDvp42UGXN+BhZx3gONfFq=OpOTByUX=9pwVrYG8hXEZAow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 11 Aug 2021 at 00:10, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> Attached v2.

Eyeballing this it looks fine, but I was a little nervous backpatching
without testing it properly, so I ended up provisioning a machine with
256GB and doing a round of testing.

I just created the most simple table I could:

create table a (a bigserial, b int);
and inserted 2^31 rows.

insert into a (b) values(1);
insert into a (b) select b from a; -- repeated until I got 2^31 rows.

set work_mem = '256GB';
set max_parallel_workers_per_gather = 0;

I could recreate the issue described with the following query:

explain (analyze , timing off) select a from a group by a;

After watching perf top for a while it switched to:

98.90% postgres [.] tuplehash_grow
0.36% [kernel] [k] change_p4d_range
0.24% postgres [.] LookupTupleHashEntry
0.09% postgres [.] tts_minimal_store_tuple
0.07% [kernel] [k] vm_normal_page
0.02% [kernel] [k] __softirqentry_text_start
0.02% postgres [.] heap_fill_tuple
0.02% postgres [.] AllocSetAlloc

After patching I got:

explain (analyze , timing off) select a from a group by a;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
HashAggregate (cost=35149810.71..53983243.28 rows=1883343257
width=8) (actual rows=2147483648 loops=1)
Group Key: a
Batches: 1 Memory Usage: 201334801kB
-> Seq Scan on a (cost=0.00..30441452.57 rows=1883343257 width=8)
(actual rows=2147483648 loops=1)
Planning Time: 0.105 ms
Execution Time: 2173480.905 ms
(6 rows)
Time: 2173482.166 ms (36:13.482)

And, since I only had 256GB of memory on this machine and couldn't
really do 2^32 groups, I dropped SH_FILLFACTOR to 0.4 and
SH_MAX_FILLFACTOR to 0.48 and tried again to ensure I got the hash
table full message:

postgres=# explain (analyze on , timing off) select a from a group by a;
ERROR: hash table size exceeded
Time: 1148554.672 ms (19:08.555)

After doing that, I felt a bit better about batch-patching it, so I did.

Thanks for the patch.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2021-08-13 05:14:56 Re: Default to TIMESTAMP WITH TIME ZONE?
Previous Message Amit Kapila 2021-08-13 04:06:35 Re: Skipping logical replication transactions on subscriber side