DSA overflow in hash join

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: DSA overflow in hash join
Date: 2025-07-20 11:48:11
Message-ID: 4abe0c31-097e-460e-a159-c8042b63160b@garret.ru
Lists: pgsql-hackers

Hi hackers!

There is a weird error, rarely reproduced with sqlancer: `ERROR: invalid
DSA memory alloc request size 1140850688`:

Finalize Aggregate  (cost=114075075706156875776.00..114075075706156875776.00 rows=1 width=8)
  ->  Gather  (cost=114075075706156875776.00..114075075706156875776.00 rows=4 width=8)
        Workers Planned: 4
        ->  Partial Aggregate  (cost=114075075706156875776.00..114075075706156875776.00 rows=1 width=8)
              ->  Parallel Hash Join  (cost=6510542055.80..97778619693677723648.00 rows=6518582404991658491904 width=0)
                    Hash Cond: ((t2_1.c1 * t2_1.c1) = t4_1.c1)
                    ->  Nested Loop  (cost=0.00..57152791114.20 rows=2539498720000 width=32)
                          ->  Parallel Seq Scan on t2 t2_1  (cost=0.00..12.20 rows=220 width=32)
                          ->  Nested Loop  (cost=0.00..144353654.10 rows=11543176000 width=0)
                                ->  Nested Loop  (cost=0.00..63915.85 rows=5107600 width=0)
                                      ->  Seq Scan on t5  (cost=0.00..32.60 rows=2260 width=0)
                                      ->  Materialize  (cost=0.00..43.90 rows=2260 width=0)
                                            ->  Seq Scan on t0  (cost=0.00..32.60 rows=2260 width=0)
                                ->  Materialize  (cost=0.00..43.90 rows=2260 width=0)
                                      ->  Seq Scan on t0 t0_1  (cost=0.00..32.60 rows=2260 width=0)
                    ->  Parallel Hash  (cost=4028895759.80..4028895759.80 rows=128343727600 width=32)
                          ->  Nested Loop  (cost=0.00..4028895759.80 rows=128343727600 width=32)
                                ->  Nested Loop  (cost=0.00..3569757.80 rows=145845145 width=0)
                                      ->  Nested Loop Left Join  (cost=0.00..7536.20 rows=64533 width=0)
                                            Join Filter: (upper((t2.c1 + t2.c1)))::boolean
                                            ->  Parallel Seq Scan on t2  (cost=0.00..12.20 rows=220 width=32)
                                            ->  Seq Scan on t4  (cost=0.00..18.80 rows=880 width=0)
                                      ->  Seq Scan on t5 t5_1  (cost=0.00..32.60 rows=2260 width=0)
                                ->  Seq Scan on t4 t4_1  (cost=0.00..18.80 rows=880 width=32)

The problem is reproduced only with `max_parallel_workers_per_gather=4`.
The error is produced by `dsa_allocate`, most likely in nodeHash.c
(fortunately there are not that many calls to dsa_allocate).
Certainly there are checks that should prevent allocation of more than
MaxAllocSize:

    max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinTuple));

But it looks like they are not everywhere.
In particular, I suspect these places:

nodeHash.c:1668:

                /* Double the size of the bucket array. */
                pstate->nbuckets *= 2;
                size = pstate->nbuckets * sizeof(dsa_pointer_atomic);
                hashtable->batches[0].shared->size += size / 2;
                dsa_free(hashtable->area, hashtable->batches[0].shared->buckets);
                hashtable->batches[0].shared->buckets =
                    dsa_allocate(hashtable->area, size);

nodeHash.c:3290:

    batch->buckets =
        dsa_allocate(hashtable->area, sizeof(dsa_pointer_atomic) * nbuckets);

In the first case the number of buckets is doubled; in the second case
it may be doubled in ExecChooseHashTableSize:

        /*
         * It's better to use half the batches, so do that and adjust the
         * nbucket in the opposite direction, and double the allowance.
         */
        nbatch /= 2;
        nbuckets *= 2;
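
To illustrate the suspected arithmetic, here is a minimal standalone
sketch. MaxAllocSize is copied from memutils.h; sizeof(dsa_pointer_atomic)
is assumed to be 8 bytes, as on a typical 64-bit build, and the bucket
counts are illustrative rather than the ones from the failing query:

    #include <stdio.h>
    #include <stddef.h>

    #define MaxAllocSize ((size_t) 0x3fffffff)  /* 1 GB - 1, from memutils.h */

    int main(void)
    {
        size_t elem = 8;                     /* assumed sizeof(dsa_pointer_atomic) */
        size_t nbuckets = (size_t) 1 << 26;  /* 512 MB of buckets: still allowed */
        size_t size = nbuckets * elem;

        printf("before doubling: %zu bytes, fits=%d\n",
               size, size <= MaxAllocSize);

        /* the doubling from nodeHash.c:1668, with no re-check */
        nbuckets *= 2;
        size = nbuckets * elem;              /* exactly 1 GB: dsa_allocate() fails */
        printf("after doubling:  %zu bytes, fits=%d\n",
               size, size <= MaxAllocSize);
        return 0;
    }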

Maybe I am missing something, but do we have any protection against
overflow here?
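
If not, maybe a re-check is needed before each doubling, along the lines
of the following sketch (the helper is hypothetical and the name is mine,
not anything from the sources):

    #include <stdbool.h>
    #include <stddef.h>

    #define MaxAllocSize ((size_t) 0x3fffffff)  /* from memutils.h */

    /*
     * Hypothetical helper: true if a bucket array of nbuckets elements of
     * elem_size bytes each is still a legal allocation after one doubling.
     * Dividing instead of multiplying keeps the check itself overflow-free.
     */
    static bool
    bucket_doubling_fits(size_t nbuckets, size_t elem_size)
    {
        return nbuckets <= MaxAllocSize / elem_size / 2;
    }

    int main(void)
    {
        /* 2^25 8-byte buckets can still be doubled, 2^26 cannot */
        return (bucket_doubling_fits((size_t) 1 << 25, 8) &&
                !bucket_doubling_fits((size_t) 1 << 26, 8)) ? 0 : 1;
    }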

One more note: as you can see, the estimated number of rows in this case
is 6518582404991658491904, which doesn't even fit in 64 bits.
It is not difficult to create a query with an even larger estimated
number of tuples, for example:

create table t1(pk1 integer, sk1 integer);
create table t2(pk2 integer, sk2 integer);
create table t3(pk3 integer, sk3 integer);
create table t4(pk4 integer, sk4 integer);
create table t5(pk5 integer, sk5 integer);
create table t6(pk6 integer, sk6 integer);
insert into t1 values (generate_series(1,1000000), 0);
insert into t2 values (generate_series(1,1000000), 0);
insert into t3 values (generate_series(1,1000000), 0);
insert into t4 values (generate_series(1,1000000), 0);
insert into t5 values (generate_series(1,1000000), 0);
insert into t6 values (generate_series(1,1000000), 0);
analyze t1;
analyze t2;
analyze t3;
analyze t4;
analyze t5;
analyze t6;
explain select count(*) from t1 join t2 on sk1=sk2 join t3 on sk2=sk3
join t4 on sk3=sk4  join t5 on sk4=sk5 join t6 on sk5=sk6;
                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=1 width=8)
   ->  Gather  (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=2 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=1 width=8)
               ->  Parallel Hash Join  (cost=9195963541715053182976.00..1953125000012793188098301096886272.00 rows=416666666666666672044102856701116416 width=0)
                     Hash Cond: (t1.sk1 = t3.sk3)
                     ->  Merge Join  (cost=180939.82..6250190523.16 rows=416666666667 width=8)
                           Merge Cond: (t1.sk1 = t2.sk2)
                           ->  Sort  (cost=53182.48..54224.15 rows=416667 width=4)
                                 Sort Key: t1.sk1
                                 ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667 width=4)
                           ->  Materialize  (cost=127757.34..132757.34 rows=1000000 width=4)
                                 ->  Sort  (cost=127757.34..130257.34 rows=1000000 width=4)
                                       Sort Key: t2.sk2
                                       ->  Seq Scan on t2  (cost=0.00..14425.00 rows=1000000 width=4)
                     ->  Parallel Hash  (cost=1953125000038385975296.00..1953125000038385975296.00 rows=416666666666666659676160 width=16)
                           ->  Parallel Hash Join  (cost=23086308963.32..1953125000038385975296.00 rows=416666666666666659676160 width=16)
                                 Hash Cond: (t3.sk3 = t5.sk5)
                                 ->  Merge Join  (cost=180939.82..6250190523.16 rows=416666666667 width=8)
                                       Merge Cond: (t3.sk3 = t4.sk4)
                                       ->  Sort  (cost=53182.48..54224.15 rows=416667 width=4)
                                             Sort Key: t3.sk3
                                             ->  Parallel Seq Scan on t3  (cost=0.00..8591.67 rows=416667 width=4)
                                       ->  Materialize  (cost=127757.34..132757.34 rows=1000000 width=4)
                                             ->  Sort  (cost=127757.34..130257.34 rows=1000000 width=4)
                                                   Sort Key: t4.sk4
                                                   ->  Seq Scan on t4  (cost=0.00..14425.00 rows=1000000 width=4)
                                 ->  Parallel Hash  (cost=6250190523.16..6250190523.16 rows=416666666667 width=8)
                                       ->  Merge Join  (cost=180939.82..6250190523.16 rows=416666666667 width=8)
                                             Merge Cond: (t5.sk5 = t6.sk6)
                                             ->  Sort  (cost=53182.48..54224.15 rows=416667 width=4)
                                                   Sort Key: t5.sk5
                                                   ->  Parallel Seq Scan on t5  (cost=0.00..8591.67 rows=416667 width=4)
                                             ->  Materialize  (cost=127757.34..132757.34 rows=1000000 width=4)
                                                   ->  Sort  (cost=127757.34..130257.34 rows=1000000 width=4)
                                                         Sort Key: t6.sk6
                                                         ->  Seq Scan on t6  (cost=0.00..14425.00 rows=1000000 width=4)

Please notice that this query doesn't cause the alloc error.
But I wonder if we should limit the maximal number of estimated rows,
because it seems clear that no system can process 10^25 rows in any
reasonable amount of time. So if we get such an estimate, either there
is some error in the estimation and fortunately the query finishes much
faster, or it does not finish at all. But from a resource allocation
point of view there is no sense in trying to allocate resources for
10^25 rows.
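
For illustration, here is a standalone sketch of such a cap
(clamp_row_est() in costsize.c, which already normalizes estimates at
the low end, would be a natural home for it; the ceiling constant and
the function below are purely mine):

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical ceiling for row estimates; the value is illustrative. */
    #define MAX_REASONABLE_ROWS 1e15

    /*
     * Sketch of the clamp suggested above: bound row estimates so that
     * downstream sizing arithmetic (buckets, batches, memory) stays sane.
     */
    static double
    clamp_rows(double nrows)
    {
        if (isnan(nrows) || nrows > MAX_REASONABLE_ROWS)
            return MAX_REASONABLE_ROWS;
        if (nrows < 1.0)
            return 1.0;
        return rint(nrows);
    }

    int main(void)
    {
        /* the estimate from the first plan above gets clamped */
        printf("%.0f\n", clamp_rows(6518582404991658491904.0));
        return 0;
    }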
