BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: alexey(dot)ermakov(at)dataegret(dot)com
Subject: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Date: 2024-02-16 13:18:42
Message-ID: 18349-83d33dd3d0c855c3@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18349
Logged by: Alexey Ermakov
Email address: alexey(dot)ermakov(at)dataegret(dot)com
PostgreSQL version: 16.2
Operating system: Rocky Linux release 9.3 (Blue Onyx)
Description:

Hello! I found a problem with memory consumption of parallel worker when
postgres 16.2 runs complex analytical query.
After simplifying I was able to reproduce it with following query with just
2 joins:

2024-02-16 12:14:25.669 UTC [425424] ERROR: XX000: invalid DSA memory alloc
request size 1811939328
2024-02-16 12:14:25.669 UTC [425424] LOCATION: dsa_allocate_extended,
dsa.c:690
2024-02-16 12:14:25.669 UTC [425424] STATEMENT: SELECT *
FROM history h
LEFT JOIN coupon cpn ON cpn.recloc = h.recloc AND cpn.version =
h.version
LEFT JOIN free_text i2 ON i2.recloc = cpn.recloc AND i2.part =
1
WHERE h.date_operation >= '2024-02-13' and h.date_operation <
'2024-02-14';
2024-02-16 12:14:25.669 UTC [425424] DEBUG: 00000: shmem_exit(1): 5
before_shmem_exit callbacks to make
2024-02-16 12:14:25.669 UTC [425424] LOCATION: shmem_exit, ipc.c:240

Here is some memory/parallel workers- related settings:
max_parallel_workers = 12
max_parallel_workers_per_gather = 2
hash_mem_multiplier = 2
work_mem = '4MB'

When I increase work_mem to 16MB then query runs successfully (same plan):

Gather (cost=1082271.57..8357367.15 rows=26292632 width=345) (actual
time=50735.451..64000.782 rows=1325161 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4102227 read=7004436, temp read=1808967
written=1809044
I/O Timings: shared read=37739.143, temp read=4809.969 write=9139.669
-> Parallel Hash Right Join (cost=1081271.57..5727103.95 rows=10955263
width=345) (actual time=50716.477..61271.276 rows=441720 loops=3)
Hash Cond: ((i2.recloc)::text = (cpn.recloc)::text)
Buffers: shared hit=4102227 read=7004436, temp read=1808967
written=1809044
I/O Timings: shared read=37739.143, temp read=4809.969
write=9139.669
-> Parallel Seq Scan on free_text i2 (cost=0.00..4160436.50
rows=74411061 width=63) (actual time=0.196..37944.130 rows=60240671
loops=3)
Filter: (part = 1)
Rows Removed by Filter: 160726692
Buffers: shared hit=158869 read=7004436
I/O Timings: shared read=37739.143
-> Parallel Hash (cost=1077675.12..1077675.12 rows=220524
width=282) (actual time=1406.500..1406.503 rows=209531 loops=3)
Buckets: 131072 Batches: 8 Memory Usage: 30496kB
Buffers: shared hit=3943254, temp written=10316
I/O Timings: temp write=58.043
-> Nested Loop Left Join (cost=1.14..1077675.12 rows=220524
width=282) (actual time=0.048..1239.847 rows=209531 loops=3)
Buffers: shared hit=3943254
-> Parallel Index Scan using history_index_date on
history h (cost=0.57..42000.62 rows=156296 width=90) (actual
time=0.021..76.927 rows=198024 loops=3)
Index Cond: ((date_operation >= '2024-02-13
00:00:00'::timestamp without time zone) AND (date_operation < '2024-02-14
00:00:00'::timestamp without time zone))
Buffers: shared hit=260503
-> Index Scan using coupon_index_rl on coupon cpn
(cost=0.57..6.62 rows=1 width=192) (actual time=0.004..0.005 rows=1
loops=594071)
Index Cond: ((recloc)::text = (h.recloc)::text)
Filter: (version = h.version)
Rows Removed by Filter: 2
Buffers: shared hit=3682751
Planning:
Buffers: shared hit=72
Planning Time: 0.506 ms
Execution Time: 64060.476 ms
(32 rows)

Interesting thing: after query fails it take some time (seconds) before psql
gives control back.
It turns out that postgresql written 1M+ (!) temporary files (most of them
are empty) and cleaning up takes some time.
With bigger work_mem it won't write so many files. Maybe memory issue
somehow related with number of files.

In PostgreSQL 16.2 there were fix for similar bug: [1],[2]. Could it be that
there is some issue with it ?

I wasn't able to make reproducible test case so far but may be given
information is enough...

[1]
https://github.com/postgres/postgres/commit/2a67b5a60ee68892bb028587ddc6de7650822480
[2]
https://www.postgresql.org/message-id/flat/16925-ec96d83529d0d629%40postgresql.org

Thanks,
Alexey Ermakov

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrei Lepikhov 2024-02-16 14:40:03 Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Previous Message David Rowley 2024-02-16 13:02:20 Re: BUG #18348: Inconsistency with EXTRACT([field] from INTERVAL);