| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | weijie1006jl(at)gmail(dot)com |
| Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19345: MemoryContextSizeFailure after upgrade 14.11 to 17.7 in stored procedure |
| Date: | 2025-12-06 21:04:48 |
| Message-ID: | 2390310.1765055088@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> After upgrading from PostgreSQL 14.11 to PostgreSQL 17.7 on our production
> environment (RockyLinux8, RPM package), we encountered a crash when
> executing a specific stored procedure.
> This procedure ran successfully in 14.11 (approximately 8 hours execution
> time), but on 17.7 it runs for about 6 hours before failing with:
> LOCATION: MemoryContextSizeFailure, mcxt.c:1169
> [3717604]BACKTRACE:
> postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
> [0x50d5e0]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
> [0x9c96dc]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(MemoryContextAllocZero+0x14) [0x9cfb54]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(ResourceOwnerEnlarge+0x9f) [0x9d218f]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(OpenTemporaryFile+0x64) [0x83dfc4]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(BufFileCreateTemp+0x18) [0x83a288]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(ExecHashJoinSaveTuple+0x68) [0x6e1ce8]
Hmm. Apparently, your hash join tried to use so many temporary files
that it needed a more-than-1GB array just to track them all. One could
guess that it had been in swap hell for some hours before reaching
this point, because that'd imply about a terabyte worth of I/O
buffers, never mind the actual hashtable data.
Too-many-batches (resulting in too-many-temp-files) is a known failure
mechanism for our hash join code. We've tried to improve that in v18
(cf commits a1b4f289b, aa151022e), but I don't think there's any
appetite for back-patching that work into older branches.
> Observations / Workarounds Tested:
> SET enable_hashjoin = off; → procedure runs successfully.
> SET hash_mem_multiplier = 1; → procedure runs successfully.
I'm curious whether the change of hash_mem_multiplier causes it
to not use a hash join, or that is just successful at limiting
the hash table growth to a point short of failure.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dean Rasheed | 2025-12-06 22:53:38 | Re: BUG #19340: Wrong result from CORR() function |
| Previous Message | Tom Lane | 2025-12-06 18:18:43 | Re: BUG #19340: Wrong result from CORR() function |