Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Dmitriy Kuzmin <kuzmin(dot)db4(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
Date: 2022-10-04 22:54:08
Message-ID: 2138765.1664924048@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

[ redirecting to -hackers because patch attached ]

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> So that confirms there were 950k relations in the xl_standby_locks.
> The contents of that message seem to be produced by standby_desc().
> That should be the same WAL record that's processed by standby_redo()
> which adds the 950k locks to the RecoveryLockListsEntry.

> I'm not seeing why 950k becomes 134m.

I figured out what the problem is. The standby's startup process
retains knowledge of all these locks in standby.c's RecoveryLockLists
data structure, which *has no de-duplication capability*. It'll add
another entry to the per-XID list any time it's told about a given
exclusive lock. And checkpoints cause us to regurgitate the entire
set of currently-held exclusive locks into the WAL. So if you have
a process holding a lot of exclusive locks, and sitting on them
across multiple checkpoints, standby startup processes will bloat.
It's not a true leak, in that we know where the memory is and
we'll release it whenever we see that XID commit/abort. And I doubt
that this is a common usage pattern, which probably explains the
lack of previous complaints. Still, bloat bad.
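
To make the growth pattern concrete, here is a minimal standalone sketch of a per-XID list that appends without checking for duplicates. It is not the code in standby.c; the names XidLockList, remember_lock, replay_lock_record and release_locks are hypothetical, and each replayed record is assumed to re-list locks on relations 1..nlocks for a single transaction.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one remembered lock; not PostgreSQL's struct. */
typedef struct LockEntry
{
    unsigned int rel_oid;
    struct LockEntry *next;
} LockEntry;

/* Hypothetical per-XID list of remembered locks. */
typedef struct
{
    LockEntry *head;
    size_t count;
} XidLockList;

/* Append unconditionally: nothing checks whether rel_oid is already listed. */
static void
remember_lock(XidLockList *list, unsigned int rel_oid)
{
    LockEntry *e = malloc(sizeof(LockEntry));

    assert(e != NULL);
    e->rel_oid = rel_oid;
    e->next = list->head;
    list->head = e;
    list->count++;
}

/* Replay one record that lists locks on relations 1..nlocks (an assumption). */
static void
replay_lock_record(XidLockList *list, unsigned int nlocks)
{
    for (unsigned int oid = 1; oid <= nlocks; oid++)
        remember_lock(list, oid);
}

/* Everything is freed once the owning transaction commits or aborts. */
static void
release_locks(XidLockList *list)
{
    while (list->head != NULL)
    {
        LockEntry *next = list->head->next;

        free(list->head);
        list->head = next;
    }
    list->count = 0;
}
```

In this sketch, replaying the original record plus three later records that repeat the same 1000 locks leaves 4000 entries in the list, all of which go away only when release_locks runs.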

PFA a quick-hack fix that solves this issue by making per-transaction
subsidiary hash tables. That's overkill perhaps; I'm a little worried
about whether this slows down normal cases more than it's worth.
But we ought to do something about this, because aside from the
duplication aspect, the current storage of these lists seems mighty
space-inefficient.
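
For illustration only, the de-duplication idea can be sketched with a small fixed-size hash set per transaction. This is not the attached patch, and it does not use PostgreSQL's hash table facilities; XidLockSet, lockset_add, replay_lock_record and LOCKSET_SLOTS are hypothetical names, the capacity is fixed for brevity, and relation OIDs are assumed to be nonzero so that 0 can mark an empty slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define LOCKSET_SLOTS 4096      /* fixed power-of-two capacity (assumption) */

/* Hypothetical per-XID set of locked relations; 0 marks an empty slot. */
typedef struct
{
    unsigned int slots[LOCKSET_SLOTS];
    size_t count;
} XidLockSet;

/* Add rel_oid if absent. Returns true only when a new entry was stored. */
static bool
lockset_add(XidLockSet *set, unsigned int rel_oid)
{
    size_t i;

    assert(rel_oid != 0);
    /* Multiplicative hash, masked down to the table size. */
    i = (size_t) (rel_oid * 2654435761u) & (LOCKSET_SLOTS - 1);

    /* Linear probing: stop at the first empty slot or at a match. */
    while (set->slots[i] != 0)
    {
        if (set->slots[i] == rel_oid)
            return false;       /* already remembered, nothing to add */
        i = (i + 1) & (LOCKSET_SLOTS - 1);
    }

    /* Always keep one slot empty so that probing terminates. */
    assert(set->count < LOCKSET_SLOTS - 1);
    set->slots[i] = rel_oid;
    set->count++;
    return true;
}

/* Replay one record listing relations 1..nlocks; returns entries added. */
static size_t
replay_lock_record(XidLockSet *set, unsigned int nlocks)
{
    size_t added = 0;

    for (unsigned int oid = 1; oid <= nlocks; oid++)
        if (lockset_add(set, oid))
            added++;
    return added;
}
```

With a set like this, replaying a record that repeats already-known locks adds nothing, so memory use tracks the number of distinct locks rather than the number of times they were logged. The cost is a hash probe per replayed lock, which is the kind of overhead in normal cases that the message above worries about.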

regards, tom lane

Attachment: fix-RecoveryLockLists-data-structure-1.patch (text/x-diff, 9.9 KB)
