Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	Dmitriy Kuzmin <kuzmin(dot)db4(at)gmail(dot)com>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
Date:	2022-09-05 12:13:13
Message-ID:	CAApHDvrDg2rJ-sqa7c=wPoHeEGrox46sQ=CFj=FkXqBx26dr0A@mail.gmail.com
Lists:	pgsql-bugs pgsql-hackers

On Mon, 5 Sept 2022 at 22:38, Dmitriy Kuzmin <kuzmin(dot)db4(at)gmail(dot)com> wrote:
> One of our clients experienced a crash of startup process with an error "invalid memory alloc request size 1073741824" on a hot standby, which ended in replica reinit.
>
> According to logs, startup process crashed while trying to replay "Standby/LOCK" record with a huge list of locks(see attached replicalog_tail.tgz):
>
> FATAL: XX000: invalid memory alloc request size 1073741824
> CONTEXT: WAL redo at 7/327F9248 for Standby/LOCK: xid 1638575 db 7550635 rel 8500880 xid 1638575 db 7550635 rel 10324499...
> LOCATION: repalloc, mcxt.c:1075
> BACKTRACE:
> postgres: startup recovering 000000010000000700000033(repalloc+0x61) [0x8d7611]
> postgres: startup recovering 000000010000000700000033() [0x691c29]
> postgres: startup recovering 000000010000000700000033() [0x691c74]
> postgres: startup recovering 000000010000000700000033(lappend+0x16) [0x691e76]

This must be the repalloc() in enlarge_list(). 1073741824 / 8 is
134,217,728 (2^27), i.e. that many list cells if each cell is 8 bytes.
That's far more than one lock for each of your 950k tables.
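
As a quick check of that arithmetic, here is a minimal Python sketch. It assumes 8-byte list cells (a 64-bit build), and the constant names are made up for illustration. PostgreSQL's MaxAllocSize is 0x3fffffff, so the failing request is exactly one byte over the limit.

```python
# Sanity check of the numbers in the error message.
# Assumption: each list cell is 8 bytes (64-bit build).
REQUEST_BYTES = 1_073_741_824   # size from the FATAL message
CELL_BYTES = 8                  # assumed size of one list cell
MAX_ALLOC_SIZE = 0x3FFFFFFF     # PostgreSQL's MaxAllocSize (1 GB - 1)
TABLES = 950_000                # relation count reported in the thread

cells = REQUEST_BYTES // CELL_BYTES
ratio = cells / TABLES

print(cells)                              # 134217728
print(cells == 2 ** 27)                   # True
print(REQUEST_BYTES - MAX_ALLOC_SIZE)     # 1
print(f"{ratio:.1f} cells per table")     # 141.3 cells per table
```

So the requested array would hold roughly 141 cells per table, which is why one lock per table does not explain the allocation.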

I wonder why the RecoveryLockListsEntry.locks list is getting so long.

From the file you attached, I see:
$ cat replicalog_tail | grep -Eio "rel\s([0-9]+)" | wc -l
950000

So that confirms there were 950k relations in the xl_standby_locks
record. The contents of that message seem to be produced by
standby_desc(). That should be the same WAL record that's processed by
standby_redo(), which adds the 950k locks to the RecoveryLockListsEntry.
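
To go one step beyond the grep count, a rough sketch like the following could also show whether the record lists any lock more than once. It assumes the description text follows the "xid N db N rel N" pattern seen in the CONTEXT line above; the function name is hypothetical and the sample string is just the visible start of that line.

```python
import re

# Assumed format of each lock in the description: "xid N db N rel N"
LOCK_RE = re.compile(r"xid (\d+) db (\d+) rel (\d+)")

def summarize_locks(desc):
    """Return (total locks, distinct locks) found in a description string."""
    locks = [tuple(int(x) for x in m.groups()) for m in LOCK_RE.finditer(desc)]
    return len(locks), len(set(locks))

# Visible start of the CONTEXT line quoted above (the rest is elided there).
sample = "xid 1638575 db 7550635 rel 8500880 xid 1638575 db 7550635 rel 10324499"
total, distinct = summarize_locks(sample)
print(total, distinct)  # 2 2
```

Running it over the whole attached log would give the same 950k total as the grep if the format assumption holds, plus a distinct count to compare against.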

I'm not seeing why 950k becomes 134 million.

David

In response to

Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records at 2022-09-05 10:19:58 from Dmitriy Kuzmin

Responses

Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records at 2022-09-06 06:38:43 from Dmitriy Kuzmin
Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records at 2022-10-04 22:54:08 from Tom Lane
