Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records

From: Dmitriy Kuzmin <kuzmin(dot)db4(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
Date: 2022-09-06 06:38:43
Message-ID: CAHLDt=9uO-mauy6VGX3jbwNCpD3xKGC225QtH25Pcj4hn4BnKA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Thanks, David!

Let me know if there's any additional information I could provide.

Best regards,
Dmitry Kuzmin

On Mon, 5 Sept 2022 at 22:13, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Mon, 5 Sept 2022 at 22:38, Dmitriy Kuzmin <kuzmin(dot)db4(at)gmail(dot)com> wrote:
> > One of our clients experienced a crash of startup process with an error
> > "invalid memory alloc request size 1073741824" on a hot standby, which
> > ended in replica reinit.
> >
> > According to logs, startup process crashed while trying to replay
> > "Standby/LOCK" record with a huge list of locks (see attached
> > replicalog_tail.tgz):
> >
> > FATAL: XX000: invalid memory alloc request size 1073741824
> > CONTEXT: WAL redo at 7/327F9248 for Standby/LOCK: xid 1638575 db 7550635 rel 8500880 xid 1638575 db 7550635 rel 10324499...
> > LOCATION: repalloc, mcxt.c:1075
> > BACKTRACE:
> > postgres: startup recovering 000000010000000700000033(repalloc+0x61) [0x8d7611]
> > postgres: startup recovering 000000010000000700000033() [0x691c29]
> > postgres: startup recovering 000000010000000700000033() [0x691c74]
> > postgres: startup recovering 000000010000000700000033(lappend+0x16) [0x691e76]
>
> This must be the repalloc() in enlarge_list(). 1073741824 / 8 is
> 134,217,728 (2^27). That's quite a bit more than 1 lock per your 950k
> tables.
>
> I wonder why the RecoveryLockListsEntry.locks list is getting so long.
>
> From the file you attached, I see:
> $ cat replicalog_tail | grep -Eio "rel\s([0-9]+)" | wc -l
> 950000
>
> So that confirms there were 950k relations in the xl_standby_locks.
> The contents of that message seem to be produced by standby_desc().
> That should be the same WAL record that's processed by standby_redo()
> which adds the 950k locks to the RecoveryLockListsEntry.
>
> I'm not seeing why 950k becomes 134m.
>
> David
>
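
The size arithmetic in the quoted reply can be checked with a short sketch. It assumes an 8-byte list cell (the divisor used above) and that the list capacity grows to the next power of two, which is an assumption about enlarge_list() and not something shown in the log. The names below are illustrative only. MAX_ALLOC_BYTES mirrors PostgreSQL's MaxAllocSize (1 GB - 1), which is why a request of exactly 1073741824 bytes is rejected.

```python
# Sketch of the size arithmetic; constants marked "assumed" are not from the crash log.
REQUEST_BYTES = 1073741824      # from the FATAL message
CELL_BYTES = 8                  # assumed size of one list cell on a 64-bit build
MAX_ALLOC_BYTES = 0x3FFFFFFF    # PostgreSQL's MaxAllocSize, 1 GB - 1
RELATIONS_IN_RECORD = 950_000   # count reported by the grep over the log


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


# Number of cells the failed repalloc() was asked to hold.
requested_cells = REQUEST_BYTES // CELL_BYTES

# The request is one byte over the allocation limit.
over_limit = REQUEST_BYTES > MAX_ALLOC_BYTES

# Assumed power-of-two growth: a capacity of 2^27 cells is first requested
# when an append pushes the list past 2^26 entries.
min_entries_before_append = requested_cells // 2
ratio = min_entries_before_append / RELATIONS_IN_RECORD

print(f"requested cells: {requested_cells}")
print(f"over MaxAllocSize: {over_limit}")
print(f"entries already in list (at least): {min_entries_before_append}")
print(f"roughly {ratio:.0f}x the relations in one record")
```

Under these assumptions the list only needs to exceed 2^26 entries, not 2^27, to trigger the failing request, but that is still far more than one lock per relation in a single record.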
