Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, nathandbossart(at)gmail(dot)com, dgrowleyml(at)gmail(dot)com, kuzmin(dot)db4(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
Date: 2022-10-10 12:24:34
Message-ID: CANbhV-HYn8GgDheVa94EG9g8EtAmjjoYzWC6_r6Hd0mv=EphKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Wed, 5 Oct 2022 at 16:30, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> > At Tue, 4 Oct 2022 17:15:31 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> >> On Tue, Oct 04, 2022 at 07:53:11PM -0400, Tom Lane wrote:
> >>> After further thought, maybe it'd be better to do it as attached,
> >>> with one long-lived hash table for all the locks.
>
> > First one is straight forward outcome from the current implement but I
> > like the new one. I agree that it is natural and that the expected
> > overhead per (typical) transaction is lower than both the first one
> > and doing the same operation on a list. I don't think that space
> > inefficiency in that extent doesn't matter since it is the startup
> > process.
>
> To get some hard numbers about this, I made a quick hack to collect
> getrusage() numbers for the startup process (patch attached for
> documentation purposes). I then ran the recovery/t/027_stream_regress.pl
> test a few times and collected the stats (also attached). This seems
> like a reasonably decent baseline test, since the core regression tests
> certainly take lots of AccessExclusiveLocks what with all the DDL
> involved, though they shouldn't ever take large numbers at once. Also
> they don't run long enough for any lock list bloat to occur, so these
> numbers don't reflect a case where the patches would provide benefit.
>
> If you look hard, there's maybe about a 1% user-CPU penalty for patch 2,
> although that's well below the run-to-run variation so it's hard to be
> sure that it's real. The same comments apply to the max resident size
> stats. So I'm comforted that there's not a significant penalty here.
>
> I'll go ahead with patch 2 if there's not objection.

Happy to see this change.

> One other point to discuss: should we consider back-patching? I've
> got mixed feelings about that myself. I don't think that cases where
> this helps significantly are at all mainstream, so I'm kind of leaning
> to "patch HEAD only".

It looks fine to eventually backpatch, since StandbyReleaseLockTree()
was optimized to only be called when the transaction had actually done
some AccessExclusiveLocks.

So the performance loss is minor and isolated to the users of such
locks, so I see no problems with it.

--
Simon Riggs http://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2022-10-10 17:14:34 BUG #17631: Prepare + Merge fails to identify parameter types
Previous Message David Rowley 2022-10-10 04:21:02 Re: Default framing option RANGE adds cost for no gain to some window functions

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-10-10 12:28:09 Re: create subscription - improved warning message
Previous Message Robert Haas 2022-10-10 12:10:22 Re: problems with making relfilenodes 56-bits