Re: Crash during backend start when low on memory

From: Mats Kindahl <mats(at)timescale(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: daniel(at)yesql(dot)se, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Crash during backend start when low on memory
Date: 2022-12-16 08:46:39
Message-ID: CA+14425E_-_nfS=PNXm3Rp7Z5wBmCWVJmOU1Ha_zBZyRytuv5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Dec 15, 2022 at 6:20 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
wrote:

> On 2022-Dec-14, Mats Kindahl wrote:
>
> > PostmasterContext is set quite early in the procedure so it should not
> be a
> > problem to use it. I think we can use PostmasterContext rather than
> > TopMemoryContext. The memory for the allocations are passed down to
> > PostmasterMain() and used inside InitPostgres() but as far as I can tell,
> > they are not used after the InitPostgres() call. The PostmasterContex is
> > destroyed right after the call to InitPostgres() inside PostmasterMain()
> > and then the rest is allocated in other contexts.
>
> Well, this is what PostgresMain has to say about it:
>
> /*
> * If the PostmasterContext is still around, recycle the space; we
> don't
> * need it anymore after InitPostgres completes. Note this does
> not trash
> * *MyProcPort, because ConnCreate() allocated that space with
> malloc()
> * ... else we'd need to copy the Port data first. Also,
> subsidiary data
> * such as the username isn't lost either; see
> ProcessStartupPacket().
> */
> if (PostmasterContext)
> {
> MemoryContextDelete(PostmasterContext);
>
>
Ugh. Ok. Missed that comment.

> So I think blowing away part of struct Port independently of the
> containing struct is a bad idea. Why isn't more appropriate to do what
> ConnCreate does?
>
> if (!(port = (Port *) calloc(1, sizeof(Port))))
> {
> ereport(LOG,
> (errcode(ERRCODE_OUT_OF_MEMORY),
> errmsg("out of memory")));
> ExitPostmaster(1);
> }
>
> ... ugh. I have to admit that killing postmaster due to a transient
> out-of-memory makes me a bit nervous -- do note that this
> ExitPostmaster() happens before we've forked the backend, so this code
> does bring the server down (!!). If we want to make postmaster more
> robust in the face of OOM, we should also change this code somehow, yes?
>

Seems reasonable, yes. I'll see what I can throw together.

>
> I agree with Mats' premise when starting the thread -- even if the
> server is not in great shape, it doesn't seem like taking the whole
> service down is a great response. It would be better to just refuse the
> connection with some error and let existing connections chug along,
> poorly though they can. Maybe they'll also die soon, freeing memory;
> but that'll be no good if Postmaster is gone altogether.
>

> Now, it would be better if we could test this more thoroughly. Perhaps
> it's possible to use something like the mallocfail.c lib that Heikki
> wrote a few years ago, teaching it to handle calloc
> https://postgr.es/m/547480DE.4040408@vmware.com

Thanks. Will take a look at that.

>
> --
> Álvaro Herrera Breisgau, Deutschland —
> https://www.EnterpriseDB.com/
> "Oh, great altar of passive entertainment, bestow upon me thy discordant
> images
> at such speed as to render linear thought impossible" (Calvin a la TV)
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2022-12-16 14:49:09 Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Previous Message Peter Eisentraut 2022-12-16 08:07:36 Re: BUG #17720: pg_dump creates a dump with primary key that cannot be restored, when specifying 'using index ...'