Re: Crash during backend start when low on memory

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Mats Kindahl <mats(at)timescale(dot)com>
Cc: daniel(at)yesql(dot)se, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Crash during backend start when low on memory
Date: 2022-12-15 17:20:08
Message-ID: 20221215172008.islvqfyysom3ufxy@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2022-Dec-14, Mats Kindahl wrote:

> PostmasterContext is set quite early in the procedure so it should not be a
> problem to use it. I think we can use PostmasterContext rather than
> TopMemoryContext. The memory for the allocations are passed down to
> PostmasterMain() and used inside InitPostgres() but as far as I can tell,
> they are not used after the InitPostgres() call. The PostmasterContex is
> destroyed right after the call to InitPostgres() inside PostmasterMain()
> and then the rest is allocated in other contexts.

Well, this is what PostgresMain has to say about it:

/*
* If the PostmasterContext is still around, recycle the space; we don't
* need it anymore after InitPostgres completes. Note this does not trash
* *MyProcPort, because ConnCreate() allocated that space with malloc()
* ... else we'd need to copy the Port data first. Also, subsidiary data
* such as the username isn't lost either; see ProcessStartupPacket().
*/
if (PostmasterContext)
{
MemoryContextDelete(PostmasterContext);

So I think blowing away part of struct Port independently of the
containing struct is a bad idea. Why isn't more appropriate to do what
ConnCreate does?

if (!(port = (Port *) calloc(1, sizeof(Port))))
{
ereport(LOG,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of memory")));
ExitPostmaster(1);
}

... ugh. I have to admit that killing postmaster due to a transient
out-of-memory makes me a bit nervous -- do note that this
ExitPostmaster() happens before we've forked the backend, so this code
does bring the server down (!!). If we want to make postmaster more
robust in the face of OOM, we should also change this code somehow, yes?

I agree with Mats' premise when starting the thread -- even if the
server is not in great shape, it doesn't seem like taking the whole
service down is a great response. It would be better to just refuse the
connection with some error and let existing connections chug along,
poorly though they can. Maybe they'll also die soon, freeing memory;
but that'll be no good if Postmaster is gone altogether.

Now, it would be better if we could test this more thoroughly. Perhaps
it's possible to use something like the mallocfail.c lib that Heikki
wrote a few years ago, teaching it to handle calloc
https://postgr.es/m/547480DE.4040408@vmware.com

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2022-12-15 18:56:30 Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Previous Message reiner peterke 2022-12-15 17:02:26 Re: BUG #17720: pg_dump creates a dump with primary key that cannot be restored, when specifying 'using index ...'