Re: BUG #15438: Standby corruption after "Too many open files in system" error

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: juanjo(dot)santamaria(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15438: Standby corruption after "Too many open files in system" error
Date: 2018-10-22 17:57:32
Message-ID: 20181022175732.awdhznheldlwm5ho@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2018-10-22 14:45:04 -0300, Alvaro Herrera wrote:
> On 2018-Oct-18, PG Bug reporting form wrote:
>
> > The primary was working the whole time and recreating the standby replica,
> > after configuring the user limits, seems to solve the issue.

When you say corruption, do you mean that the data was corrupted
afterwards? Or just that the standby spewed errors until you restarted
with adjusted limits?

> Hmm, we recently fixed a bug where file descriptors were being leaked,
> after 10.5 was tagged (commit e00f4b68dc87 [1]) but that was in logical
> decoding, and the symptoms seem completely different anyway. I'm not
> sure it would affect you. The epoll_create stuff in latch.c is pretty
> young though.
>
> [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e00f4b68dc878dcee46833a742844346daa1e3c8
> [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=fa31b6f4e9696f3c9777bf4ec2faea822826ce9f

I'd assume that this hit epoll_create mostly because the file descriptor
used for epoll doesn't go through fd.c. That means we don't first release
an fd when we're at our limit. That's supposed to be catered for by a
reserve of fds, but that reserve isn't that large.

There have previously been discussions about whether we can make fd.c
handle fds that need to be created outside of fd.c's remit (by making
sure there's space, and then tracking them similarly to
OpenTransientFile()).
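
To illustrate the idea, here is a toy sketch in Python, not PostgreSQL
code. It assumes a single budget shared by cached files and "external"
descriptors. The names FdBudget, open_cached, acquire_external and
release_external are hypothetical and exist only for this example. The
point is just that an external fd (such as one from epoll) first frees a
slot by closing a cached file, and is then counted against the budget.

```python
import os
from collections import OrderedDict


class FdBudget:
    """Toy model (hypothetical, not fd.c): one budget for all descriptors."""

    def __init__(self, max_fds):
        self.max_fds = max_fds
        self.cached = OrderedDict()  # path -> fd, least recently used first
        self.external = 0            # fds created elsewhere but still counted

    def _in_use(self):
        return len(self.cached) + self.external

    def _make_room(self):
        # Close least recently used cached files until one slot is free.
        while self._in_use() >= self.max_fds:
            if not self.cached:
                raise OSError("fd budget exhausted by external descriptors")
            _, fd = self.cached.popitem(last=False)
            os.close(fd)

    def open_cached(self, path):
        # Cached files can be closed behind the caller's back and reopened.
        if path in self.cached:
            self.cached.move_to_end(path)
            return self.cached[path]
        self._make_room()
        fd = os.open(path, os.O_RDONLY)
        self.cached[path] = fd
        return fd

    def acquire_external(self, create):
        # Make sure there's space first, then create and track the fd.
        self._make_room()
        obj = create()
        self.external += 1
        return obj

    def release_external(self, obj):
        obj.close()
        self.external -= 1
```

On Linux, the callable passed to acquire_external() could be
select.epoll, so that creating the epoll descriptor evicts a cached
file instead of failing once the budget is used up.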

Greetings,

Andres Freund
