Re: src/test/recovery regression failure on bionic

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Christoph Berg <myon(at)debian(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: src/test/recovery regression failure on bionic
Date: 2020-01-08 22:31:06
Message-ID: 1462.1578522666@sss.pgh.pa.us
Lists: pgsql-committers pgsql-hackers

I wrote:
> This would happen if anything is causing the postmaster to have
> a few more open files than the test added by commit
> d207038053837ae9365df2776371632387f6f655 is allowing for. It's
> a test bug and nothing more.
> Why sidewinder is not showing this in HEAD too is an interesting
> question, but it isn't. However, it could be that on another
> platform (ie bionic) the problem does manifest in HEAD.

I set up a NetBSD 7 installation locally, and while I have not
directly reproduced the failure, I believe I understand all the
components of it now.

(1) d20703805's test will clearly fall over if there are more than six
FDs open in the postmaster when set_max_safe_fds is called, because it
sets max_files_per_process = 26 while set_max_safe_fds requires at
least 20 usable FDs to be available.

(2) The postmaster's stdin/stdout/stderr will surely eat up three of
those.

(3) In HEAD, that's actually all the FDs there are normally, but in the
back branches there is one more (under the conditions of this test),
because in the back branches we open the postmaster's listen sockets
before we run set_max_safe_fds. (9a86f03b4 changed this.)

(4) NetBSD 7.0's cron leaves three extra open FDs in processes that
it spawns. I have not looked into why, but I have experimentally
observed this. For example, lsof on a "sleep" launched from cron
shows

COMMAND PID  USER FD  TYPE    DEVICE             SIZE/OFF NODE    NAME
sleep   7824 tgl  cwd VDIR    0,0                512      795201  /home/tgl
sleep   7824 tgl  txt VREG    0,0                10431    1613152 /bin/sleep
sleep   7824 tgl  txt VREG    0,0                1616564  22726   /lib/libc.so.12.193.1
sleep   7824 tgl  txt VREG    0,0                55295    22747   /lib/libgcc_s.so.1.0
sleep   7824 tgl  txt VREG    0,0                187183   22762   /lib/libm.so.0.11
sleep   7824 tgl  txt VREG    0,0                92195    1499524 /libexec/ld.elf_so
sleep   7824 tgl  0r  PIPE    0xfffffe803131eb58 16384
sleep   7824 tgl  1w  PIPE    0xfffffe8007ec4a30 0                ->0xfffffe800cc0d2c0
sleep   7824 tgl  2w  PIPE    0xfffffe8007ec4a30 0                ->0xfffffe800cc0d2c0
sleep   7824 tgl  7u  unknown                                     file system type: 0
sleep   7824 tgl  8u  unknown                                     file system type: 0
sleep   7824 tgl  9w  PIPE    0xfffffe80036c4dc0 0

while of course "sleep" launched by hand has only 0/1/2 open.
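
For anyone wanting to make the same cron-versus-by-hand comparison without
lsof, a process can probe its own inherited descriptors. The following is
only a rough sketch in Python, not anything from the PostgreSQL tree; the
name open_fds and the default limit of 256 are arbitrary choices for
illustration, and the interpreter may hold descriptors of its own on some
platforms.

```python
import os


def open_fds(limit=256):
    """Return the descriptor numbers below `limit` that are currently open.

    Hypothetical helper: it probes each number with fstat(), which fails
    with an OSError (EBADF) when the descriptor is not open.
    """
    found = []
    for fd in range(limit):
        try:
            os.fstat(fd)
        except OSError:
            continue
        found.append(fd)
    return found


if __name__ == "__main__":
    fds = open_fds()
    # Anything above 2 was inherited from the launcher (or opened by the
    # interpreter itself), not one of stdin/stdout/stderr.
    extras = [fd for fd in fds if fd > 2]
    print("open descriptors:", fds)
    print("beyond stdin/stdout/stderr:", extras)
```

Running the script once from a cron entry and once from an interactive
shell, then comparing the two "beyond stdin/stdout/stderr" lines, would
show whether the launcher is leaking descriptors into its children.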

We may conclude that when the regression tests are launched from cron,
as would be typical for a buildfarm animal, HEAD has exactly zero FDs
to spare in this test, while the back branches are one FD underwater
and fail. This matches the observed results from sidewinder.
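
The arithmetic behind that conclusion can be written out as below. This is
just a restatement of points (1) through (4) in Python, not a model of the
actual set_max_safe_fds code; the constant and function names are made up
for illustration, and the numbers are the ones given above.

```python
# Figures taken from the discussion above; all names here are illustrative.
MAX_FILES_PER_PROCESS = 26   # setting used by d20703805's test
REQUIRED_USABLE_FDS = 20     # what set_max_safe_fds needs, per point (1)

STD_STREAMS = 3              # stdin/stdout/stderr, point (2)
LISTEN_SOCKET = 1            # opened early in the back branches, point (3)
CRON_EXTRAS = 3              # leaked by NetBSD 7.0's cron, point (4)


def spare_fds(already_open):
    """FDs left over after the requirement is met; negative means failure."""
    return MAX_FILES_PER_PROCESS - REQUIRED_USABLE_FDS - already_open


# Launched by hand: only the standard streams (plus the socket in back branches).
head_by_hand = spare_fds(STD_STREAMS)
back_by_hand = spare_fds(STD_STREAMS + LISTEN_SOCKET)

# Launched from cron on NetBSD 7.0: three more inherited descriptors.
head_from_cron = spare_fds(STD_STREAMS + CRON_EXTRAS)
back_from_cron = spare_fds(STD_STREAMS + CRON_EXTRAS + LISTEN_SOCKET)

if __name__ == "__main__":
    print("HEAD, by hand:          ", head_by_hand)
    print("back branches, by hand: ", back_by_hand)
    print("HEAD, from cron:        ", head_from_cron)
    print("back branches, from cron:", back_from_cron)
```

A negative result corresponds to the "insufficient file descriptors"
failure; zero means the test passes with nothing to spare.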

It's not clear whether any of this info applies to Christoph's trouble
with bionic. If the extra FDs are an old cron bug, it could be that
bionic shares that bug --- but to explain failure on HEAD, you'd have to
posit four excess FDs not three. I'm not convinced that what Christoph
is seeing matches this anyway; he hasn't shown the telltale
"insufficient file descriptors" message, at least. Still, maybe
launched-by-cron vs launched-by-hand is a relevant point there.

regards, tom lane
