Re: BUG #15804: Assertion failure when using logging_collector with EXEC_BACKEND

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Yuli Khodorkovskiy <yuli(dot)khodorkovskiy(at)crunchydata(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #15804: Assertion failure when using logging_collector with EXEC_BACKEND
Date: 2019-05-16 01:38:08
Message-ID: 20190516013808.GA1415@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, May 15, 2019 at 10:39:49AM -0400, Tom Lane wrote:
> Hm. I don't much like ignoring errors, which is what that would
> amount to. It seems like in win32_shmem.c, PGSharedMemoryNoReAttach
> is already more or less right: resetting UsedShmemSegAddr to NULL and
> then calling PGSharedMemoryDetach is exactly what to do. It just needs
> to lose the no-longer-correct assertions. Maybe sysv_shmem.c's version
> should look the same?
>
> The real point of PGSharedMemoryNoReAttach, now that I think about it,
> is that it's dealing with a case where we passed down UsedShmemSegAddr
> via the BackendParameters mechanism, but we know that the segment is
> not really attached (because of exec()). This is different from the
> API expectations of PGSharedMemoryDetach, which thinks that non-nullness
> of UsedShmemSegAddr corresponds to whether the segment is attached or
> not. So we can't unify those functions without breaking things, or
> at least losing the ability to distinguish errors from not-errors.

My thoughts were to switch the LOG to DEBUG1 for the code paths where
we do not expect a failure, which is the code path from the
postmaster, so that actually comes closer to your upthread suggestion
of merging both functions. The only thing which changes is that we
would call UnmapViewOfFile(), and fail silently. We don't generate an
ERROR now for the existing code, just a LOG. So there is no actual
hard failure now? Anyway, you are right that it is no time to
redesign this interface with the current timing.

Using vcregress check with logging_collector and assertions enabled
is enough to see the failures on Windows, but the assertions actually
triggered are not the same ones as the EXEC_BACKEND case with sysv
because it's actually the three assertions in
pgwin32_ReserveSharedMemoryRegion() which blow up, not the ones in
PGSharedMemoryNoReAttach() which is able to handle its own. Removing
some assertions are per the attached avoids problems. I am able to
pass check with the logging collector enabled on Windows.

> BTW, another thing I think is worth looking into is exactly why the
> buildfarm failed to detect this problem. According to the code
> coverage report,
>
> https://coverage.postgresql.org/src/backend/postmaster/syslogger.c.gcov.html
>
> we *are* exercising the syslogger at some point during a standard
> test run. Perhaps the Windows animals don't run whatever test
> case that is?

Now from what I can see from the related buildfarm members:
- culicidae uses EXEC_BACKEND on Linux and has assertions, but not
logging_collector.
- dory, hamerkop, woodloose and bowerbird test on Windows with MSVC,
have assertions enabled, but not logging_collector.
- jacana tests on Windows with mingw, has assertions, but it does not
enforce logging_collector.
The Windows animals do not run bincheck.

So my bet is that the coverage is from pg_ctl/t/004_logrotate.pl,
which is a test skipped on Windows. culicidae runs that, so the fact
that the failure got undetected is actually a bit of a mystery because
this uses sysv_shmem.c.

Except culicidae, I am seeing no members using EXEC_BACKEND which
enable the syslogger and use TAP tests :(
--
Michael

Attachment Content-Type Size
shmem-remove-asserts.patch text/x-diff 874 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Langote 2019-05-16 01:51:05 Re: inconsistent results querying table partitioned by date
Previous Message David Rowley 2019-05-16 01:00:40 Re: inconsistent results querying table partitioned by date