[PATCH] Fix for infinite signal loop in parallel scan

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Fix for infinite signal loop in parallel scan
Date: 2018-09-07 15:57:05
Message-ID: CAN-RpxB-oeZve_J3SM_6=HXPmvEG=HX+9V9pi8g2YR7YW0rBBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi;

Attached is the patch we are fully testing at Adjust. There are a few
non-obvious aspects of the code around where the patch hits. I have run
make check on Linux and MacOS, and make check-world on Linux (check-world
fails on MacOS on all versions and all branches due to ecpg failures).
Automatic tests are difficult because it is to a rare race condition which
is difficult (or possibly impossible) to automatically create. Our current
approach testing is to force the issue using a debugger. Any ideas on how
to reproduce automatically are appreciated but even on our heavily loaded
systems this seems to be a very small portion of queries that hit this case
(meaning the issue happens about once a week for us).

The main problem is that under certain rare circumstances we see recovery
conflicts going into loops sending sigusr1 to the parallel query which
retries a posix_falloc call. The call gets interrupted by the signal
perpetually and the replication cannot proceed.

The patch attempts to handle cases where we are trying to cancel the query
or terminate the process in the same way we would handle an ENOSPC in the
resize operation, namely to break off the loop, do relevant cleanup, and
then throw relevant exceptions.

There is a very serious complication in this area, however, which is
that dsm_impl_op
takes an elevel parameter which determines whether or not it is safe to
throw errors. This elevel is then passed to ereport inside the function,
and this function also calls the resize operation itself. Since this is
not safe with CHECK_FOR_INTERRUPTS(), I conditionally do this only if
elevel is greater or equal to ERROR.

Currently all codepaths here use elevel ERROR when reaching this path but
given that the calling function supports semantics where this could be
lower, and where a caller might have additional cleanup to do, I don't
think one can just add CHECK_FOR_INTERRUPTS() there even though that would
work for now since this could create all kinds of subtle bugs in the future.

So what the patch does is check for whether we are trying to end the query
or the backend and does not retry the resize operation if either of those
conditions are true. In those cases it will set errno to EINTR and return
the same.

The only caller then, if the resize operation failed, checks for an elevel
greater or equal to ERROR, and whether the errno is set to EINTR. If so it
checks for signals. If these do not abort the query, we then fall through
and pass the ereport with the supplied elevel as we would have otherwise,
and return false to the caller.

In current calls to this function, this means that interrupts are handled
right after cleanup of the dynamic shared memory and these then abort the
query or exit the backend. Future calls could specify a WARNING elevel if
they have extra cleanup to do, in which case signals would not be checked,
and the same warning found today would found in the log. At the next
CHECK_FOR_INTERRUPTS() call, the appropriate errors would be raised etc.

In the past it had been suggested we do PG_TRY(); and PG_CATCH(), but given
that it is not consistent whether we can raise an error or whether we MUST
raise an error, I don't see how this approach can work. As far as I can
see, we MUST raise an error in the appropriate spot if and only if elevel
is set to a sufficient level.

Backporting this is pretty trivial. We expect to confirm this ourselves on
both master and 10. I can prepare back ports fairly quickly.

Is there any feedback on this approach before I add it to the next
commitfest?

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

Attachment Content-Type Size
racecond.patch application/octet-stream 2.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maksim Milyutin 2018-09-07 15:59:34 Re: Hint to set owner for tablespace directory
Previous Message Fabien COELHO 2018-09-07 15:14:41 Re: Removing useless \. at the end of copy in pgbench