Resetting crash time of background worker

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Resetting crash time of background worker
Date: 2015-03-17 05:33:02
Message-ID: CAJ3gD9fWj6EO+7v=bYZ2osqvPuWx=Lnf-g-BUac6NQ2YzSXqog@mail.gmail.com
Lists: pgsql-hackers

When the postmaster recovers from a backend or worker crash, it resets each
bgworker's crash time (rw->rw_crashed_at) so that the bgworker will
restart immediately (ResetBackgroundWorkerCrashTimes).

But resetting rw->rw_crashed_at to 0 means that we have lost the
information that the bgworker had actually crashed. So later, when the
postmaster looks for workers that should be started
(maybe_start_bgworker), it treats this worker as a new worker rather than
as one that had crashed and is to be restarted. So for this bgworker, it
does not consider BGW_NEVER_RESTART:

    if (rw->rw_crashed_at != 0)
    {
        if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
        {
            ForgetBackgroundWorker(&iter);
            continue;
        }
        ....
        ....

That means it will not remove the worker, and the worker will be restarted.
If the worker then crashes again, the postmaster will keep repeating the
crash-and-restart cycle for the whole system.
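
The sequence described above can be modelled in a few lines of standalone C.
This is only a simplified sketch, not the postmaster source: ToyWorker,
toy_reset_crash_times and toy_should_start are hypothetical stand-ins for the
registered-worker state, ResetBackgroundWorkerCrashTimes and the check in
maybe_start_bgworker. The value given to BGW_NEVER_RESTART is an assumption,
and the restart-interval handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BGW_NEVER_RESTART (-1)   /* assumed value, for illustration only */

/* Hypothetical, cut-down stand-in for the postmaster's per-worker state. */
typedef struct ToyWorker
{
    long crashed_at;     /* 0 means "has not crashed", like rw_crashed_at */
    int  restart_time;   /* seconds, or BGW_NEVER_RESTART */
} ToyWorker;

/* Models the unconditional reset performed after crash recovery. */
void
toy_reset_crash_times(ToyWorker *workers, size_t n)
{
    assert(workers != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        workers[i].crashed_at = 0;
}

/*
 * Models the quoted check: a worker is dropped only when it is known to
 * have crashed AND is marked never-restart. Returns true if the worker
 * would be started.
 */
bool
toy_should_start(const ToyWorker *w)
{
    assert(w != NULL);
    if (w->crashed_at != 0 && w->restart_time == BGW_NEVER_RESTART)
        return false;    /* corresponds to ForgetBackgroundWorker() */
    return true;
}
```

In this model, a never-restart worker with a nonzero crashed_at is not
started, but once toy_reset_crash_times has run, the same worker looks like a
new one and toy_should_start returns true for it.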

From what I understand, BGW_NEVER_RESTART applies even to a crashed server.
But let me know if I am missing anything.

I think we either have to retain the knowledge that the worker has crashed
using some new field, or else, we should reset the crash time only if it is
not flagged BGW_NEVER_RESTART.
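
The second option could look roughly like the following. Again this is a
standalone sketch under assumptions, not a patch against the tree:
SketchWorker, sketch_reset_crash_times and sketch_should_start are
hypothetical names, and the value given to BGW_NEVER_RESTART is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BGW_NEVER_RESTART (-1)   /* assumed value, for illustration only */

/* Hypothetical, cut-down stand-in for the postmaster's per-worker state. */
typedef struct SketchWorker
{
    long crashed_at;     /* 0 means "has not crashed" */
    int  restart_time;   /* seconds, or BGW_NEVER_RESTART */
} SketchWorker;

/*
 * Reset crash times after recovery, but keep the crash time of workers
 * flagged never-restart so the later check can still see that they crashed.
 */
void
sketch_reset_crash_times(SketchWorker *workers, size_t n)
{
    assert(workers != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (workers[i].restart_time == BGW_NEVER_RESTART)
            continue;            /* preserve the crash information */
        workers[i].crashed_at = 0;
    }
}

/* Same simplified start check as in the quoted code: crashed plus
 * never-restart means the worker is dropped instead of started. */
bool
sketch_should_start(const SketchWorker *w)
{
    assert(w != NULL);
    if (w->crashed_at != 0 && w->restart_time == BGW_NEVER_RESTART)
        return false;
    return true;
}
```

With this variant, a crashed never-restart worker keeps its crash time across
the reset and is still dropped by the start check, while workers with a
restart interval are reset and restarted immediately as before.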

-Amit Khandekar
