Re: Resetting crash time of background worker

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Resetting crash time of background worker
Date: 2015-03-17 13:42:26
Message-ID: CA+TgmoYyV1Xf+D86KUev_buUrPFdY3UxePtiZ+ijSbQ-mwzoUA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 17, 2015 at 1:33 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> When the postmaster recovers from a backend or worker crash, it resets the
> bgworker's crash time (rw->rw_crashed_at) so that the bgworker will
> immediately restart (ResetBackgroundWorkerCrashTimes).
>
> But resetting rw->rw_crashed_at to 0 means that we have lost the information
> that the bgworker had actually crashed. So later when postmaster tries to
> find any workers that should start (maybe_start_bgworker), it treats this
> worker as a new worker, rather than treating it as one that had crashed and
> is to be restarted. So for this bgworker, it does not consider
> BGW_NEVER_RESTART :
>
>     if (rw->rw_crashed_at != 0)
>     {
>         if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
>         {
>             ForgetBackgroundWorker(&iter);
>             continue;
>         }
>         .... ....
>
> That means, it will not remove the worker, and it will be restarted. Now if
> the worker again crashes, postmaster would keep on repeating the crash and
> restart cycle for the whole system.
>
> From what I understand, BGW_NEVER_RESTART applies even to a crashed server.
> But let me know if I am missing anything.
>
> I think we either have to retain the knowledge that the worker has crashed
> using some new field, or else, we should reset the crash time only if it is
> not flagged BGW_NEVER_RESTART.

I think you're right, and I think we should do the second of those.
Thanks for tracking this down.
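
To make the second option concrete, here is a rough sketch of the idea, not a patch against the real postmaster code. SketchWorker, SKETCH_NEVER_RESTART, sketch_reset_crash_times and sketch_should_forget are hypothetical stand-ins for the structures and functions named above. The model keeps only the two fields that matter here.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for BGW_NEVER_RESTART; not taken from the real header. */
#define SKETCH_NEVER_RESTART (-1)

/* Simplified model of a registered worker; the real struct has more fields. */
typedef struct SketchWorker
{
    int     restart_time;   /* seconds before restart, or SKETCH_NEVER_RESTART */
    time_t  crashed_at;     /* 0 means "no crash recorded" */
} SketchWorker;

/*
 * After crash recovery, clear the crash time of restartable workers so they
 * start immediately, but leave it set for never-restart workers so a later
 * start pass still sees them as crashed.
 */
static void
sketch_reset_crash_times(SketchWorker *workers, size_t n)
{
    assert(workers != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (workers[i].restart_time != SKETCH_NEVER_RESTART)
            workers[i].crashed_at = 0;
    }
}

/* Mirrors the quoted check: would the start pass drop this worker? */
static int
sketch_should_forget(const SketchWorker *w)
{
    return w->crashed_at != 0 && w->restart_time == SKETCH_NEVER_RESTART;
}
```

With this shape, a never-restart worker that crashed before recovery still has a nonzero crash time afterwards, so the quoted check in the start pass would drop it instead of launching it again.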

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
