From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Resetting crash time of background worker |
Date: | 2015-03-17 13:42:26 |
Message-ID: | CA+TgmoYyV1Xf+D86KUev_buUrPFdY3UxePtiZ+ijSbQ-mwzoUA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Mar 17, 2015 at 1:33 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> When the postmaster recovers from a backend or worker crash, it resets each
> bgworker's crash time (rw->rw_crashed_at) so that the bgworker will
> immediately restart (ResetBackgroundWorkerCrashTimes).
>
> But resetting rw->rw_crashed_at to 0 means that we have lost the information
> that the bgworker had actually crashed. So later, when the postmaster looks
> for workers that should start (maybe_start_bgworker), it treats this
> worker as a new worker rather than as one that had crashed and
> is to be restarted. So for this bgworker, it does not consider
> BGW_NEVER_RESTART:
>
>     if (rw->rw_crashed_at != 0)
>     {
>         if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
>         {
>             ForgetBackgroundWorker(&iter);
>             continue;
>         }
>         ....
>         ....
>
> That means, it will not remove the worker, and it will be restarted. Now if
> the worker again crashes, postmaster would keep on repeating the crash and
> restart cycle for the whole system.
>
> From what I understand, BGW_NEVER_RESTART applies even to a crashed server.
> But let me know if I am missing anything.
>
> I think we either have to retain the knowledge that the worker has crashed
> using some new field, or else, we should reset the crash time only if it is
> not flagged BGW_NEVER_RESTART.
I think you're right, and I think we should do the second of those.
Thanks for tracking this down.
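
For illustration only, here is a minimal sketch of what the second option could look like. It is a self-contained model, not the actual postmaster code: ModelWorker, model_reset_crash_times and model_should_start are hypothetical stand-ins, the BGW_NEVER_RESTART value is redefined locally, and the restart-interval wait in the real maybe_start_bgworker is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Local stand-in for the real macro; defined here so the sketch compiles alone. */
#define BGW_NEVER_RESTART (-1)

/* Hypothetical, simplified model of a registered background worker. */
typedef struct ModelWorker
{
    int    bgw_restart_time;   /* restart interval in seconds, or BGW_NEVER_RESTART */
    time_t rw_crashed_at;      /* 0 means "no crash recorded" */
    int    forgotten;          /* 1 once the worker has been unregistered */
} ModelWorker;

/*
 * Models the proposed change to the crash-time reset: clear the crash time
 * only for workers that are allowed to restart, so that a BGW_NEVER_RESTART
 * worker keeps the record of its crash.
 */
static void
model_reset_crash_times(ModelWorker *workers, size_t n)
{
    assert(workers != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (workers[i].bgw_restart_time != BGW_NEVER_RESTART)
            workers[i].rw_crashed_at = 0;
    }
}

/*
 * Models the check quoted above: a crashed worker flagged BGW_NEVER_RESTART
 * is forgotten instead of started.  Returns 1 if the worker should start.
 * The restart-interval wait of the real code is deliberately omitted.
 */
static int
model_should_start(ModelWorker *w)
{
    if (w->forgotten)
        return 0;

    if (w->rw_crashed_at != 0 && w->bgw_restart_time == BGW_NEVER_RESTART)
    {
        w->forgotten = 1;       /* stands in for ForgetBackgroundWorker() */
        return 0;
    }

    return 1;
}
```

With this shape, a never-restart worker that crashed still has a nonzero crash time after the reset, so the existing check removes it instead of starting it again.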
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company