Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

From: Mithun Cy <mithun(dot)cy(at)gmail(dot)com>
To: Hans Buschmann <buschmann(at)nidsa(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, thomas(dot)munro(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x
Date: 2019-03-18 07:04:18
Message-ID: CADq3xVZ4oVE6pS_-Bww6OmiY+WeE96civ3POEqUKe0Oa1fJrpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

t tOn Mon, Feb 25, 2019 at 12:10 AM Mithun Cy <mithun(dot)cy(at)gmail(dot)com> wrote:

> Thanks Hans, for a simple reproducible tests.
>
> The "worker.bgw_restart_time" is never set for autoprewarm workers so on
> error it get restarted after some period of time (default behavior). Since
> database itself is dropped our attempt to connect to that database failed
> and then worker exited. But again got restated by postmaster then we start
> seeing above DSM segment error.
>
> I think every autoprewarm worker should be set with
> "worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there shall not be
> repeated prewarm attempt of a dropped database. I will try to think further
> and submit a patch for same.
>

Here is the patch for same,

autoprewarm waorker should not be restarted. As per the code
@apw_start_database_worker@
master starts a worker per database and wait until it exit by calling
WaitForBackgroundWorkerShutdown. The call WaitForBackgroundWorkerShutdown
cannot handle the case if the worker was restarted. The
WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED from the call
GetBackgroundWorkerPid() if worker got restarted. So master will next
detach the shared memory and next restarted worker keep failing going in a
unending loop.

I think there is no need to restart at all. Following are the normal error
we might encounter.
1. Connecting database is droped -- So we need to skip to next database
which master will do by starting a new wroker. So not needed.
2. Relation is droped -- try_relation_open(reloid, AccessShareLock) is used
so error due to dropped relation is handled also avoids concurrent
truncation.
3. smgrexists is used before reading from a fork file. Again error is
handled.
4. before reading the block we have check as below. So previously truncated
pages will not be read again.
/* Check whether blocknum is valid and within fork file size. */
if (blk->blocknum >= nblocks)

I think if any other unexpected errors occurs that should be fatal so
restarting will not be correcting same. Hence there is no need to restart
the per database worker process.

I tried to dig why we did not set it earlier. It used to be never restart,
but it changed after fixing comments [1]. At that time we did not make
explicit database connection per worker and did not handle many error cases
as now. So it appeared fair. But, when code changed to make database
connection per worker, we should have set every worker with
BGW_NEVER_RESTART. Which I think was a mistake.

NOTE : On zero exit status we will not restart the bgworker (see
@CleanupBackgroundWorker@
and @maybe_start_bgworkers@)
[1]
https://www.postgresql.org/message-id/CA%2BTgmoYNF_wfdwQ3z3713zKy2j0Z9C32WJdtKjvRWzeY7JOL4g%40mail.gmail.com
--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
never_restart_apw_worker_01.patch application/octet-stream 594 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-03-18 09:44:34 BUG #15700: PG 10 vs. 11: Large increase in memory usage when selecting BYTEA data (maybe memory leak)
Previous Message Tomasz Szypowski 2019-03-18 05:01:54 pg_upgrade

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2019-03-18 07:18:18 Re: Online verification of checksums
Previous Message Stephen Frost 2019-03-18 06:38:10 Re: Online verification of checksums