Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use

From: Mithun Cy <mithun(dot)cy(at)gmail(dot)com>
To: Hans Buschmann <buschmann(at)nidsa(dot)net>
Cc: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, thomas(dot)munro(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use
Date: 2019-02-24 18:40:49
Message-ID: CADq3xVY_DjvRMv2DpFugyaW+7ZJAqeEU6vcFPossXTEGSE=toA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Thanks, Hans, for the simple, reproducible test.

On Sun, Feb 24, 2019 at 6:54 PM Hans Buschmann <buschmann(at)nidsa(dot)net> wrote:
> Here is the start of the error log:
>
> CPS PRD 2019-02-24 12:11:57 CET 00000 1:> LOG: database system was interrupted; last known up at 2019-02-17 16:14:05 CET
> CPS PRD 2019-02-24 12:12:16 CET 00000 2:> LOG: entering standby mode
> CPS PRD 2019-02-24 12:12:16 CET 00000 3:> LOG: redo starts at 0/23000028
> CPS PRD 2019-02-24 12:12:16 CET 00000 4:> LOG: consistent recovery state reached at 0/23000168
> CPS PRD 2019-02-24 12:12:16 CET 00000 5:> LOG: invalid record length at 0/24000060: wanted 24, got 0
> CPS PRD 2019-02-24 12:12:16 CET 00000 9:> LOG: database system is ready to accept read only connections
> CPS PRD 2019-02-24 12:12:16 CET 3D000 1:> FATAL: database 16384 does not exist
> CPS PRD 2019-02-24 12:12:16 CET 00000 10:> LOG: background worker "autoprewarm worker" (PID 3968) exited with exit code 1
> CPS PRD 2019-02-24 12:12:16 CET 00000 1:> LOG: autoprewarm successfully prewarmed 0 of 12402 previously-loaded blocks
> CPS PRD 2019-02-24 12:12:17 CET XX000 1:> FATAL: could not connect to the primary server: FATAL: no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
> CPS PRD 2019-02-24 12:12:17 CET 55000 1:> ERROR: could not map dynamic shared memory segment

As per the log, the autoprewarm master exited first ("autoprewarm successfully
prewarmed 0 of 12402 previously-loaded blocks"). Only after that did we start
getting "could not map dynamic shared memory segment".
That is, the master had already done dsm_detach, and the workers started
throwing the error after that.

> This seems easy to reproduce:
>
> - Install/create a database with autoprewarm on and pg_prewarm loaded.
> - Fill the autoprewarm cache with some data
> - pg_dump the database
> - drop the database
> - create the database and pg_restore it from the dump
> - start the instance and logs are flooded
>
> I have taken no further investigation in the sourcecode due to limited skills so far...

I was able to reproduce the same.

The "worker.bgw_restart_time" field is never set for autoprewarm workers, so
on error a worker gets restarted after some period of time (the default
behavior). Since the database itself was dropped, the worker's attempt to
connect to that database failed and the worker exited. The postmaster then
restarted it, and that is when we start seeing the DSM segment error above.

I think every autoprewarm worker should be registered with
"worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there are no
repeated attempts to prewarm a dropped database. I will think about this
further and submit a patch for it.
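
To illustrate the restart behavior described above, here is a toy model in
plain C. It is only a sketch and not PostgreSQL code: TOY_NEVER_RESTART,
toy_worker_run and toy_count_launches are hypothetical names made up for
this example, and the 60-second interval used below is an arbitrary value
chosen for illustration. The model counts how often a worker that always
fails (as one would for a dropped database) is launched within a simulated
time window.

```c
#include <assert.h>

/* Hypothetical stand-in for PostgreSQL's BGW_NEVER_RESTART (assumption:
 * this toy value is not taken from the PostgreSQL headers). */
#define TOY_NEVER_RESTART (-1)

/* Toy worker that always fails, like a worker whose target database
 * has been dropped. Returns a non-zero exit code. */
static int toy_worker_run(void)
{
    return 1;
}

/* Toy postmaster loop: returns how many times the failing worker is
 * launched during `seconds` simulated seconds, given a restart interval
 * in seconds or TOY_NEVER_RESTART. */
int toy_count_launches(int restart_time, int seconds)
{
    int launches = 0;
    int now = 0;

    assert(seconds >= 0);
    assert(restart_time == TOY_NEVER_RESTART || restart_time > 0);

    while (now < seconds)
    {
        launches++;
        if (toy_worker_run() == 0)
            break;              /* clean exit: nothing to restart */
        if (restart_time == TOY_NEVER_RESTART)
            break;              /* failure, but restarts are disabled */
        now += restart_time;    /* failure: wait, then launch again */
    }
    return launches;
}
```

In this model, toy_count_launches(60, 600) launches the failing worker
repeatedly over the ten simulated minutes, while
toy_count_launches(TOY_NEVER_RESTART, 600) launches it exactly once, which
is the behavior the proposed setting is meant to achieve.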

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
