RE: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request

From: Улаев Александр Сергеевич <alexander(dot)ulaev(at)rtlabs(dot)ru>
To: Sergei Kornilov <sk(at)zsrv(dot)org>, PG Bug reporting form <noreply(at)postgresql(dot)org>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request
Date: 2022-02-03 09:54:19
Message-ID: 779598bff1e7430f9e032d1b62b8006b@rtlabs.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi, thanks for your cooperation

2022-02-01 16:11:34,964 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:11:44,950 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:11:54,677 INFO: Lock owner: db02.local; I am db02.local
2022-02-01 16:11:56,145 ERROR: Request to server http://IP_ETCD1:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'IP_ETCD1\', port=2379): Max retries exceeded with url: /v2/keys/postgresql/db/leader (Caused by Read
TimeoutError("HTTPConnectionPool(host=\'IP_ETCD1\', port=2379): Read timed out. (read timeout=1.4283105651182788)",))',)
2022-02-01 16:11:56,146 INFO: Reconnection allowed, looking for another server.
2022-02-01 16:11:56,146 INFO: Retrying on http://IP_ETCD2:2379
2022-02-01 16:11:56,262 INFO: Selected new etcd server http://IP_ETCD2:2379
2022-02-01 16:11:56,475 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:12:04,950 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:12:14,960 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:12:24,669 INFO: Lock owner: db02.local; I am db02.local
2022-02-01 16:12:24,738 ERROR:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 562, in wrapper
retval = func(self, *args, **kwargs) is not None
File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 692, in _update_leader
return self.retry(self._client.write, self.leader_path, self._name, prevValue=self._name, ttl=self._ttl)
File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 443, in retry
return retry(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/patroni/utils.py", line 334, in __call__
return func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/etcd/client.py", line 500, in write
response = self.api_execute(path, method, params=params)
File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 256, in api_execute
return self._handle_server_response(response)
File "/usr/lib/python3.6/site-packages/etcd/client.py", line 987, in _handle_server_response
etcd.EtcdError.handle(r)
File "/usr/lib/python3.6/site-packages/etcd/__init__.py", line 306, in handle
raise exc(msg, payload)
etcd.EtcdKeyNotFound: Key not found : /postgresql/db/leader
2022-02-01 16:12:24,928 ERROR: failed to update leader lock
2022-02-01 16:12:27,063 INFO: demoted self because failed to update leader lock in DCS
2022-02-01 16:12:27,079 INFO: closed patroni connection to the postgresql cluster
2022-02-01 16:12:28,685 INFO: postmaster pid=153397
2022-02-01 16:12:34,670 INFO: Lock owner: None; I am db02.local
2022-02-01 16:12:34,671 INFO: not healthy enough for leader race
2022-02-01 16:12:34,738 INFO: starting after demotion in progress
2022-02-01 16:12:44,672 INFO: Lock owner: None; I am db02.local
2022-02-01 16:12:44,672 INFO: not healthy enough for leader race
2022-02-01 16:12:44,741 INFO: starting after demotion in progress
2022-02-01 16:12:54,672 INFO: Lock owner: None; I am db02.local
2022-02-01 16:12:54,672 INFO: not healthy enough for leader race
2022-02-01 16:12:54,739 INFO: starting after demotion in progress
2022-02-01 16:13:04,672 INFO: Lock owner: None; I am db02.local
2022-02-01 16:13:04,672 INFO: not healthy enough for leader race
2022-02-01 16:13:04,746 INFO: starting after demotion in progress
2022-02-01 16:13:14,673 INFO: Lock owner: None; I am db02.local
2022-02-01 16:13:14,674 INFO: not healthy enough for leader race
2022-02-01 16:13:14,743 INFO: starting after demotion in progress
2022-02-01 16:13:18,748 INFO: establishing a new patroni connection to the postgres cluster
2022-02-01 16:13:19,961 INFO: Got response from db01.local http://IP_DB01:8008/patroni: {"state": "starting", "role": "replica", "database_system_identifier": "9879384539485923423", "patroni": {"version": "2.1.0", "scope": "db"}}
2022-02-01 16:13:19,961 WARNING: Request failed to db01.local: GET http://IP_DB01:8008/patroni ('xlog')
2022-02-01 16:13:20,365 INFO: promoted self to leader by acquiring session lock
2022-02-01 16:13:20,380 INFO: cleared rewind state after becoming the leader
2022-02-01 16:13:21,816 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:13:24,830 INFO: no action. I am (db02.local) the leader with the lock
2022-02-01 16:13:34,826 INFO: no action. I am (db02.local) the leader with the lock

Best regards,
Ulaev Alexander

-----Original Message-----
From: Sergei Kornilov [mailto:sk(at)zsrv(dot)org]
Sent: Thursday, February 3, 2022 11:52 AM
To: PG Bug reporting form <noreply(at)postgresql(dot)org>
Cc: Улаев Александр Сергеевич <alexander(dot)ulaev(at)rtlabs(dot)ru>; pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request

Hello

> 2022-02-01 16:12:24 MSK [16959] LOG: received immediate shutdown request

Please prove by patroni logs, that not patroni commanded immediate shutdown. As far as I remember, there should be a mark about the request for stopping the database immediatly.

Seems unrelated to archiver. Patroni is able to command immediate shutdown, immediate shutdown is a kind of "terminated abnormally". The sequence of logs looks completely correct.

regards, Sergei

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Sergei Kornilov 2022-02-03 10:06:11 Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request
Previous Message Juan José Santamaría Flecha 2022-02-03 09:03:04 Re: Postgresql error : PANIC: could not locate a valid checkpoint record