[Patch] ALTER SYSTEM READ ONLY

From: amul sul <sulamul(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: [Patch] ALTER SYSTEM READ ONLY
Date: 2020-06-16 13:55:40
Message-ID: CAAJ_b97KZzdJsffwRK7w0XU5HnXkcgKgTR69t8cOZztsyXjkQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached patch proposes $Subject feature which forces the system into
read-only
mode where insert write-ahead log will be prohibited until ALTER SYSTEM READ
WRITE executed.

The high-level goal is to make the availability/scale-out situation
better. The feature
will help HA setup where the master server needs to stop accepting WAL
writes
immediately and kick out any transaction expecting WAL writes at the end,
in case
of network down on master or replication connections failures.

For example, this feature allows for a controlled switchover without
needing to shut
down the master. You can instead make the master read-only, wait until the
standby
catches up, and then promote the standby. The master remains available for
read
queries throughout, and also for WAL streaming, but without the possibility
of any
new write transactions. After switchover is complete, the master can be
shut down
and brought back up as a standby without needing to use pg_rewind.
(Eventually, it
would be nice to be able to make the read-only master into a standby
without having
to restart it, but that is a problem for another patch.)

This might also help in failover scenarios. For example, if you detect that
the master
has lost network connectivity to the standby, you might make it read-only
after 30 s,
and promote the standby after 60 s, so that you never have two writable
masters at
the same time. In this case, there's still some split-brain, but it's still
better than what
we have now.

Design:
----------
The proposed feature is built atop of super barrier mechanism commit[1] to
coordinate
global state changes to all active backends. Backends which executed
ALTER SYSTEM READ { ONLY | WRITE } command places request to checkpointer
process to change the requested WAL read/write state aka WAL prohibited and
WAL
permitted state respectively. When the checkpointer process sees the WAL
prohibit
state change request, it emits a global barrier and waits until all
backends that
participate in the ProcSignal absorbs it. Once it has done the WAL
read/write state in
share memory and control file will be updated so that XLogInsertAllowed()
returns
accordingly.

If there are open transactions that have acquired an XID, the sessions are
killed
before the barrier is absorbed. They can't commit without writing WAL, and
they
can't abort without writing WAL, either, so we must at least abort the
transaction. We
don't necessarily need to kill the session, but it's hard to avoid in all
cases because
(1) if there are subtransactions active, we need to force the top-level
abort record to
be written immediately, but we can't really do that while keeping the
subtransactions
on the transaction stack, and (2) if the session is idle, we also need the
top-level abort
record to be written immediately, but can't send an error to the client
until the next
command is issued without losing wire protocol synchronization. For now, we
just use
FATAL to kill the session; maybe this can be improved in the future.

Open transactions that don't have an XID are not killed, but will get an
ERROR if they
try to acquire an XID later, or if they try to write WAL without acquiring
an XID (e.g. VACUUM).
To make that happen, the patch adds a new coding rule: a critical section
that will write
WAL must be preceded by a call to CheckWALPermitted(),
AssertWALPermitted(), or
AssertWALPermitted_HaveXID(). The latter variants are used when we know for
certain
that inserting WAL here must be OK, either because we have an XID (we would
have
been killed by a change to read-only if one had occurred) or for some other
reason.

The ALTER SYSTEM READ WRITE command can be used to reverse the effects of
ALTER SYSTEM READ ONLY. Both ALTER SYSTEM READ ONLY and ALTER
SYSTEM READ WRITE update not only the shared memory state but also the
control
file, so that changes survive a restart.

The transition between read-write and read-only is a pretty major
transition, so we emit
log message for each successful execution of a ALTER SYSTEM READ {ONLY |
WRITE}
command. Also, we have added a new GUC system_is_read_only which returns
"on"
when the system is in WAL prohibited state or recovery.

Another part of the patch that quite uneasy and need a discussion is that
when the
shutdown in the read-only state we do skip shutdown checkpoint and at a
restart, first
startup recovery will be performed and latter the read-only state will be
restored to
prohibit further WAL write irrespective of recovery checkpoint succeed or
not. The
concern is here if this startup recovery checkpoint wasn't ok, then it will
never happen
even if it's later put back into read-write mode. Thoughts?

Quick demo:
----------------
We have few active sessions, section 1 has performed some writes and stayed
in the
idle state for some time, in between in session 2 where superuser
successfully changed
system state in read-only via ALTER SYSTEM READ ONLY command which kills
session 1. Any other backend who is trying to run write transactions
thereafter will see
a read-only system error.

------------- SESSION 1 -------------
session_1=# BEGIN;
BEGIN
session_1=*# CREATE TABLE foo AS SELECT i FROM generate_series(1,5) i;
SELECT 5

------------- SESSION 2 -------------
session_2=# ALTER SYSTEM READ ONLY;
ALTER SYSTEM

------------- SESSION 1 -------------
session_1=*# COMMIT;
FATAL: system is now read only
HINT: Cannot continue a transaction if it has performed writes while
system is read only.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

------------- SESSION 3 -------------
session_3=# CREATE TABLE foo_bar (i int);
ERROR: cannot execute CREATE TABLE in a read-only transaction

------------- SESSION 4 -------------
session_4=# CHECKPOINT;
ERROR: system is now read only

System can put back to read-write mode by "ALTER SYSTEM READ WRITE" :

------------- SESSION 2 -------------
session_2=# ALTER SYSTEM READ WRITE;
ALTER SYSTEM

------------- SESSION 3 -------------
session_3=# CREATE TABLE foo_bar (i int);
CREATE TABLE

------------- SESSION 4 -------------
session_4=# CHECKPOINT;
CHECKPOINT

TODOs:
-----------
1. Documentation.

Attachments summary:
------------------------------
I tried to split the changes so that it can be easy to read and see the
incremental implementation.

0001: Patch by Robert, to add ability support error in global barrier
absorption.
0002: Patch implement ALTER SYSTEM { READ | WRITE} syntax and psql tab
completion support for it.
0003: A basic implementation where the system can accept $Subject command
and change system to read-only by an emitting barrier.
0004: Patch does the enhancing where the backed execute $Subject command
only and places a request to the checkpointer which is
responsible to change
the state by the emitting barrier. Also, store the state into the
control file to
make It persists across the server restarts.
0005: Patch tightens the check to prevent error in the critical section.
0006: Documentation - WIP

Credit:
-------
The feature is one of the part of Andres Frued's high-level design ideas
for inbuilt
graceful failover for PostgreSQL. Feature implementation design by Robert
Haas.
Initial patch by Amit Khandekar further works and improvement by me under
Robert's
guidance includes this mail writeup as well.

Ref:
----
1] Global barrier commit # 16a4e4aecd47da7a6c4e1ebc20f6dd1a13f9133b

Thank you !

Regards,
Amul Sul
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v1-0005-Error-or-Assert-before-START_CRIT_SECTION-for-WAL.patch application/octet-stream 59.2 KB
v1-0006-Documentation-WIP.patch application/octet-stream 6.5 KB
v1-0002-Add-alter-system-read-only-write-syntax.patch application/octet-stream 7.4 KB
v1-0003-Implement-ALTER-SYSTEM-READ-ONLY-using-global-bar.patch application/octet-stream 22.4 KB
v1-0001-Allow-error-or-refusal-while-absorbing-barriers.patch application/octet-stream 4.4 KB
v1-0004-Use-checkpointer-to-make-system-READ-ONLY-or-READ.patch application/octet-stream 15.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-06-16 14:19:17 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Tom Lane 2020-06-16 13:53:46 Re: snowball ASCII stemmer configuration