Re: [Patch] ALTER SYSTEM READ ONLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
Cc: Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2020-07-24 14:31:49
Message-ID: CA+TgmoZ-c3Dz9QwHwmm4bc36N4u0XZ2OyENewMf+BwokbYdK9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 22, 2020 at 6:03 PM Soumyadeep Chakraborty
<soumyadeep2007(at)gmail(dot)com> wrote:
> So if we are not going to address those cases, we should change the
> syntax and remove the notion of read-only. It could be:
>
> ALTER SYSTEM SET wal_writes TO off|on;
> or
> ALTER SYSTEM SET prohibit_wal TO off|on;

This doesn't really work because of the considerations mentioned in
http://postgr.es/m/CA+TgmoakCtzOZr0XEqaLFiMBcjE2rGcBAzf4EybpXjtNetpSVw@mail.gmail.com

> If we are going to try to make it truly read-only, and cater to the
> other use cases, we have to:
>
> Perform a checkpoint before declaring the system read-only (i.e. before
> the command returns). This may be expensive of course, as Andres has
> pointed out in this thread, but it is a price that has to be paid. If we
> do this checkpoint, then we can avoid an additional shutdown checkpoint
> and an end-of-recovery checkpoint (if we restart the primary after a
> crash while in read-only mode). Also, we would have to prevent any
> operation that touches control files, which I am not sure we do today in
> the current patch.

It's basically impossible to create a system for fast failover that
involves a checkpoint. See my comments at
http://postgr.es/m/CA+TgmoYe8uCgtYFGfnv3vWpZTygsdkSu2F4MNiqhkar_UKbWfQ@mail.gmail.com
- you can't achieve five nines or even four nines of availability if
you have to wait for a checkpoint that might take twenty minutes. I
have nothing against a feature that does what you're describing, but
this feature is designed to make fast failover easier to accomplish,
and it's not going to succeed if it involves a checkpoint.

> Why not have the best of both worlds? Consider:
>
> ALTER SYSTEM SET read_only to {off, on, wal};
>
> -- on: wal writes off + no writes to disk
> -- off: default
> -- wal: only wal writes off
>
> Of course, there can probably be better syntax for the above.

There are a few things you can can imagine doing here:

1. Freeze WAL writes but allow dirty buffers to be flushed afterward.
This is the most useful thing for fast failover, I would argue,
because it's quick and the fact that some dirty buffers may not be
written doesn't matter.

2. Freeze WAL writes except a final checkpoint which will flush dirty
buffers along the way. This is like shutting the system down cleanly
and bringing it back up as a standby, except without performing a
shutdown.

3. Freeze WAL writes and write out all dirty buffers without actually
checkpointing. This is sort of a hybrid of #1 and #2. It's probably
not much faster than #2 but it avoids generating any more WAL.

4. Freeze WAL writes and just keep all the dirty buffers cached,
without writing them out. This seems like a bad idea for the reasons
mentioned in Amul's reply. The system might not be able to respond
even to read-only queries any more if shared_buffers is full of
unevictable dirty buffers.

Either #2 or #3 is sufficient to take a filesystem level snapshot of
the cluster while it's running, but I'm not sure why that's
interesting. You can already do that sort of thing by using
pg_basebackup or by running pg_start_backup() and pg_stop_backup() and
copying the directory in the middle, and you can do all of that while
the cluster is accepting writes, which seems like it will usually be
more convenient. If you do want this, you have several options, like
running a checkpoint immediately followed by ALTER SYSTEM READ ONLY
(so that the amount of WAL generated during the backup is small but
maybe not none); or shutting down the system cleanly and restarting it
as a standby; or maybe using the proposed pg_ctl demote feature
mentioned on a separate thread.

Contrary to what you write, I don't think either #2 or #3 is
sufficient to enable checksums, at least not without some more
engineering, because the server would cache the state from the control
file, and a bunch of blocks from the database. I guess it would work
if you did a server restart afterward, but I think there are better
ways of supporting online checksum enabling that don't require
shutting down the server, or even making it read-only; and there's
been significant work done on those already.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-07-24 14:33:55 Re: [Patch] ALTER SYSTEM READ ONLY
Previous Message Chris Travers 2020-07-24 14:17:36 Re: Making CASE error handling less surprising