Re: [Patch] ALTER SYSTEM READ ONLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-01-19 20:45:03
Message-ID: CA+TgmoZf2by_6kbY0JntGEmrSkj3DxUTNZDXORNasGdsSmkJjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 14, 2021 at 6:29 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> To move development, testing, and the review forward, I have commented out the
> code acquiring CheckpointLock from CreateCheckPoint() in the 0003 patch and
> added the changes for the checkpointer so that system read-write state change
> request can be processed as soon as possible, as suggested by Robert[1].

I am extremely doubtful about SetWALProhibitState()'s claim that "The
final state can only be requested by the checkpointer or by the
single-user so that there will be no chance that the server is already
in the desired final state." It seems like there is an obvious race
condition: CompleteWALProhibitChange() is called with a cur_state_gen
argument which embeds the last state we saw, but there's nothing to
keep it from changing between the time we saw it and the time that
function calls SetWALProhibitState(), is there? We aren't holding any
lock. It seems to me that SetWALProhibitState() needs to be rewritten
to avoid this assumption.

On a related note, SetWALProhibitState() has only two callers. One
passes is_final_state as true, and the other as false: it's never a
variable. The two cases are handled mostly differently. This doesn't
seem good. A lot of the logic in this function should probably be
moved to the calling sites, especially because it's almost certainly
wrong for this function to be basing what it does on the *current* WAL
prohibit state rather than the WAL prohibit state that was in effect
at the time we made the decision to call this function in the first
place. As I mentioned in the previous paragraph, that's a built-in
race condition. To put that another way, this function should NOT feel
free to call GetWALProhibitStateGen().

I don't really see why we should have both an SQL callable function
pg_alter_wal_prohibit_state() and also a DDL command for this. If
we're going to go with a functional interface, and I guess the idea of
that is to make it so GRANT EXECUTE works, then why not just get rid
of the DDL?

RequestWALProhibitChange() doesn't look very nice. It seems like it's
basically the second half of pg_alter_wal_prohibit_state(), not being
called from anywhere else. It doesn't seem to add anything to separate
it out like this; the interface between the two is not especially
clean.

It seems odd that ProcessWALProhibitStateChangeRequest() returns
without doing anything if !AmCheckpointerProcess(), rather than having
that be an Assert(). Why is it like that?

I think WALProhibitStateShmemInit() would probably look more similar
to other functions if it did if (found) { stuff; } rather than if
(!found) return; stuff; -- but I might be wrong about the existing
precedent.

The SetLastCheckPointSkipped() and LastCheckPointIsSkipped() stuff
seems confusingly-named, because we have other reasons for skipping a
checkpoint that are not what we're talking about here. I think this is
talking about whether we've performed a checkpoint after recovery, and
the naming should reflect that. But I think there's something else
wrong with the design, too: why is this protected by a spinlock? I
have questions in both directions. On the one hand, I wonder why we
need any kind of lock at all. On the other hand, if we do need a lock,
I wonder why a spinlock that protects only the setting and clearing of
the flag and nothing else is sufficient. There are zero comments
explaining what the idea behind this locking regime is, and I can't
understand why it should be correct.

In fact, I think this area needs a broader rethink. Like, the way you
integrated that stuff into StartupXLog(), it sure looks to me like we
might skip the checkpoint but still try to write other WAL records.
Before we reach the offending segment of code, we call
UpdateFullPageWrites(). Afterwards, we call XLogReportParameters().
Both of those are going to potentially write WAL. I guess you could
argue that's OK, on the grounds that neither function is necessarily
going to log anything, but I don't think I believe that. If I make my
server read only, take the OS down, change some GUCs, and then start
it again, I don't expect it to PANIC.

Also, I doubt that it's OK to skip the checkpoint as this code does
and then go ahead and execute recovery_end_command and update the
control file anyway. It sure looks like the existing code is written
with the assumption that the checkpoint happens before those other
things. One idea I just had was: suppose that, if the system is READ
ONLY, we don't actually exit recovery right away, and the startup
process doesn't exit. Instead we just sit there and wait for the
system to be made read-write again before doing anything else. But
then if hot_standby=false, there's no way for someone to execute a
ALTER SYSTEM READ WRITE and/or pg_alter_wal_prohibit_state(), which
seems bad. So perhaps we need to let in regular connections *as if*
the system were read-write while postponing not just the
end-of-recovery checkpoint but also the other associated things like
UpdateFullPageWrites(), XLogReportParameters(), recovery_end_command,
control file update, etc. until the end of recovery. Or maybe that's
not the right idea either, but regardless of what we do here it needs
clear comments justifying it. The current version of the patch does
not have any.

I think that you've mis-positioned the check in autovacuum.c. Note
that the comment right afterwards says: "a worker finished, or
postmaster signaled failure to start a worker". Those are things we
should still check for even when the system is R/O. What we don't want
to do in that case is start new workers. I would suggest revising the
comment that starts with "There are some conditions that..." to
mention three conditions. The new one would be that the system is in a
read-only state. I'd mention that first, making the existing ones #2
and #3, and then add the code to "continue;" in that case right after
that comment, before setting current_time.

SendsSignalToCheckpointer() has multiple problems. As far as the name,
it should at least be "Send" rather than "Sends" but the corresponding
functions elsewhere have names like SendPostmasterSignal() not
SendSignalToPostmaster(). Also, why is it OK for it to use elog()
rather than ereport()? Also, why is it an error if the checkpointer's
not running, rather than just having the next checkpointer do it when
it's relaunched? Also, why pass SIGINT as an argument if there's only
one caller? A related thing that's also odd is that sending SIGINT
calls ReqCheckpointHandler() not anything specific to prohibiting WAL.
That is probably OK because that function now just sets the latch. But
then we could stop sending SIGINT to the checkpointer at all and just
send SIGUSR1, which would also set the latch, without using up a
signal. I wonder if we should make that change as a separate
preparatory patch. It seems like that would clear things up; it would
remove the oddity that this patch is invoking a handler called
ReqCheckpointerHandler() with no intention of requesting a checkpoint,
because ReqCheckpointerHandler() would be gone. That problem could
also be fixed by renaming ReqCheckpointerHandler() to something
clearer, but that seems inferior.

This is probably not a complete list of problems. Review from others
would be appreciated.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-01-19 20:49:03 Re: Some coverage for DROP OWNED BY with pg_default_acl
Previous Message John Naylor 2021-01-19 20:44:47 Re: WIP: BRIN multi-range indexes