Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2020-07-24 13:09:14
Message-ID: CAAJ_b94CTbnHgsrWNnCGMbNsCEWH0fU7QSTVcet3d1POBfefzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 24, 2020 at 7:34 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,

Thanks for looking at the patch.

>
> > From f0188a48723b1ae7372bcc6a344ed7868fdc40fb Mon Sep 17 00:00:00 2001
> > From: Amul Sul <amul(dot)sul(at)enterprisedb(dot)com>
> > Date: Fri, 27 Mar 2020 05:05:38 -0400
> > Subject: [PATCH v3 2/6] Add alter system read only/write syntax
> >
> > Note that syntax doesn't have any implementation.
> > ---
> > src/backend/nodes/copyfuncs.c | 12 ++++++++++++
> > src/backend/nodes/equalfuncs.c | 9 +++++++++
> > src/backend/parser/gram.y | 13 +++++++++++++
> > src/backend/tcop/utility.c | 20 ++++++++++++++++++++
> > src/bin/psql/tab-complete.c | 6 ++++--
> > src/include/nodes/nodes.h | 1 +
> > src/include/nodes/parsenodes.h | 10 ++++++++++
> > src/tools/pgindent/typedefs.list | 1 +
> > 8 files changed, 70 insertions(+), 2 deletions(-)
>
> Shouldn't there be at outfuncs support as well? Perhaps we even need
> readfuncs, not immediately sure.

Ok, can add that as well.

>
>
>
> > From 2c5db7db70d4cebebf574fbc47db7fbf7c440be1 Mon Sep 17 00:00:00 2001
> > From: Amul Sul <amul(dot)sul(at)enterprisedb(dot)com>
> > Date: Fri, 19 Jun 2020 06:29:36 -0400
> > Subject: [PATCH v3 3/6] Implement ALTER SYSTEM READ ONLY using global barrier.
> >
> > Implementation:
> >
> > 1. When a user tried to change server state to WAL-Prohibited using
> > ALTER SYSTEM READ ONLY command; AlterSystemSetWALProhibitState() will emit
> > PROCSIGNAL_BARRIER_WAL_PROHIBIT_STATE_CHANGE barrier and will wait until the
> > barrier has been absorbed by all the backends.
> >
> > 2. When a backend receives the WAL-Prohibited barrier, at that moment if
> > it is already in a transaction and the transaction already assigned XID,
> > then the backend will be killed by throwing FATAL(XXX: need more discussion
> > on this)
>
> I think we should consider introducing XACTFATAL or such, guaranteeing
> the transaction gets aborted, without requiring a FATAL. This has been
> needed for enough cases that it's worthwhile.
>

As I am aware of, the existing code PostgresMain() uses FATAL to terminate
the connection when protocol synchronization was lost. Currently, in
a proposal, this and another one is "Terminate the idle sessions"[1] is using
FATAL, afaik.

>
> There are several cases where we WAL log without having an xid
> assigned. E.g. when HOT pruning during syscache lookups or such. Are
> there any cases where the check for being in recovery is followed by a
> CHECK_FOR_INTERRUPTS, before the WAL logging is done?
>

In case of operation without xid, an error will be raised just before the point
where the wal record is expected. The places you are asking about, I haven't
found in a glance, will try to search for that, but I am sure current
implementation is not missing those places where it is supposed to check the
prohibited state and complaint.

Quick question, is it possible that pruning will happen with the SELECT query?
It would be helpful if you or someone else could point me to the place where WAL
can be generated even in the case of read-only queries.

>
>
> > 3. Otherwise, if that backend running transaction which yet to get XID
> > assigned we don't need to do anything special, simply call
> > ResetLocalXLogInsertAllowed() so that any future WAL insert in will check
> > XLogInsertAllowed() first which set ready only state appropriately.
> >
> > 4. A new transaction (from existing or new backend) starts as a read-only
> > transaction.
>
> Why do we need 4)? And doesn't that have the potential to be
> unnecessarily problematic if a the server is subsequently brought out of
> the readonly state again?

The transaction that was started in the read-only system state will be read-only
until the end. I think that shouldn't be too problematic.

>
>
> > 5. Auxiliary processes like autovacuum launcher, background writer,
> > checkpointer and walwriter will don't do anything in WAL-Prohibited
> > server state until someone wakes us up. E.g. a backend might later on
> > request us to put the system back to read-write.
>
> Hm. It's not at all clear to me why bgwriter and walwriter shouldn't do
> anything in this state. bgwriter for example is even running entirely
> normally in a hot standby node?

I think I missed to update the description when I reverted the
walwriter changes. The current version doesn't have any changes to
the walwriter. And bgwriter too behaves the same as it on the recovery
system. Will update this, sorry for the confusion.

>
>
> > 6. At shutdown in WAL-Prohibited mode, we'll skip shutdown checkpoint
> > and xlog rotation. Starting up again will perform crash recovery(XXX:
> > need some discussion on this as well)
> >
> > 7. ALTER SYSTEM READ ONLY/WRITE is restricted on standby server.
> >
> > 8. Only super user can toggle WAL-Prohibit state.
> >
> > 9. Add system_is_read_only GUC show the system state -- will true when system
> > is wal prohibited or in recovery.
>
>
>
> > +/*
> > + * AlterSystemSetWALProhibitState
> > + *
> > + * Execute ALTER SYSTEM READ { ONLY | WRITE } statement.
> > + */
> > +void
> > +AlterSystemSetWALProhibitState(AlterSystemWALProhibitState *stmt)
> > +{
> > + if (!superuser())
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > + errmsg("must be superuser to execute ALTER SYSTEM command")));
>
> ISTM we should rather do this in a GRANTable manner. We've worked
> substantially towards that in the last few years.
>

I added this to be inlined with AlterSystemSetConfigFile(), if we want a
GRANTable manner, will try that.

>
>
> >
> > + /*
> > + * WALProhibited indicates if we have stopped allowing WAL writes.
> > + * Protected by info_lck.
> > + */
> > + bool WALProhibited;
> > +
> > /*
> > * SharedHotStandbyActive indicates if we allow hot standby queries to be
> > * run. Protected by info_lck.
> > @@ -7962,6 +7969,25 @@ StartupXLOG(void)
> > RequestCheckpoint(CHECKPOINT_FORCE);
> > }
> >
> > +void
> > +MakeReadOnlyXLOG(void)
> > +{
> > + SpinLockAcquire(&XLogCtl->info_lck);
> > + XLogCtl->WALProhibited = true;
> > + SpinLockRelease(&XLogCtl->info_lck);
> > +}
> > +
> > +/*
> > + * Is the system still in WAL prohibited state?
> > + */
> > +bool
> > +IsWALProhibited(void)
> > +{
> > + volatile XLogCtlData *xlogctl = XLogCtl;
> > +
> > + return xlogctl->WALProhibited;
> > +}
>
> What does this kind of locking achieving? It doesn't protect against
> concurrent ALTER SYSTEM SET READ ONLY or such?
>

The 0004 patch improves that.

>
>
> > + /*
> > + * If the server is in WAL-Prohibited state then don't do anything until
> > + * someone wakes us up. E.g. a backend might later on request us to put
> > + * the system back to read-write.
> > + */
> > + if (IsWALProhibited())
> > + {
> > + (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1,
> > + WAIT_EVENT_CHECKPOINTER_MAIN);
> > + continue;
> > + }
> > +
> > /*
> > * Detect a pending checkpoint request by checking whether the flags
> > * word in shared memory is nonzero. We shouldn't need to acquire the
>
> So if the ASRO happens while a checkpoint, potentially with a
> checkpoint_timeout = 60d, it'll not take effect until the checkpoint has
> finished.
>
> But uh, as far as I can tell, the code would simply continue an
> in-progress checkpoint, despite having absorbed the barrier. And then
> we'd PANIC when doing the XLogInsert()?

I think this might not be the case with the next checkpointer changes in the
0004 patch.

>
> > diff --git a/src/include/access/walprohibit.h b/src/include/access/walprohibit.h
> > new file mode 100644
> > index 00000000000..619c33cd780
> > --- /dev/null
> > +++ b/src/include/access/walprohibit.h
>
> Not sure I like the mix of xlog/wal prefix for pretty closely related
> files... I'm not convinced it's worth having a separate file for this,
> fwiw.

I see.

>
>
>
> > From 5600adc647bd729e4074ecf13e97b9f297e9d5c6 Mon Sep 17 00:00:00 2001
> > From: Amul Sul <amul(dot)sul(at)enterprisedb(dot)com>
> > Date: Fri, 15 May 2020 06:39:43 -0400
> > Subject: [PATCH v3 4/6] Use checkpointer to make system READ-ONLY or
> > READ-WRITE
> >
> > Till the previous commit, the backend used to do this, but now the backend
> > requests checkpointer to do it. Checkpointer, noticing that the current state
> > is has WALPROHIBIT_TRANSITION_IN_PROGRESS flag set, does the barrier request,
> > and then acknowledges back to the backend who requested the state change.
> >
> > Note that this commit also enables ALTER SYSTEM READ WRITE support and make WAL
> > prohibited state persistent across the system restarts.
>
> The split between the previous commit and this commit seems more
> confusing than useful to me.

By looking at the previous two review comments I agree with you. My
intention to make things easier for the reviewer. Will merge this patch
with the previous one.

>
> > +/*
> > + * WALProhibitedRequest: Request checkpointer to make the WALProhibitState to
> > + * read-only.
> > + */
> > +void
> > +WALProhibitRequest(void)
> > +{
> > + /* Must not be called from checkpointer */
> > + Assert(!AmCheckpointerProcess());
> > + Assert(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS);
> > +
> > + /*
> > + * If in a standalone backend, just do it ourselves.
> > + */
> > + if (!IsPostmasterEnvironment)
> > + {
> > + performWALProhibitStateChange(GetWALProhibitState());
> > + return;
> > + }
> > +
> > + if (CheckpointerShmem->checkpointer_pid == 0)
> > + elog(ERROR, "checkpointer is not running");
> > +
> > + if (kill(CheckpointerShmem->checkpointer_pid, SIGINT) != 0)
> > + elog(ERROR, "could not signal checkpointer: %m");
> > +
> > + /* Wait for the state to change to read-only */
> > + ConditionVariablePrepareToSleep(&CheckpointerShmem->readonly_cv);
> > + for (;;)
> > + {
> > + /* We'll be done once in-progress flag bit is cleared */
> > + if (!(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > + break;
> > +
> > + elog(DEBUG1, "WALProhibitRequest: Waiting for checkpointer");
> > + ConditionVariableSleep(&CheckpointerShmem->readonly_cv,
> > + WAIT_EVENT_SYSTEM_WALPROHIBIT_STATE_CHANGE);
> > + }
> > + ConditionVariableCancelSleep();
> > + elog(DEBUG1, "Done WALProhibitRequest");
> > +}
>
> Isn't it possible that the system could have been changed back to be
> read-write by the time the wakeup is being processed?

You have a point, the second backend will see the ASRW executed successfully
despite any changes by this. I think it better to have an error for the second
backend instead of silent. Will do the same.

>
> > From 0b7426fc4708cc0e4ad333da3b35e473658bba28 Mon Sep 17 00:00:00 2001
> > From: Amul Sul <amul(dot)sul(at)enterprisedb(dot)com>
> > Date: Tue, 14 Jul 2020 02:10:55 -0400
> > Subject: [PATCH v3 5/6] Error or Assert before START_CRIT_SECTION for WAL
> > write
>
> Isn't that the wrong order? This needs to come before the feature is
> enabled, no?
>

Agreed but, IMHO, let it be, my intention behind the split is to make code read
easy and I don't think they are going to be check-in separately except 0001.

>
>
> > @@ -758,6 +759,9 @@ brinbuildempty(Relation index)
> > ReadBufferExtended(index, INIT_FORKNUM, P_NEW, RBM_NORMAL, NULL);
> > LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
> >
> > + /* Building indexes will have an XID */
> > + AssertWALPermitted_HaveXID();
> > +
>
> Ugh, that's a pretty ugly naming scheme mix.
>

Ok.

>
>
>
> > @@ -176,6 +177,10 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
> > if (((BrinPageFlags(oldpage) & BRIN_EVACUATE_PAGE) == 0) &&
> > brin_can_do_samepage_update(oldbuf, origsz, newsz))
> > {
> > + /* Can reach here from VACUUM, so need not have an XID */
> > + if (RelationNeedsWAL(idxrel))
> > + CheckWALPermitted();
> > +
>
> Hm. Maybe I am confused, but why is that dependent on
> RelationNeedsWAL()? Shouldn't readonly actually mean readonly, even if
> no WAL is emitted?
>

To avoid the unnecessary error for the case where the wal record will not be
generated.

>
> > #include "access/genam.h"
> > #include "access/gist_private.h"
> > #include "access/transam.h"
> > +#include "access/walprohibit.h"
> > #include "commands/vacuum.h"
> > #include "lib/integerset.h"
> > #include "miscadmin.h"
>
> The number of places that now need this new header - pretty much the
> same set of files that do XLogInsert, already requiring an xlog* header
> to be included - drives me further towards the conclusion that it's not
> a good idea to have it separate.
>

Noted.

>
> > extern void ProcessInterrupts(void);
> >
> > +#ifdef USE_ASSERT_CHECKING
> > +typedef enum
> > +{
> > + WALPERMIT_UNCHECKED,
> > + WALPERMIT_CHECKED,
> > + WALPERMIT_CHECKED_AND_USED
> > +} WALPermitCheckState;
> > +
> > +/* in access/walprohibit.c */
> > +extern WALPermitCheckState walpermit_checked_state;
> > +
> > +/*
> > + * Reset walpermit_checked flag when no longer in the critical section.
> > + * Otherwise, marked checked and used.
> > + */
> > +#define RESET_WALPERMIT_CHECKED_STATE() \
> > +do { \
> > + walpermit_checked_state = CritSectionCount ? \
> > + WALPERMIT_CHECKED_AND_USED : WALPERMIT_UNCHECKED; \
> > +} while(0)
> > +#else
> > +#define RESET_WALPERMIT_CHECKED_STATE() ((void) 0)
> > +#endif
> > +
>
> Why are these in headers? And why is this tied to CritSectionCount?
>

If it is too bad we could think to move that. In the critical section, we don't
want the walpermit_checked_state flag to be reset by XLogResetInsertion()
otherwise following XLogBeginInsert() will have an assertion. The idea is that
anything that checks the flag changes it from UNCHECKED to CHECKED.
XLogResetInsertion() sets it to CHECKED_AND_USED if in a critical section and to
UNCHECKED otherwise (i.e. when CritSectionCount == 0).

Regards,
Amul

1] https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-07-24 13:18:52 Re: Default setting for enable_hashagg_disk
Previous Message Amit Kapila 2020-07-24 11:59:11 Re: INSERT INTO SELECT, Why Parallelism is not selected?