Re: [Patch] ALTER SYSTEM READ ONLY

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-09-10 15:42:09
Message-ID: A50E23AD-02B6-4950-BF12-0329B1816E66@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Sep 10, 2021, at 7:36 AM, Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
>> v33-0005
>>
>> This patch makes bool XLogInsertAllowed() more complicated than before. The result used to depend mostly on the value of LocalXLogInsertAllowed except that when that value was negative, the result was determined by RecoveryInProgress(). There was an arcane rule that LocalXLogInsertAllowed must have the non-negative values binary coercible to boolean "true" and "false", with the basis for that rule being the coding of XLogInsertAllowed(). Now that the function is more complicated, this rule seems even more arcane. Can we change the logic to not depend on casting an integer to bool?
>>
>
> We can't use a boolean variable because LocalXLogInsertAllowed
> represents three states as, 1 means "wal is allowed'', 0 for "wal is
> disallowed", and -1 is for "need to check".

I'm complaining that we're using an integer rather than an enum. I'm ok if we define it so that WAL_ALLOWABLE_UNKNOWN = -1, WAL_DISALLOWED = 0, WAL_ALLOWED = 1 or such, but the logic of the function has gotten complicated enough that having to remember which number represents which logical condition has become a (small) mental burden. Given how hard the WAL code is to read and fully grok, I'd rather avoid any unnecessary burden, even small ones.

>> The new function CheckWALPermitted() seems to test the current state of variables but not lock any of them, and the new function comment says:
>>
>
> CheckWALPermitted() calls XLogInsertAllowed() does check the
> LocalXLogInsertAllowed flag which is local to that process only, and
> nobody else reads that concurrently.
>
>> /*
>> * In opposite to the above assertion if a transaction doesn't have valid XID
>> * (e.g. VACUUM) then it won't be killed while changing the system state to WAL
>> * prohibited. Therefore, we need to explicitly error out before entering into
>> * the critical section.
>> */
>>
>> This suggests to me that a vacuum process can check whether wal is prohibited, then begin a critical section which needs wal to be allowed, and concurrently somebody else might disable wal without killing the vacuum process. I'm given to wonder what horrors await when the vacuum process does something that needs to be wal logged but cannot be. Does it trigger a panic? I don't like the idea that calling pg_prohibit_wal durning a vacuum might panic the cluster. If there is some reason this is not a problem, I think the comment should explain it. In particular, why is it sufficient to check whether wal is prohibited before entering the critical section and not necessary to be sure it remains allowed through the lifetime of that critical section?
>>
>
> Hm, interrupts absorption are disabled inside the critical section.
> The wal prohibited state for that process (here vacuum) will never get
> set until it sees the interrupts & the system will not be said wal
> prohibited until every process sees that interrupts. I am not sure we
> should explain the characteristics of the critical section at this
> place, if want, we can add a brief saying that inside the critical
> section we should not worry about the state change which never happens
> because interrupts are disabled there.

I think the fact that interrupts are disabled during critical sections is understood, so there is no need to mention that. The problem is that the method for taking the system read-only is less generally known, and readers of other sections of code need to jump to the definition of CheckWALPermitted to read the comments and understand what it does. Take for example a code stanza from heapam.c:

if (needwal)
CheckWALPermitted();

/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();

Now, I know that interrupts won't be processed after starting the critical section, but I can see plain as day that an interrupt might get processed *during* CheckWALPermitted, since that function isn't atomic. It might happen after the check is meaningfully finished but before the function actually returns. So I'm not inclined to believe that the way this all works is dependent on interrupts being blocked. So I think, maybe this is all protected by some other scheme. But what? It's not clear from the code comments for CheckWALPermitted, so I'm left having to reverse engineer the system to understand it.

One interpretation is that the signal handler will exit() my backend if it receives a signal saying that the system is going read-only, so there is no race condition. But then why the call to CheckWALPermitted()? If this interpretation were correct, we'd happily enter the critical section without checking, secure in the knowledge that as long as we haven't exited yet, all is ok.

Another interpretation is that the whole thing is just a performance trick. Maybe we're ok with the idea that we will occasionally miss the fact that wal is prohibited, do whatever work we need in the critical section, and then fail later. But if that is true, it had better not be a panic, because designing the system to panic 1% of the time (or whatever percent it works out to be) isn't project style. So looking into the critical section in the heapam.c code, I see:

XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapInplace);

XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);

And jumping to the definition of XLogBeginInsert() I see

/*
* WAL permission must have checked before entering the critical section.
* Otherwise, WAL prohibited error will force system panic.
*/

So now I'm flummoxed. Is it that the code is broken, or is it that I don't know what the strategy behind all this is? If there were a code comment saying how this all works, I'd be in a better position to either know that it is truly safe or alternately know that the strategy is wrong.

Even if my analysis that this is all flawed is incorrect, I still think that a code comment would help.

>> v33-0007:
>>
>> I don't really like what the documentation has to say about pg_prohibit_wal. Why should pg_prohibit_wal differ from other signal sending functions in whether it returns a boolean? If you believe it must always succeed, you can still define it as returning a boolean and always return true. That leaves the door open to future code changes which might need to return false for some reason.
>>
>
> Ok, I am fine to always return true.

Ok.

>> But I also don't like the idea that existing transactions with xids are immediately killed. Shouldn't this function take an optional timeout, perhaps defaulting to none, but otherwise allowing the user to put the system into WALPROHIBIT_STATE_GOING_READ_ONLY for a period of time before killing remaining transactions?
>>
>
> Ok, will check.
>
>> Why is this function defined to take a boolean such that pg_prohibit_wal(true) means to prohibit wal and pg_prohibit_wal(false) means to allow wal. Wouldn't a different function named pg_allow_wal() make it more clear? This also would be a better interface if taking the system read-only had a timeout as I suggested above, as such a timeout parameter when allowing wal is less clearly useful.
>>
>
> Like Robert, I am too inclined to have a single function that is easy
> to remember.

For C language functions that take a bool argument, I can jump to the definition using ctags, and I assume most other developers can do so using whatever IDE they like. For SQL functions, it's a bit harder to jump to the definition, particularly if you are logged into a production server where non-essential software is intentionally missing. Then you have to wonder, what exactly is the boolean argument toggling here?

I don't feel strongly about this, though, and you don't need to change it.

> Apart from this, recently while testing this patch with
> pgbench where I have exhausted the connection limit and want to change
> the system's prohibited state in between but I was unable to do that,
> I wish I could do that using the pg_clt option. How about having a
> pg_clt option to alter wal prohibited state?

I'd have to review the implementation, but sure, that sounds like a useful ability.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-09-10 15:48:01 Re: a misbehavior of partition row movement (?)
Previous Message Robert Haas 2021-09-10 15:22:18 Re: parallelizing the archiver