Re: Check the number of potential synchronous standbys

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: 张文杰 <757634191(at)qq(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Check the number of potential synchronous standbys
Date: 2019-08-26 20:53:25
Message-ID: 17703.1566852805@sss.pgh.pa.us
Lists: pgsql-hackers

张文杰 <757634191(at)qq(dot)com> writes:
> When the number of potential synchronous standbys is smaller than num_sync, such as 'FIRST 3 (1,2)', 'ANY 4 (1,2,3)' in the synchronous_standby_names, the processes will wait for synchronous replication forever.
> Obviously, that is not expected. I think returning false with an error message would be better. Attached is a patch that implements this simple check.

Well, it's not *that* simple; this patch rejects cases like "ANY 2(*)"
which need to be accepted. That causes the src/test/recovery tests
to fail (you should have tried check-world).
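
A minimal Python sketch of how a count-based parse-time check ends up rejecting the wildcard form. The patch itself is not reproduced in this message, so the rule and both function names below are assumptions made for illustration, and this is not PostgreSQL's C code.

```python
from typing import List


def naive_count_check(num_sync: int, names: List[str]) -> bool:
    """Hypothetical parse-time rule: accept the setting only if the list
    has at least num_sync entries."""
    return num_sync <= len(names)


def wildcard_aware_check(num_sync: int, names: List[str]) -> bool:
    """Same hypothetical rule, but skipped when "*" is listed, because any
    number of standbys may match the wildcard."""
    if "*" in names:
        return True
    return num_sync <= len(names)


if __name__ == "__main__":
    # The three configurations mentioned in the thread.
    for num_sync, names in [(3, ["1", "2"]), (4, ["1", "2", "3"]), (2, ["*"])]:
        print(num_sync, names,
              naive_count_check(num_sync, names),
              wildcard_aware_check(num_sync, names))
```

In this model `naive_count_check(2, ["*"])` returns False, which corresponds to the unwanted rejection of "ANY 2(*)", while `wildcard_aware_check` accepts it and still rejects 'FIRST 3 (1,2)' and 'ANY 4 (1,2,3)'.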

I also observe that there's a test case in 007_sync_rep.pl which is
actually exercising the case you want to reject:

# Check that sync_state of each standby is determined correctly
# when num_sync exceeds the number of names of potential sync standbys
# specified in synchronous_standby_names.
test_sync_state(
    $node_master, qq(standby1|0|async
standby2|4|sync
standby3|3|sync
standby4|1|sync),
    'num_sync exceeds the num of potential sync standbys',
    '6(standby4,standby0,standby3,standby2)');

So it can't be said that nobody thought about this at all.

Now, I'm not convinced that this represents a useful use-case as-is.
However, because we can't know how many standbys may match "*",
it's clear that the code has to do something other than just
abort when the situation happens. Conceivably we could fail at
runtime (not GUC parse time) if the number of required standbys
exceeds the number available, rather than waiting indefinitely.
However, if standbys can come online dynamically, a wait involving
"*" might be satisfiable after a while even if it isn't immediately.

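The runtime alternative described above can be modeled with a short Python sketch. It is only an illustration under stated assumptions, not PostgreSQL code: `runtime_verdict` and its three return strings are hypothetical, each listed name is assumed to match at most one standby, and "proceed" only means that enough candidate standbys are connected.

```python
from typing import Iterable, List


def runtime_verdict(num_sync: int, names: List[str],
                    connected: Iterable[str]) -> str:
    """Return "proceed", "wait", or "fail" for a synchronous commit.

    names     -- entries of the standby list, possibly containing "*"
    connected -- application names of the standbys connected right now
    """
    connected_set = set(connected)
    has_wildcard = "*" in names

    # With "*", every connected standby is a candidate.
    candidates = connected_set if has_wildcard else connected_set & set(names)

    if len(candidates) >= num_sync:
        return "proceed"
    if has_wildcard:
        # More standbys may come online later and match "*".
        return "wait"
    if num_sync > len(set(names)):
        # Assumption: one standby per name, so the list can never supply
        # num_sync standbys no matter how long we wait.
        return "fail"
    # Enough names are listed; the missing standbys may still connect.
    return "wait"
```

Under these assumptions, 'FIRST 3 (1,2)' yields "fail" even with both standbys connected, whereas "ANY 2(*)" with a single connected standby yields "wait", because a second standby could still appear.
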
On the whole, given the fuzziness around "*", I'm not sure that
it's easy to make this much better.

regards, tom lane
