Re: pgsql: Improve handling of parameter differences in physical replication

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Improve handling of parameter differences in physical replication
Date: 2020-03-31 01:16:03
Message-ID: b8802e12-85d3-4784-142d-24671c9eebfd@oss.nttdata.com
Lists: pgsql-committers pgsql-hackers

On 2020/03/31 3:10, Andres Freund wrote:
> Hi,
>
> On 2020-03-30 19:41:43 +0900, Fujii Masao wrote:
>> On 2020/03/30 16:58, Peter Eisentraut wrote:
>>> Improve handling of parameter differences in physical replication
>>>
>>> When certain parameters are changed on a physical replication primary,
>>> this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
>>> record. The standby then checks whether its own settings are at least
>>> as big as the ones on the primary. If not, the standby shuts down
>>> with a fatal error.
>>>
>>> The correspondence of settings between primary and standby is required
>>> because those settings influence certain shared memory sizings that
>>> are required for processing WAL records that the primary might send.
>>> For example, if the primary sends a prepared transaction, the standby
>>> must have had max_prepared_transactions set appropriately or it won't
>>> be able to process those WAL records.
>>>
>>> However, fatally shutting down the standby immediately upon receipt of
>>> the parameter change record might be a bit of an overreaction. The
>>> resources related to those settings are not required immediately at
>>> that point, and might never be required if the activity on the primary
>>> does not exhaust all those resources. If we just let the standby roll
>>> on with recovery, it will eventually produce an appropriate error when
>>> those resources are used.
>>>
>>> So this patch relaxes this a bit. Upon receipt of
>>> XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
>>> warning and set a global flag if there is a problem. Then when we
>>> actually hit the resource issue and the flag was set, we issue another
>>> warning message with relevant information.
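
To make the described flow concrete, here is a minimal standalone sketch of the check-then-warn-and-flag pattern. The names (param_change_record, check_primary_parameters, standby_params_insufficient) are hypothetical stand-ins, not the committed code, which operates on the values carried by XLOG_PARAMETER_CHANGE and the real GUCs:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the values carried by a parameter-change record. */
struct param_change_record
{
    int     max_connections;
    int     max_prepared_transactions;
    int     max_locks_per_transaction;
};

/* Global flag consulted later, when a resource limit is actually hit. */
static bool standby_params_insufficient = false;

static void
check_one_parameter(const char *name, int standby_value, int primary_value)
{
    if (standby_value < primary_value)
    {
        fprintf(stderr,
                "WARNING: insufficient setting for parameter %s\n"
                "DETAIL: %s = %d is a lower setting than on the primary (%d).\n",
                name, name, standby_value, primary_value);
        standby_params_insufficient = true;   /* remembered, not fatal */
    }
}

static void
check_primary_parameters(const struct param_change_record *rec,
                         int max_connections,
                         int max_prepared_transactions,
                         int max_locks_per_transaction)
{
    check_one_parameter("max_connections",
                        max_connections, rec->max_connections);
    check_one_parameter("max_prepared_transactions",
                        max_prepared_transactions, rec->max_prepared_transactions);
    check_one_parameter("max_locks_per_transaction",
                        max_locks_per_transaction, rec->max_locks_per_transaction);
}

int
main(void)
{
    /* Primary ran with max_connections = 100; this standby only allows 1. */
    struct param_change_record rec = {100, 0, 64};

    check_primary_parameters(&rec, 1, 0, 64);

    if (standby_params_insufficient)
        fprintf(stderr, "HINT: change the parameters and restart the server.\n");
    return 0;
}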
>
> I find it somewhat hostile that we don't display the actual resource
> error once the problem is hit - we just pause. Sure, there's going to be
> some previous log entry explaining what the actual parameter difference
> is - but that could have been weeks ago. So either hard to find, or even
> rotated out.
>
>
>>> At that point we pause recovery, so a hot standby remains usable.
>>> We also repeat the last warning message once a minute so it is
>>> harder to miss or ignore.
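
As a rough illustration of that pause-and-nag behavior, a standalone sketch (hypothetical flags and messages, with a shortened interval for the demo; not the patch's code):

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical flags; in the server this would be shared recovery state. */
static bool recovery_paused = true;
static const char *last_warning =
    "insufficient setting for parameter max_connections";

int
main(void)
{
    int     rounds = 0;

    /* Repeat the last warning while paused; the demo stops after 3 rounds. */
    while (recovery_paused && rounds < 3)
    {
        fprintf(stderr, "WARNING: %s\n", last_warning);
        fprintf(stderr, "HINT: recovery is paused; change the setting and restart the server.\n");
        sleep(1);               /* once a minute in the described behavior */
        rounds++;
    }
    return 0;
}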
>
>
> I can't really imagine that the adjustments made in this patch are
> sufficient.
>
> One important issue seems to me to be the size of the array that
> TransactionIdIsInProgress() allocates:
>     /*
>      * If first time through, get workspace to remember main XIDs in. We
>      * malloc it permanently to avoid repeated palloc/pfree overhead.
>      */
>     if (xids == NULL)
>     {
>         /*
>          * In hot standby mode, reserve enough space to hold all xids in the
>          * known-assigned list. If we later finish recovery, we no longer need
>          * the bigger array, but we don't bother to shrink it.
>          */
>         int         maxxids = RecoveryInProgress() ? TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;
> 
>         xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));
>         if (xids == NULL)
>             ereport(ERROR,
>                     (errcode(ERRCODE_OUT_OF_MEMORY),
>                      errmsg("out of memory")));
>     }
>
> Which I think means we'll just overrun the xids array in some cases,
> e.g. if KnownAssignedXids overflowed. Obviously we should have a
> crosscheck in the code (which we don't), but it was previously a
> supposedly unreachable path.
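
The overrun is easy to picture with a standalone sketch (hypothetical helper, not PostgreSQL code): if the workspace was malloc'd for a locally computed maximum but more entries can now arrive, only an explicit bound check turns the failure into an error instead of silent memory corruption.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/*
 * Append an xid to a caller-provided workspace, refusing to write past its
 * allocated capacity.  Returns false instead of overrunning the buffer.
 */
static bool
append_xid_checked(TransactionId *xids, int *nxids, int maxxids,
                   TransactionId xid)
{
    if (*nxids >= maxxids)
    {
        fprintf(stderr,
                "ERROR: xid workspace full (%d entries); local settings are "
                "too small for the primary's workload\n", maxxids);
        return false;
    }
    xids[(*nxids)++] = xid;
    return true;
}

int
main(void)
{
    /* Sized from a local maximum that may be smaller than the primary's. */
    int     maxxids = 4;
    int     nxids = 0;
    TransactionId *xids = malloc(maxxids * sizeof(TransactionId));

    if (xids == NULL)
        return 1;

    /* Simulate more in-progress xids arriving than the workspace can hold. */
    for (TransactionId xid = 100; xid < 110; xid++)
    {
        if (!append_xid_checked(xids, &nxids, maxxids, xid))
            break;              /* report and stop; no silent overrun */
    }

    printf("stored %d of 10 xids safely\n", nxids);
    free(xids);
    return 0;
}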
>
> Similarly, the allocation in GetSnapshotData() will be too small, I
> think:
>     if (snapshot->xip == NULL)
>     {
>         /*
>          * First call for this snapshot. Snapshot is same size whether or not
>          * we are in recovery, see later comments.
>          */
>         snapshot->xip = (TransactionId *)
>             malloc(GetMaxSnapshotXidCount() * sizeof(TransactionId));
>         if (snapshot->xip == NULL)
>             ereport(ERROR,
>                     (errcode(ERRCODE_OUT_OF_MEMORY),
>                      errmsg("out of memory")));
>         Assert(snapshot->subxip == NULL);
>         snapshot->subxip = (TransactionId *)
>             malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
>         if (snapshot->subxip == NULL)
>             ereport(ERROR,
>                     (errcode(ERRCODE_OUT_OF_MEMORY),
>                      errmsg("out of memory")));
>     }
>
> I think basically all code using TOTAL_MAX_CACHED_SUBXIDS,
> GetMaxSnapshotSubxidCount(), PROCARRAY_MAXPROCS needs to be reviewed
> much more carefully than done here.
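
One way to read that review request, as a standalone sketch (hypothetical names, not a proposed patch): size such workspaces from the larger of the local setting and the value last reported by the primary, rather than from the local maximum alone.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Hypothetical: the local limit and the one last reported by the primary. */
static int  local_max_backends = 1 + 3;     /* e.g. max_connections plus workers */
static int  primary_max_backends = 100 + 3; /* from the parameter-change record */

/* Size per-snapshot workspaces for the worst case either side can produce. */
static int
effective_max_backends(void)
{
    return (primary_max_backends > local_max_backends)
        ? primary_max_backends
        : local_max_backends;
}

int
main(void)
{
    int     nslots = effective_max_backends();
    TransactionId *xip = malloc(nslots * sizeof(TransactionId));

    if (xip == NULL)
    {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    printf("snapshot xip workspace sized for %d backends\n", nslots);
    free(xip);
    return 0;
}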
>
>
> Also, shouldn't dynahash be adjusted as well? There's e.g. the
> following HASH_ENTER path:
>     /* report a generic message */
>     if (hashp->isshared)
>         ereport(ERROR,
>                 (errcode(ERRCODE_OUT_OF_MEMORY),
>                  errmsg("out of shared memory")));
>
>
> I'm also not sure it's ok not to have the waiting in
> ProcArrayAdd(). Is it guaranteed that this can't be hit now, given that
> the WAL replay path sometimes adds procs?
>
>
> How is it safe to just block in StandbyParamErrorPauseRecovery() before
> raising an error? The error would have released lwlocks, but now we
> don't? If we e.g. block in PrepareRedoAdd() we'll continue to hold
> TwoPhaseStateLock(), but shutting the database down might also acquire
> that (CheckPointTwoPhase()). Similar with the error in
> KnownAssignedXidsAdd().
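
The hazard is the classic one: blocking while still holding a lock that the shutdown or checkpoint path also needs can wedge both sides. A standalone pthreads sketch of the safe ordering (hypothetical, not the patch's code): release the lock before blocking, so a thread that needs the same lock to proceed with shutdown is not stuck behind the paused one.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static bool paused = false;

/* "Recovery" side: hits the resource problem while holding state_lock. */
static void *
recovery_worker(void *arg)
{
    (void) arg;

    pthread_mutex_lock(&state_lock);
    /* ... discover that a primary-sized resource cannot be accommodated ... */
    paused = true;
    pthread_mutex_unlock(&state_lock);  /* release BEFORE blocking */

    /*
     * Blocking here is safe: no lock is held, so the shutdown path below is
     * not wedged behind us while we wait.
     */
    fprintf(stderr, "recovery paused: fix the settings and restart\n");
    sleep(1);
    return NULL;
}

/* "Shutdown" side: needs the same lock to proceed. */
static void *
shutdown_worker(void *arg)
{
    (void) arg;

    /* Would deadlock if recovery paused while still holding state_lock. */
    pthread_mutex_lock(&state_lock);
    paused = false;
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

int
main(void)
{
    pthread_t   rec, shut;

    pthread_create(&rec, NULL, recovery_worker, NULL);
    pthread_create(&shut, NULL, shutdown_worker, NULL);
    pthread_join(rec, NULL);
    pthread_join(shut, NULL);
    printf("both sides finished; paused = %s\n", paused ? "true" : "false");
    return 0;
}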
>
>
> This does not seem ready.
>
>
>> I encountered a problem that may be related to this commit.
>>
>> First, I set up the master and the standby with max_connections=100 (the default value).
>> Then I decreased max_connections to 1 only on the standby and restarted
>> the server. Thanks to the commit, I saw the following warning message
>> on the standby.
>>
>> WARNING: insufficient setting for parameter max_connections
>> DETAIL: max_connections = 1 is a lower setting than on the master server (where its value was 100).
>> HINT: Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
>>
>> Then I wrote a script that inserted 1,000,000 rows in one transaction
>> and ran it 30 times concurrently, i.e., 30 transactions inserting
>> lots of rows were running at the same time.
>>
>> I confirmed that the expected number of rows was present on the master,
>> but unexpectedly found 0 rows on the standby.
>
> Have the relevant records actually been replicated?

Yeah, I thought I had confirmed that, but when I tried to reproduce
the issue in a clean environment, I couldn't. Which means I made
a mistake... :( Sorry for the noise.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
