|From:||Michael Paquier <michael(at)paquier(dot)xyz>|
|To:||"Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>|
|Subject:||Re: [bug fix] Cascaded standby cannot start after a clean shutdown|
On Thu, Feb 22, 2018 at 04:55:38PM +0900, Michael Paquier wrote:
> I am definitely ready to buy that it can be possible to have garbage
> being read in the length field, which can cause allocate_recordbuf to fail
> as that's the only code path in xlogreader.c which does such an
> allocation. Still, it seems to me that we should first try to see if
> there are strange allocation patterns that happen and see if it is
> possible to have a reproducible test case or a pattern which gives us
> confidence that we are on the right track. One idea I have is to
> monitor those allocations like the following:
> --- a/src/backend/access/transam/xlogreader.c
> +++ b/src/backend/access/transam/xlogreader.c
> @@ -162,6 +162,10 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
> newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ);
> newSize = Max(newSize, 5 * Max(BLCKSZ, XLOG_BLCKSZ));
> +#ifndef FRONTEND
> + elog(LOG, "Allocation for xlogreader increased to %u", newSize);
So, I have been playing a bit more with that and defined the following
strategy to see if it is possible to create inconsistencies:
- Use a primary and a standby.
- Set max_wal_size and min_wal_size to the minimum of 80MB so that
segment recycling takes effect more quickly.
- Create a single table with a UUID column to increase the likelihood of
random data in INSERT records and FPWs, and insert enough data to
trigger a full WAL recycling.
- Every 5 seconds, insert a set of tuples into the table; 110 to 120
tuples generate enough data for a bit more than a full WAL page. Then
restart the primary. This normally causes the standby to catch up with a
streamed page which is not completely initialized, as it fetches the
page in the middle.
With the monitoring mentioned in the quoted block above, I have let the
whole thing run for a couple of hours, but I have not been able to catch
any problems, except the usual "invalid record length at 0/XXX: wanted
24, got 0". The allocation for recordbuf did not get higher than 40960
bytes either, which matches 5 WAL pages.
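
For reference, the 40960-byte floor follows directly from the two sizing
lines quoted in the patch context above. Here is a minimal standalone
sketch of that arithmetic. It assumes the default BLCKSZ and XLOG_BLCKSZ
of 8192 bytes; recordbuf_size() is a hypothetical name used only for
this illustration, not a function in xlogreader.c.

```c
#include <stdint.h>

/* Assumption: default build-time sizes of 8kB for both block types. */
#define BLCKSZ      8192
#define XLOG_BLCKSZ 8192
#define Max(a, b)   ((a) > (b) ? (a) : (b))

/*
 * Hypothetical helper reproducing only the two sizing lines quoted from
 * allocate_recordbuf(): bump the requested length up to the next
 * XLOG_BLCKSZ boundary, then apply a floor of five blocks.
 */
uint32_t
recordbuf_size(uint32_t reclength)
{
    uint32_t newSize = reclength;

    /* Note: a length already on a boundary still gains one full page. */
    newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ);
    newSize = Max(newSize, 5 * Max(BLCKSZ, XLOG_BLCKSZ));
    return newSize;
}
```

With these assumed sizes, any record length below 40960 bytes yields
exactly 40960, so a log that never shows a larger value only says that
no record (or garbage length) of 40960 bytes or more was requested.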
Another, evil, idea that I have on top of all those things is to
directly hexedit the WAL segment of the standby just at the limit where
it would receive a record from the primary, and insert garbage data
there which would make the validation tests blow up in xlogreader.c
for the record allocation.
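
As a rough illustration of why garbage matters more than zeroes here,
the sketch below decodes a 4-byte length from raw bytes and applies only
a minimum-size test. The 24-byte minimum comes from the "wanted 24"
message above. The helper names, the little-endian layout, and the idea
of looking at this single check in isolation are all assumptions made
for the example; xlogreader.c performs further validation which is not
modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum record length, as reported by "wanted 24, got 0". */
#define MIN_RECORD_LEN 24u

/* Hypothetical helper: decode a 32-bit little-endian length field. */
uint32_t
read_len_le(const unsigned char *p)
{
    return (uint32_t) p[0]
        | ((uint32_t) p[1] << 8)
        | ((uint32_t) p[2] << 16)
        | ((uint32_t) p[3] << 24);
}

/* Hypothetical helper: true if the length passes the minimum-size test. */
bool
length_passes_min_check(uint32_t total_len)
{
    return total_len >= MIN_RECORD_LEN;
}
```

A zeroed page gives a length of 0, which fails this test and matches the
usual message. Non-zero garbage such as the bytes de ad be 7f decodes to
0x7fbeadde, which passes the minimum-size test and would look like a
request for a roughly 2GB record.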