|From:||"Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>|
|To:||'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>|
|Subject:||Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly|
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Amit Kapila
> I have tried using attached script multiple times on latest 9.2 code, but
> couldn't reproduce the issue. Please find the log attached with this mail.
> Apart from log file, below prints appear:
> WARNING: enabling "trust" authentication for local connections You can
> change this by editing pg_hba.conf or using the option -A, or --auth-local
> and --auth-host, the next time you run initdb.
> 20075/20075 kB (100%), 1/1 tablespace
> NOTICE: pg_stop_backup complete, all required WAL segments have been
> 20079/20079 kB (100%), 1/1 tablespace
> Let me know, if some parameters need to be tweaked to reproduce the issue?
> It seems that the patch proposed is good, but it is better if somebody other
> than you can reproduce the issue and verify if the patch fixes the same.
Thank you for reviewing the code and testing. Hmm, we could reproduce the problem on PostgreSQL 9.2.19. The script's stdout is attached as test.log, and the stderr is as follows:
WARNING: enabling "trust" authentication for local connections You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
20099/20099 kB (100%), 1/1 tablespace
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
20103/20103 kB (100%), 1/1 tablespace
The sizes that pg_basebackup outputs are a bit different from yours. I don't see a reason for this. The test script explicitly specifies the database encoding and locale, so an encoding difference doesn't seem to be the cause. The target problem occurs only when a WAL record crosses a WAL segment boundary, so even a subtle change in WAL record volume would prevent the problem from happening.
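To illustrate why a small difference in WAL volume matters, here is a simplified sketch of the boundary condition. It assumes the default 16 MB segment size and treats a record as a plain byte range, ignoring the page headers that real WAL inserts; the function name and parameters are hypothetical and not taken from the PostgreSQL source.

```python
# Assumption: default WAL segment size of 16 MB (not changed at build time).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def crosses_segment_boundary(start_pos: int, record_len: int,
                             seg_size: int = WAL_SEGMENT_SIZE) -> bool:
    """Return True if a record ends in a later segment than it starts in.

    start_pos  -- byte position of the first byte of the record in the WAL stream
    record_len -- length of the record in bytes (must be positive)
    """
    if record_len <= 0:
        raise ValueError("record_len must be positive")
    first_seg = start_pos // seg_size
    last_seg = (start_pos + record_len - 1) // seg_size
    return last_seg != first_seg
```

A shift of a single byte in where a record starts can flip the result, which is consistent with the problem appearing in one run and not in another.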
Anyway, could you retry with the attached test.sh? It just changes restore_command.
If the problem occurs, the following pair of lines appears in the server log of the cascading standby. Could you check for them?
LOG: restored log file "000000020000000000000003" from archive
LOG: out-of-sequence timeline ID 1 (after 2) in log file 0, segment 3, offset 0
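If it helps with checking, a small sketch like the following could scan the cascading standby's server log for that pair of lines. The regular expressions are written against the two messages quoted above, and the function names are hypothetical. The file name parsing assumes the 24-hex-digit WAL segment naming (8 digits each for timeline ID, log, and segment).

```python
import re

# Patterns modeled on the two log lines shown above.
RESTORED_RE = re.compile(r'LOG:\s+restored log file "([0-9A-F]{24})" from archive')
OUT_OF_SEQ_RE = re.compile(r'LOG:\s+out-of-sequence timeline ID (\d+) \(after (\d+)\)')


def parse_wal_segment_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment file name")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def find_symptom(lines):
    """Return (segment_name, found_tli, previous_tli) for each place where a
    'restored log file' line is immediately followed by an
    'out-of-sequence timeline ID' line."""
    hits = []
    previous_restored = None
    for line in lines:
        restored = RESTORED_RE.search(line)
        if restored:
            previous_restored = restored.group(1)
            continue
        out_of_seq = OUT_OF_SEQ_RE.search(line)
        if out_of_seq and previous_restored is not None:
            hits.append((previous_restored,
                         int(out_of_seq.group(1)),
                         int(out_of_seq.group(2))))
        previous_restored = None
    return hits
```

Passing the lines of the log file (for example, an open file object) to find_symptom() returns an empty list when the symptom is absent. Repeated entries for the same segment would match the "same message repeatedly" behavior in the subject.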