Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
Date: 2016-11-22 03:18:50
Message-ID: 0A3221C70F24FB45833433255569204D1F656653@G01JPEXMBYT05
Lists: pgsql-hackers

From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Amit Kapila
> I have tried using attached script multiple times on latest 9.2 code, but
> couldn't reproduce the issue. Please find the log attached with this mail.
> Apart from log file, below prints appear:
>
> WARNING: enabling "trust" authentication for local connections You can
> change this by editing pg_hba.conf or using the option -A, or --auth-local
> and --auth-host, the next time you run initdb.
> 20075/20075 kB (100%), 1/1 tablespace
> NOTICE: pg_stop_backup complete, all required WAL segments have been
> archived
> 20079/20079 kB (100%), 1/1 tablespace
>
> Let me know, if some parameters need to be tweaked to reproduce the issue?
>
>
> It seems that the patch proposed is good, but it is better if somebody other
> than you can reproduce the issue and verify if the patch fixes the same.
>

Thank you for reviewing the code and testing. Hmm, we were able to reproduce the problem on PostgreSQL 9.2.19. The script's stdout is attached as test.log, and its stderr is as follows:

WARNING: enabling "trust" authentication for local connections You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
20099/20099 kB (100%), 1/1 tablespace
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
20103/20103 kB (100%), 1/1 tablespace

The sizes that pg_basebackup outputs are slightly different from yours, and I don't see a reason for this. The test script explicitly specifies the database encoding and locale, so an encoding difference doesn't seem to be the cause. The target problem occurs only when a WAL record crosses a WAL segment boundary, so even a subtle change in the volume of WAL records could prevent the problem from happening.
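To illustrate why a small difference in WAL volume matters, here is a minimal Python sketch of the "does this record straddle a segment boundary?" question. It assumes the default 16 MB segment size and treats WAL positions as flat byte offsets (page-header overhead is ignored), and the function name is hypothetical; it is not PostgreSQL code.

```python
# Hypothetical sketch: does a WAL record span two WAL segments?
# Assumptions (not from PostgreSQL source): default 16 MB segments and
# WAL positions treated as flat byte offsets, ignoring page headers.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def crosses_segment_boundary(start_lsn: int, record_len: int,
                             seg_size: int = WAL_SEGMENT_SIZE) -> bool:
    """Return True if bytes [start_lsn, start_lsn + record_len) span two segments."""
    if record_len <= 0:
        return False
    first_seg = start_lsn // seg_size
    last_seg = (start_lsn + record_len - 1) // seg_size
    return first_seg != last_seg


if __name__ == "__main__":
    boundary = 3 * WAL_SEGMENT_SIZE  # first byte of segment 3
    # A 100-byte record starting 60 bytes before the boundary crosses it...
    print(crosses_segment_boundary(boundary - 60, 100))   # True
    # ...but the same record starting 40 bytes earlier fits in segment 2.
    print(crosses_segment_boundary(boundary - 100, 100))  # False
```

Shifting where a record starts by only a few dozen bytes is enough to move it off the boundary, which is consistent with a slightly different base-backup size hiding the bug.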

Anyway, could you retry with the attached test.sh? It only changes restore_command.

If the problem occurs, the following pair of lines appears in the server log of the cascading standby. Could you check for them?

LOG: restored log file "000000020000000000000003" from archive
LOG: out-of-sequence timeline ID 1 (after 2) in log file 0, segment 3, offset 0
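A rough way to read these two lines: the restored file name encodes timeline 2, log file 0, segment 3, while the error says a page at offset 0 of that segment carries timeline ID 1 after the reader had already seen timeline 2. The Python sketch below decodes the file name, assuming the 9.2 layout of three 8-digit hex fields (timeline ID, log file number, segment number), and models the rule behind the message: since a child timeline always gets a higher ID than its parent, timeline IDs should not go backwards across consecutive pages. Both functions are hypothetical and simplified; the real check in the server is more involved.

```python
# Hypothetical, simplified model; not PostgreSQL source code.
# Assumption: WAL file names are 24 hex digits laid out as
# TTTTTTTT (timeline ID) XXXXXXXX (log file) YYYYYYYY (segment).
from typing import NamedTuple


class WalFileName(NamedTuple):
    timeline: int
    log: int
    seg: int


def parse_wal_file_name(name: str) -> WalFileName:
    """Split a 24-hex-digit WAL file name into its three fields."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {name!r}")
    # int(..., 16) raises ValueError on any non-hex character.
    return WalFileName(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))


def check_page_tli(page_tli: int, last_page_tli: int) -> None:
    """Reject a page whose timeline ID is lower than the previous page's."""
    if page_tli < last_page_tli:
        raise ValueError(
            f"out-of-sequence timeline ID {page_tli} (after {last_page_tli})")


if __name__ == "__main__":
    print(parse_wal_file_name("000000020000000000000003"))
    # WalFileName(timeline=2, log=0, seg=3)
    try:
        check_page_tli(1, 2)  # page says TLI 1, previous page was TLI 2
    except ValueError as err:
        print(err)  # out-of-sequence timeline ID 1 (after 2)
```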

Regards
Takayuki Tsunakawa

Attachment Content-Type Size
test.sh application/octet-stream 3.3 KB
test.log application/octet-stream 1.4 KB
