Re: Strange issues with 9.2 pg_basebackup & replication

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Strange issues with 9.2 pg_basebackup & replication
Date:	2012-05-16 15:36:26
Message-ID:	CAHGQGwEVHVy0xBri8hNDvL4GSrE4Cjo+OrURZnWYjVo5SO7FWw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, May 16, 2012 at 2:29 AM, Thom Brown <thom(at)linux(dot)com> wrote:
> On 15 May 2012 13:15, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, May 16, 2012 at 1:36 AM, Thom Brown <thom(at)linux(dot)com> wrote:
>>> However, this isn't true when I restart the standby. I've been
>>> informed that this should work fine if a WAL archive has been
>>> configured (which should be used anyway).
>>
>> The WAL archive should be shared by master-replica and replica-replica,
>> and recovery_target_timeline should be set to latest in replica-replica.
>> If you configure that way, replica-replica would successfully reconnect to
>> master-replica with no need to restart it.
>
> I had set the archive_command on the primary, then produced a base
> backup which would have copied the archive settings, but I also added
> a corresponding restore_command setting, so everything was pointing
> at the same archive.

Hmm. When I did the same, the replica-replica successfully reconnected
to the master-replica after I shut down the master-master and promoted the
master-replica. archive_command is the same on all three servers,
restore_command is the same on the two standby servers (i.e., master-replica
and replica-replica), and recovery_target_timeline is set to 'latest' on both
standby servers.
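
For reference, the setup described above might look roughly like the following fragment. This is only a sketch: the archive directory /path/to/shared_archive and the use of cp are placeholders I am assuming for illustration, not the commands actually used in my test.

```
# postgresql.conf (same on all three servers)
archive_mode = on
archive_command = 'cp %p /path/to/shared_archive/%f'   # hypothetical path

# recovery.conf (same on both standby servers)
restore_command = 'cp /path/to/shared_archive/%f %p'   # hypothetical path
recovery_target_timeline = 'latest'
```

The point is only that every server archives to, and both standbys restore from, the same location.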

>>> But one new problem I appear to have is that once I set up archiving
>>> and restart, then try pg_basebackup, it gets stuck and never shows any
>>> progress. If I terminate pg_basebackup in this state and attempt to
>>> restart it more times than max_wal_senders, it can no longer run, as
>>> pg_basebackup didn't disconnect the stream, so ends up using all
>>> senders. And these show up in pg_stat_replication. I have a theory
>>> that if archiving is enabled, restart postgres then generate some WAL
>>> to the point there is a file or two in the archive, pg_basebackup
>>> can't stream anything. Once I restart the server, it's fine and
>>> continues as normal. This has the same symptoms of the "pg_basebackup
>>> from running standby with streaming" issue.
>>
>> This seems to be caused by the spread checkpoint that pg_basebackup
>> requests. IOW, this looks like normal behavior rather than a bug
>> or an issue. What if you specify the "-c fast" option in pg_basebackup?
>
> Yes, it works fine with that option. And it appears this isn't to do
> with there being an archive as I get the same symptoms without setting
> one up.

Yes.

> But in any case, shouldn't the replication connection be
> terminated when pg_basebackup is terminated?

+1. To do this, we would need to define a SIGINT signal handler and make it
send a QueryCancel packet when Ctrl-C is typed.
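
A minimal sketch of that idea in plain C, not a patch against pg_basebackup: the handler only sets a flag, and the receive loop notices the flag and sends the cancel before exiting. The names cancel_fn, fake_send_cancel and run_until_cancelled are hypothetical and exist only for this illustration; in real code the callback would presumably wrap libpq's cancel facility (PQgetCancel/PQcancel), which I have not tried on a replication connection.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Set from the signal handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t cancel_requested = 0;

/* SIGINT handler: do nothing except record that Ctrl-C was typed. */
static void
handle_sigint(int signo)
{
    (void) signo;
    cancel_requested = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int
install_sigint_handler(void)
{
    return (signal(SIGINT, handle_sigint) == SIG_ERR) ? -1 : 0;
}

/* Hypothetical callback that sends the cancel request on a connection. */
typedef bool (*cancel_fn) (void *conn);

/* Stand-in for the real cancel call, so the sketch runs without a server. */
static int cancels_sent = 0;

static bool
fake_send_cancel(void *conn)
{
    (void) conn;
    cancels_sent++;
    return true;
}

/*
 * Hypothetical receive loop. Returns the number of steps completed
 * before either finishing or seeing a cancel request.
 */
static int
run_until_cancelled(int max_steps, cancel_fn send_cancel, void *conn)
{
    assert(send_cancel != NULL);

    for (int step = 0; step < max_steps; step++)
    {
        if (cancel_requested)
        {
            /* Tell the server to stop before we disconnect. */
            send_cancel(conn);
            return step;
        }
        /* ... receive and write one chunk of backup data here ... */
    }
    return max_steps;
}
```

Checking the flag in the loop, rather than doing the work in the handler itself, keeps the handler trivially signal-safe, at the cost of reacting only between chunks.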

Regards,

--
Fujii Masao

In response to

Re: Strange issues with 9.2 pg_basebackup & replication at 2012-05-15 17:29:32 from Thom Brown

Responses

Re: Strange issues with 9.2 pg_basebackup & replication at 2012-05-16 16:07:16 from Thom Brown
