Re: Strange issues with 9.2 pg_basebackup & replication

From: Thom Brown <thom(at)linux(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Strange issues with 9.2 pg_basebackup & replication
Date: 2012-05-16 16:07:16
Message-ID: CAA-aLv7ur8QKT+2+5QXqyn3yd2i6V9CTzG2gJk9kVhhRpuiZwg@mail.gmail.com
Lists: pgsql-hackers

On 16 May 2012 11:36, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, May 16, 2012 at 2:29 AM, Thom Brown <thom(at)linux(dot)com> wrote:
>> On 15 May 2012 13:15, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Wed, May 16, 2012 at 1:36 AM, Thom Brown <thom(at)linux(dot)com> wrote:
>>>> However, this isn't true when I restart the standby.  I've been
>>>> informed that this should work fine if a WAL archive has been
>>>> configured (which should be used anyway).
>>>
>>> The WAL archive should be shared by master-replica and replica-replica,
>>> and recovery_target_timeline should be set to latest in replica-replica.
>>> If you configure that way, replica-replica would successfully reconnect to
>>> master-replica with no need to restart it.
>>
>> I had set the archive_command on the primary, then produced a base
>> backup which would have copied the archive settings, but I also added
>> a corresponding restore_command setting, so everything was pointing
>> at the same archive.
>
> Hmm.. when doing the same, the replica-replica successfully reconnected
> to the master-replica after I shut down the master-master and promoted the
> master-replica. archive_command is the same in three servers,
> restore_command is the same in two standby servers (i.e., master-replica
> and replica-replica), and recovery_target_timeline is set to 'latest' in two
> standby servers.

I didn't shut down the master-master, but I didn't expect to need to.

I also had recovery_target_timeline set to latest. I also tried
explicitly setting it to the new timeline, and got an error saying
there was no such timeline.

>> But in any case, shouldn't the replication connection be
>> terminated when pg_basebackup is terminated?
>
> +1. To do this, we would need to define a SIGINT signal handler and make it
> send a QueryCancel packet when Ctrl-C is typed.
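
For what it's worth, here is a rough sketch of the shape such a handler could take. It is not pg_basebackup's actual code: handle_sigint, install_sigint_handler, check_for_cancel and send_cancel_request are all names made up for illustration, and send_cancel_request is only a stub standing in for wherever the real cancel would be issued (in a libpq client that would presumably involve PQgetCancel/PQcancel).

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Written by the signal handler, read by the main loop. */
static volatile sig_atomic_t cancel_requested = 0;

/* Hypothetical counter so the stub below has an observable effect. */
static int cancels_sent = 0;

/* Keep the handler minimal: only record that Ctrl-C was typed. */
static void handle_sigint(int signo)
{
    (void) signo;
    cancel_requested = 1;
}

/* Install the SIGINT handler; returns true on success. */
static bool install_sigint_handler(void)
{
    return signal(SIGINT, handle_sigint) != SIG_ERR;
}

/*
 * Stub (assumption): a real client would send a cancel request to the
 * server here, so the replication connection is torn down promptly.
 */
static void send_cancel_request(void)
{
    cancels_sent++;
}

/*
 * Meant to be called between chunks of the backup stream.
 * Returns true if the caller should stop and clean up.
 */
static bool check_for_cancel(void)
{
    if (!cancel_requested)
        return false;
    send_cancel_request();
    return true;
}
```

The idea is to call install_sigint_handler() once at startup and check_for_cancel() inside the receive loop, so the cancel goes out from normal code rather than from inside the handler.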

Also, could we provide some feedback when using the -c spread option
and there is no progress within a short period of time? Something
like "Waiting for checkpoint. This can take up to
%checkpoint_timeout%", or something similar, rather than seeing
nothing happening and wondering whether something has gone wrong. And also
a note in the documentation saying that, on "quiet" clusters, it may
take some time before the base backup commences. In fact, since
pg_start_backup will exhibit the same behaviour (i.e. no feedback when
waiting for a checkpoint), maybe that should return a notice (if there
are dirty pages) stating that it will complete when the next
checkpoint occurs.
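
To make the suggested message concrete, a minimal sketch follows. The function name format_checkpoint_wait_notice and its parameters are hypothetical, and it assumes the caller already knows the server's checkpoint_timeout in seconds (how the client would obtain that value is left open).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the proposed "waiting" notice into buf.
 * checkpoint_timeout_secs is assumed to be supplied by the caller.
 * Returns the value of snprintf, i.e. the length the full message needs.
 */
static int format_checkpoint_wait_notice(char *buf, size_t buflen,
                                         int checkpoint_timeout_secs)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "Waiting for checkpoint. This can take up to %d seconds.",
                    checkpoint_timeout_secs);
}
```

With the default checkpoint_timeout of 5 minutes, passing 300 would give "Waiting for checkpoint. This can take up to 300 seconds."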

--
Thom
