Re: BUG #14230: Wrong timeline returned by pg_stop_backup on a standby

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To: francesco(dot)canovai(at)2ndquadrant(dot)it, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Date: 2016-07-06 15:57:34
Message-ID: bdf05251-65d3-5847-671f-50a7cc3aa64b@2ndquadrant.it
Lists: pgsql-bugs pgsql-hackers

On 06/07/16 17:41, Marco Nenciarini wrote:
> On 06/07/16 17:37, Marco Nenciarini wrote:
>> Hi,
>>
>> On 06/07/16 17:07, francesco(dot)canovai(at)2ndquadrant(dot)it wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference: 14230
>>> Logged by: Francesco Canovai
>>> Email address: francesco(dot)canovai(at)2ndquadrant(dot)it
>>> PostgreSQL version: 9.6beta2
>>> Operating system: Linux
>>> Description:
>>>
>>> I'm taking a concurrent backup from a standby in PostgreSQL 9.6beta2 and I
>>> get the wrong timeline from pg_stop_backup(false).
>>>
>>> This is what I'm doing:
>>>
>>> 1) I set up an environment with a primary server and a replica in streaming
>>> replication.
>>>
>>> 2) On the replica, I run
>>>
>>> postgres=# SELECT pg_start_backup('test_backup', true, false);
>>> pg_start_backup
>>> -----------------
>>> 0/3000A00
>>> (1 row)
>>>
>>> 3) When I run pg_stop_backup, it returns a start WAL location belonging to a
>>> file with timeline 0.
>>>
>>> postgres=# SELECT pg_stop_backup(false);
>>> pg_stop_backup
>>> ---------------------------------------------------------------------------
>>> (0/3000AE0,"START WAL LOCATION: 0/3000A00 (file 000000000000000000000003)+
>>> CHECKPOINT LOCATION: 0/3000A38 +
>>> BACKUP METHOD: streamed +
>>> BACKUP FROM: standby +
>>> START TIME: 2016-07-06 16:44:31 CEST +
>>> LABEL: test_backup +
>>> ","")
>>> (1 row)
>>>
>>> The timeline returned is correct (1) when running the same commands on the
>>> master.
>>>
>>> An incorrect backup label doesn't prevent PostgreSQL from starting up, but
>>> it affects the tools using that information.
>>>
>>>
>>
>> The issue here is that the do_pg_stop_backup function uses the
>> ThisTimeLineID variable that is not valid on standbys.
>>
>> I think that it should read it from
>> ControlFile->checkPointCopy.ThisTimeLineID as we do in do_pg_start_backup.
>>
>
> No, that's not the solution.
>
> The backup_label is generated during the do_pg_start_backup call, so
> the copy in ControlFile->checkPointCopy.ThisTimeLineID is also
> uninitialized.
>

After further analysis, the issue is that do_pg_start_backup retrieves
starttli from the ControlFile structure, but then uses ThisTimeLineID
instead of starttli when writing the backup label.
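
To illustrate why the reported file name starts with eight zeros, here is
a minimal Python sketch of how a WAL segment file name is derived from a
timeline ID and an LSN. It is not the PostgreSQL source and not the
attached patch; the function names are hypothetical, and it assumes the
default 16 MB WAL segment size.

```python
# Illustrative sketch only: names are hypothetical, not PostgreSQL APIs.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size (16 MB)
SEGMENTS_PER_XLOG_ID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 with 16 MB segments


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/3000A00' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: str) -> str:
    """Build the 24-hex-digit segment file name for a timeline and an LSN."""
    segno = parse_lsn(lsn) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOG_ID,
        segno % SEGMENTS_PER_XLOG_ID,
    )


# A timeline of 0 (the invalid value seen on the standby) reproduces the
# file name from the bug report.
print(wal_file_name(0, "0/3000A00"))  # 000000000000000000000003
# Timeline 1 gives the name expected for this cluster.
print(wal_file_name(1, "0/3000A00"))  # 000000010000000000000003
```

The first eight hex digits of the file name are the timeline, so whatever
timeline value is passed in at the time the label is written ends up
directly in the START WAL LOCATION line.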

I've attached a very simple patch that fixes it.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

Attachment Content-Type Size
timeline.patch text/x-patch 652 bytes
