Re: problems using pg_start_backup/pg_stop_backup and pg_basebackup at same time

From: Alex Malek <magicagent(at)gmail(dot)com>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: problems using pg_start_backup/pg_stop_backup and pg_basebackup at same time
Date: 2016-06-27 18:27:32
Message-ID: CAGH8ccehK5FzVRah2PxJGhfRB7r=O1DNpj-oPRo6rjibMU-m0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Should I file a bug?

Does anyone know of a workaround?

I could try running pg_basebackup w/ --xlog-method=fetch and see if that
helps but I do like knowing that the base backup will have all needed WAL
files regardless of WAL retention policy on master.

Thanks.
Alex

Recap of what I *think* is going on:

1) pg_basebackup w/ --xlog-method=stream is started
2) archive_command is not executed while pg_basebackup() is running
3) while pg_basebackup() is running pg_start_backup() is executed
4) Later pg_stop_backup() is executed.
5) pg_stop_backup() hangs related to the archive_command not running
6) hours later when pg_basebackup finishes, back logged archive commands
get executed then pg_stop_backup() finishes

On Mon, Jun 20, 2016 at 10:15 AM, Alex Malek <magicagent(at)gmail(dot)com> wrote:

>
> Thanks for the response Magnus.
> I've replied inline below.
>
> On Mon, Jun 20, 2016 at 4:17 AM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
>
>> On Mon, Jun 13, 2016 at 6:59 PM, Alex Malek <magicagent(at)gmail(dot)com> wrote:
>>
>>>
>>> I am experiencing two problems
>>> 1) pg_stop_backup hangs until pg_basebackup finishes
>>> 2) WAL contains references to invalid pages
>>>
>>> (I suspect when 1. is fixed I won't experience 2. )
>>>
>>> Full description below:
>>>
>>>
>>> postgresql: 9.3.13
>>>
>>> My situation is I create a writable report database every day using
>>> pg_start_backup() / rsync / pg_stop_backup()
>>> (whole process takes 1-2 hours, the rsync actually copies the data dir
>>> of a slave/warm spare DB)
>>>
>>
>> Taking what is called an exclusive backup, which is what
>> pg_start_backup() does, is not supported off a standby node. Only
>> pg_basebackup is supported against a standby node.
>>
>> Can you reproduce your errors if you make this backup from the master?
>>
>
>
> pg_start_backup() is run on the master.
> The rsync copies the data dir of the standby b/c it is local.
> I have tried rsyncing from the master w/ the same results.
>
>
>>
>>
>>
>>> Also once a week I create a database for backup/archive purposes using
>>> pg_basebackup
>>> (whole process takes about 13 hours)
>>>
>>>
>>> These two processes used to be able to coincide until recently.
>>> Recent changes include a major debian upgrade and a minor version of
>>> postgres upgrade from 9.3.10.
>>>
>>> Now when the report sync occurs during the pg_basebackup (w/
>>> --xlog-method=stream option) the pg_stop_backup() hangs until
>>> the the pg_basebackup has completed (4 hours later).
>>>
>>
>> Do you also have an archive_command, and what is it? Normally,
>> pg_stop_backup() blocks on the archive command - so perhaps you have ended
>> up with a dependency between that one and the base backup command somehow?
>>
>>
> The archive command is:
>
> archive_command = 'rsync -a %p postgres(at)archivedb
> :/storage/postgres/9.3/main-archive/pg_xlog/%f'
>
> I thought the archive command was working properly while pg_stop_backup
> was hanging based on the timestamps of the files, but upon closer
> inspection using stat I see that the files actually were not copied over
> until pg_stop_backup stopped hanging.
>
> Actually it appears that the archive_command is not executed the entire
> time pg_basebackup is running.
> pg_basebackup is started about 5 hours before pg_start_backup is executed.
> Based on stat it looks like the wal files get written locally but do not
> get rsynced until pg_basebackup is done.
>
>
>
>>
>>
>>> A labeled WAL backup file is created when the pg_stop_backup() is first
>>> executed and another
>>> is created when the pg_basebackup completes.
>>>
>>>
>>>
>>> While the pg_stop_backup() hangs the following appears in the logs:
>>>
>>> 2016-06-11 07:50:45 EDT: WARNING: pg_stop_backup still waiting for all
>>> required WAL segments to be archived (7680 seconds elapsed)
>>> 2016-06-11 07:50:45 EDT: HINT: Check that your archive_command is
>>> executing properly. pg_stop_backup can be canceled safely, but the
>>> database backup will not be usable without all the WAL segments.
>>>
>>>
>>> until eventually:
>>>
>>> 2016-06-11 12:59:07 EDT:LOG: duration: 26190082.497 ms statement:
>>> SELECT pg_stop_backup()
>>>
>>> The archive command appears to work as WAL files are being archived
>>> successfully with timestamps that occur while pg_stop_backup is hanging.
>>>
>>> -rw------- 1 postgres postgres 314 Jun 11 12:56
>>> 000000030000109200000066.00000028.backup
>>> -rw------- 1 postgres postgres 16M Jun 11 12:56 000000030000109300000029
>>> -rw------- 1 postgres postgres 16M Jun 11 12:54 000000030000109300000028
>>> -rw------- 1 postgres postgres 16M Jun 11 12:52 000000030000109300000027
>>> <snip>
>>> -rw------- 1 postgres postgres 16M Jun 11 06:22 0000000300001092000000F1
>>> -rw------- 1 postgres postgres 335 Jun 11 05:42
>>> 0000000300001092000000D4.00000028.backup
>>> -rw------- 1 postgres postgres 16M Jun 11 05:42 0000000300001092000000F0
>>>
>>> Content of 0000000300001092000000D4.00000028.backup:
>>>
>>> START WAL LOCATION: 1092/D4000028 (file 0000000300001092000000D4)
>>> STOP WAL LOCATION: 1092/F008EA10 (file 0000000300001092000000F0)
>>> CHECKPOINT LOCATION: 1092/D4004E08
>>> BACKUP METHOD: pg_start_backup
>>> BACKUP FROM: master
>>> START TIME: 2016-06-11 04:51:04 EDT
>>> LABEL: reporting 2016-06-11 04:51:03.892804-04
>>> STOP TIME: 2016-06-11 05:42:37 EDT
>>>
>>
>> This file shows that the backup took about an hour, doesn't it? So it
>> doesn't look like it was waiting?
>>
>> Or are you saying the wait happens *after* the .backup file has been
>> dropped in the archives?
>>
>
> The .backup file gets written locally in a timely manner but doesn't get
> dropped into the archive until after the
> pg_stop_backup() stops hanging.
>
> So it appears that pg_basebackup ( perhaps --xlog-method=stream ? ) is
> causing the archive_command to pause which in turn
> causes pg_stop_backup() to hang?
>
> Thanks.
>
> Alex
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2016-06-27 20:56:17 Re: Changing column order through dump and restore
Previous Message Alex Stanier 2016-06-27 13:49:11 Changing column order through dump and restore

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-06-27 19:11:07 MinMaxAggPath vs. parallel-safety
Previous Message Akash Agrawal 2016-06-27 18:27:15 How to kill a Background worker and Its metadata