Re: Allowing multiple concurrent base backups

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing multiple concurrent base backups
Date: 2011-03-18 11:56:30
Message-ID: 4D83486E.7040509@enterprisedb.com
Lists: pgsql-hackers

On 18.03.2011 10:48, Heikki Linnakangas wrote:
> On 17.03.2011 21:39, Robert Haas wrote:
>> On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> Hmm, good point. It's harmless, but creating the history file in the
>>>> first place sure seems like a waste of time.
>>>
>>> The attached patch changes pg_stop_backup so that it doesn't create
>>> the backup history file if archiving is not enabled.
>>>
>>> When I tested the multiple backups, I found that they can have the same
>>> checkpoint location and the same history file name.
>>>
>>> --------------------
>>> $ for ((i=0; i<4; i++)); do
>>> pg_basebackup -D test$i -c fast -x -l test$i&
>>> done
>>>
>>> $ cat test0/backup_label
>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>> CHECKPOINT LOCATION: 0/20000E8
>>> START TIME: 2011-02-01 12:12:31 JST
>>> LABEL: test0
>>>
>>> $ cat test1/backup_label
>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>> CHECKPOINT LOCATION: 0/20000E8
>>> START TIME: 2011-02-01 12:12:31 JST
>>> LABEL: test1
>>>
>>> $ cat test2/backup_label
>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>> CHECKPOINT LOCATION: 0/20000E8
>>> START TIME: 2011-02-01 12:12:31 JST
>>> LABEL: test2
>>>
>>> $ cat test3/backup_label
>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>> CHECKPOINT LOCATION: 0/20000E8
>>> START TIME: 2011-02-01 12:12:31 JST
>>> LABEL: test3
>>>
>>> $ ls archive/*.backup
>>> archive/000000010000000000000002.000000B0.backup
>>> --------------------
>>>
>>> This would cause a serious problem, because the backup-end record,
>>> which carries the same "START WAL LOCATION", can be written by the
>>> first backup before the others finish. Recovery might then wrongly
>>> conclude that it has already reached a consistent state on reading the
>>> backup-end record (written by the first backup) before reading the
>>> last required WAL file.
>>>
>>> /*
>>> * Force a CHECKPOINT. Aside from being necessary to prevent torn
>>> * page problems, this guarantees that two successive backup runs will
>>> * have different checkpoint positions and hence different history
>>> * file names, even if nothing happened in between.
>>> *
>>> * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
>>> * fast = true). Otherwise this can take awhile.
>>> */
>>> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
>>> (fast ? CHECKPOINT_IMMEDIATE : 0));
>>>
>>> This problem happens because the above code (in do_pg_start_backup)
>>> does not actually ensure that concurrent backups have different
>>> checkpoint locations. ISTM that we should change the above code, or
>>> something elsewhere, to ensure that.
>
> Yes, good point.

Here's a patch based on that approach, ensuring that each base backup
uses a different checkpoint as the start location. I think I'll commit
this, rather than invent a new unique ID mechanism for backups. The
latter would need changes in recovery and control file too, and I don't
feel like tinkering with that at this stage.
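
To illustrate the idea, here is a minimal, self-contained sketch. It is a toy
model, not the attached patch: ToyServer, toy_checkpoint, toy_start_backup,
wal_pos_lt and history_file_name are hypothetical names, the 16 MB segment
size is an assumption (the default), and a "checkpoint" simply bumps the redo
pointer by a fixed amount. It shows how the history file name follows from
the start WAL location, and how retrying the checkpoint until the start
location is past the previous backup's start yields distinct names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16 MB WAL segments. */
#define WAL_SEG_SIZE (16u * 1024u * 1024u)

/* Hypothetical stand-in for a WAL position such as 0/20000B0. */
typedef struct {
    uint32_t xlogid;   /* high part of the position */
    uint32_t xrecoff;  /* byte offset within that part */
} WalPos;

/* Toy server state: the latest checkpoint's redo pointer and the start
 * location handed out to the most recent backup. */
typedef struct {
    WalPos redo;
    WalPos last_backup_start;
} ToyServer;

/* Returns nonzero if a is strictly before b. */
static int wal_pos_lt(WalPos a, WalPos b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff < b.xrecoff);
}

/* Backup history file name: <timeline><log><segment>.<offset>.backup */
static void history_file_name(char *buf, size_t len, uint32_t tli, WalPos start)
{
    unsigned int seg = start.xrecoff / WAL_SEG_SIZE;
    unsigned int off = start.xrecoff % WAL_SEG_SIZE;
    int n = snprintf(buf, len, "%08X%08X%08X.%08X.backup",
                     (unsigned int) tli, (unsigned int) start.xlogid, seg, off);

    assert(n > 0 && (size_t) n < len);  /* caller's buffer must be large enough */
}

/* Toy checkpoint: advances the redo pointer by an arbitrary fixed amount.
 * (Overflow of xrecoff is ignored in this sketch.) */
static void toy_checkpoint(ToyServer *s)
{
    s->redo.xrecoff += 0x38;
}

/* Hand out a start location that is strictly after the previous backup's
 * start, forcing another checkpoint whenever the current redo pointer has
 * already been used. */
static WalPos toy_start_backup(ToyServer *s)
{
    while (!wal_pos_lt(s->last_backup_start, s->redo))
        toy_checkpoint(s);
    s->last_backup_start = s->redo;
    return s->redo;
}
```

With timeline 1 and a start location of 0/20000B0, history_file_name produces
000000010000000000000002.000000B0.backup, matching the file in the archive
listing above. Calling toy_start_backup twice in a row on the same ToyServer
returns two different locations, so the two history file names differ.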

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
ensure-unique-backup-start-locations-1.patch text/x-diff 5.6 KB
