Re: Allowing multiple concurrent base backups

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing multiple concurrent base backups
Date: 2011-03-21 09:29:03
Message-ID: 4D871A5F.5080603@enterprisedb.com
Lists: pgsql-hackers

On 18.03.2011 13:56, Heikki Linnakangas wrote:
> On 18.03.2011 10:48, Heikki Linnakangas wrote:
>> On 17.03.2011 21:39, Robert Haas wrote:
>>> On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>>> Hmm, good point. It's harmless, but creating the history file in the
>>>>> first place sure seems like a waste of time.
>>>>
>>>> The attached patch changes pg_stop_backup so that it doesn't create
>>>> the backup history file if archiving is not enabled.
>>>>
>>>> When I tested the multiple backups, I found that they can have the same
>>>> checkpoint location and the same history file name.
>>>>
>>>> --------------------
>>>> $ for ((i=0; i<4; i++)); do
>>>> pg_basebackup -D test$i -c fast -x -l test$i&
>>>> done
>>>>
>>>> $ cat test0/backup_label
>>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>>> CHECKPOINT LOCATION: 0/20000E8
>>>> START TIME: 2011-02-01 12:12:31 JST
>>>> LABEL: test0
>>>>
>>>> $ cat test1/backup_label
>>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>>> CHECKPOINT LOCATION: 0/20000E8
>>>> START TIME: 2011-02-01 12:12:31 JST
>>>> LABEL: test1
>>>>
>>>> $ cat test2/backup_label
>>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>>> CHECKPOINT LOCATION: 0/20000E8
>>>> START TIME: 2011-02-01 12:12:31 JST
>>>> LABEL: test2
>>>>
>>>> $ cat test3/backup_label
>>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>>> CHECKPOINT LOCATION: 0/20000E8
>>>> START TIME: 2011-02-01 12:12:31 JST
>>>> LABEL: test3
>>>>
>>>> $ ls archive/*.backup
>>>> archive/000000010000000000000002.000000B0.backup
>>>> --------------------
>>>>
>>>> This would cause a serious problem, because the backup-end record
>>>> carrying that same "START WAL LOCATION" can be written by the first
>>>> backup before the others finish. So, on reading the backup-end record
>>>> written by the first backup, we might wrongly conclude that we've
>>>> already reached a consistent state, before reading the last required
>>>> WAL file.
>>>>
>>>> /*
>>>> * Force a CHECKPOINT. Aside from being necessary to prevent torn
>>>> * page problems, this guarantees that two successive backup runs will
>>>> * have different checkpoint positions and hence different history
>>>> * file names, even if nothing happened in between.
>>>> *
>>>> * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
>>>> * fast = true). Otherwise this can take awhile.
>>>> */
>>>> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
>>>> (fast ? CHECKPOINT_IMMEDIATE : 0));
>>>>
>>>> This problem happens because the above code (in do_pg_start_backup)
>>>> doesn't actually ensure that concurrent backups get different
>>>> checkpoint locations. ISTM that we should change the above, or
>>>> something elsewhere, to ensure that.
>>
>> Yes, good point.
>
> Here's a patch based on that approach, ensuring that each base backup
> uses a different checkpoint as the start location. I think I'll commit
> this, rather than invent a new unique ID mechanism for backups. The
> latter would need changes in recovery and control file too, and I don't
> feel like tinkering with that at this stage.

Ok, committed this.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
