From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Allowing multiple concurrent base backups |
Date: | 2011-03-18 11:56:30 |
Message-ID: | 4D83486E.7040509@enterprisedb.com |
Lists: | pgsql-hackers |
On 18.03.2011 10:48, Heikki Linnakangas wrote:
> On 17.03.2011 21:39, Robert Haas wrote:
>> On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> Hmm, good point. It's harmless, but creating the history file in the
>>>> first place sure seems like a waste of time.
>>>
>>> The attached patch changes pg_stop_backup so that it doesn't create
>>> the backup history file if archiving is not enabled.
>>>
>>> When I tested the multiple backups, I found that they can have the same
>>> checkpoint location and the same history file name.
>>>
>>> --------------------
>>> $ for ((i=0; i<4; i++)); do
>>>     pg_basebackup -D test$i -c fast -x -l test$i &
>>> done
>>>
>>> $ cat test0/backup_label
>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>> CHECKPOINT LOCATION: 0/20000E8
>>> START TIME: 2011-02-01 12:12:31 JST
>>> LABEL: test0
>>>
>>> $ cat test1/backup_label
>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>> CHECKPOINT LOCATION: 0/20000E8
>>> START TIME: 2011-02-01 12:12:31 JST
>>> LABEL: test1
>>>
>>> $ cat test2/backup_label
>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>> CHECKPOINT LOCATION: 0/20000E8
>>> START TIME: 2011-02-01 12:12:31 JST
>>> LABEL: test2
>>>
>>> $ cat test3/backup_label
>>> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
>>> CHECKPOINT LOCATION: 0/20000E8
>>> START TIME: 2011-02-01 12:12:31 JST
>>> LABEL: test3
>>>
>>> $ ls archive/*.backup
>>> archive/000000010000000000000002.000000B0.backup
>>> --------------------
>>>
>>> This would cause a serious problem, because the backup-end record
>>> that indicates the same "START WAL LOCATION" can be written by the
>>> first backup before the others finish. So we might wrongly conclude that
>>> we've already reached a consistent state by reading the backup-end
>>> record (written by the first backup) before reading the last required
>>> WAL file.
>>>
>>> /*
>>>  * Force a CHECKPOINT. Aside from being necessary to prevent torn
>>>  * page problems, this guarantees that two successive backup runs will
>>>  * have different checkpoint positions and hence different history
>>>  * file names, even if nothing happened in between.
>>>  *
>>>  * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
>>>  * fast = true). Otherwise this can take awhile.
>>>  */
>>> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
>>>                   (fast ? CHECKPOINT_IMMEDIATE : 0));
>>>
>>> This problem happens because the above code (in do_pg_start_backup)
>>> does not actually ensure that concurrent backups get different
>>> checkpoint locations. ISTM that we should change the above code, or
>>> something elsewhere, to ensure that.
>
> Yes, good point.

Here's a patch based on that approach, ensuring that each base backup
uses a different checkpoint as the start location. I think I'll commit
this, rather than invent a new unique ID mechanism for backups. The
latter would need changes in recovery and control file too, and I don't
feel like tinkering with that at this stage.
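
As a rough illustration of the idea (this is a sketch, not the attached
patch), the approach amounts to remembering the start location handed to
the previous backup and requesting another checkpoint until the new start
location is strictly later. Everything named below is hypothetical and
exists only for this example: WalLocation, choose_unique_backup_start,
FakeCheckpointer and fake_request_checkpoint are stand-ins, the 16 MB
segment size is an assumption, and the history file name format is
inferred from the file name shown in the quoted test output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 16 MB WAL segments. */
#define WAL_SEGMENT_SIZE (16u * 1024u * 1024u)

/* Simplified stand-in for a WAL location such as 0/20000B0. */
typedef struct { uint32_t xlogid; uint32_t xrecoff; } WalLocation;

static bool location_is_later(WalLocation a, WalLocation b)
{
    return a.xlogid > b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff > b.xrecoff);
}

/* Builds a name like 000000010000000000000002.000000B0.backup. */
static void backup_history_file_name(char *buf, size_t len,
                                     uint32_t tli, WalLocation start)
{
    snprintf(buf, len, "%08X%08X%08X.%08X.backup",
             (unsigned) tli, (unsigned) start.xlogid,
             (unsigned) (start.xrecoff / WAL_SEGMENT_SIZE),
             (unsigned) (start.xrecoff % WAL_SEGMENT_SIZE));
}

/* True if two backups on the same timeline would share a history file name. */
static bool history_names_collide(uint32_t tli, WalLocation a, WalLocation b)
{
    char na[64], nb[64];
    backup_history_file_name(na, sizeof na, tli, a);
    backup_history_file_name(nb, sizeof nb, tli, b);
    return strcmp(na, nb) == 0;
}

/* Hypothetical callback: performs a checkpoint, returns its start location. */
typedef WalLocation (*CheckpointFn)(void *arg);

/*
 * Keep requesting checkpoints until the start location is strictly later
 * than the one given to the previous backup, then record it. A real server
 * would have to update *last_backup_start under a lock; that is omitted here.
 * The loop only terminates if the callback eventually advances the location.
 */
static WalLocation choose_unique_backup_start(WalLocation *last_backup_start,
                                              CheckpointFn request_checkpoint,
                                              void *arg, int *attempts)
{
    WalLocation start;
    int n = 0;

    assert(last_backup_start != NULL && request_checkpoint != NULL);
    do {
        start = request_checkpoint(arg);
        n++;
    } while (!location_is_later(start, *last_backup_start));

    *last_backup_start = start;
    if (attempts != NULL)
        *attempts = n;
    return start;
}

/* Toy checkpointer: the next shared_requests calls all see the same checkpoint. */
typedef struct { WalLocation redo; int shared_requests; } FakeCheckpointer;

static WalLocation fake_request_checkpoint(void *arg)
{
    FakeCheckpointer *c = (FakeCheckpointer *) arg;

    if (c->shared_requests > 0)
        c->shared_requests--;       /* satisfied by the existing checkpoint */
    else
        c->redo.xrecoff += 0x100;   /* arbitrary step: a new checkpoint happened */
    return c->redo;
}
```

With the toy checkpointer initialised to 0/20000B0 and two requests
sharing that checkpoint, the first call to choose_unique_backup_start
accepts 0/20000B0 after one request, while the second has to request one
more checkpoint and ends up at a later location, so the two history file
names no longer collide.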
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
ensure-unique-backup-start-locations-1.patch | text/x-diff | 5.6 KB |