From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT |
Date: | 2014-10-27 13:28:03 |
Message-ID: | 544E4863.2080407@vmware.com |
Lists: | pgsql-hackers |
On 10/27/2014 03:21 PM, Tomas Vondra wrote:
> On 27 October 2014, 10:47, Heikki Linnakangas wrote:
>> On 10/26/2014 11:47 PM, Tomas Vondra wrote:
>>> After eyeballing the code for an hour or two, I think CREATE DATABASE
>>> should be fine with performing only a 'partial checkpoint' on the
>>> template database - calling FlushDatabaseBuffers and processing unlink
>>> requests, as suggested by the comment in createdb().
>>
>> Hmm. You could replace the first checkpoint with that, but I don't think
>> that's enough for the second. To get any significant performance
>> benefit, you need to get rid of both checkpoints, because doing two
>> checkpoints one after another is almost as fast as doing a single
>> checkpoint; the second checkpoint has very little work to do because the
>> first checkpoint already flushed out everything.
>>
>> The second checkpoint, after copying but before commit, is done because
>> (from the comments in createdb function):
>>
>>> * #1: When PITR is off, we don't XLOG the contents of newly created
>>> * indexes; therefore the drop-and-recreate-whole-directory behavior
>>> * of DBASE_CREATE replay would lose such indexes.
>>>
>>> * #2: Since we have to recopy the source database during DBASE_CREATE
>>> * replay, we run the risk of copying changes in it that were
>>> * committed after the original CREATE DATABASE command but before the
>>> * system crash that led to the replay. This is at least unexpected
>>> * and at worst could lead to inconsistencies, eg duplicate table
>>> * names.
>>
>> Doing only FlushDatabaseBuffers would not prevent these issues - you
>> need a full checkpoint. These issues are better explained here:
>> http://www.postgresql.org/message-id/28884.1119727671@sss.pgh.pa.us
>
> Thinking about this a bit more, do we really need a full checkpoint? That
> is, a checkpoint of all the databases in the cluster? Why is checkpointing
> the source database not enough?
>
> I mean, when we use database A as a template, why do we need to checkpoint
> B, C, D and F too? (Apologies if this is somehow obvious, I'm way out of
> my comfort zone in this part of the code.)

A full checkpoint ensures that you always begin recovery *after* the
DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
crash recovery (except when you crash before the transaction commits, in
which case it doesn't matter if the new database's directory is borked).
- Heikki