Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date:	2014-10-27 13:28:03
Message-ID:	544E4863.2080407@vmware.com
Lists:	pgsql-hackers

On 10/27/2014 03:21 PM, Tomas Vondra wrote:
> On 27 October 2014, 10:47, Heikki Linnakangas wrote:
>> On 10/26/2014 11:47 PM, Tomas Vondra wrote:
>>> After eyeballing the code for an hour or two, I think CREATE DATABASE
>>> should be fine with performing only a 'partial checkpoint' on the
>>> template database - calling FlushDatabaseBuffers and processing unlink
>>> requests, as suggested by the comment in createdb().
>>
>> Hmm. You could replace the first checkpoint with that, but I don't think
>> that's enough for the second. To get any significant performance
>> benefit, you need to get rid of both checkpoints, because doing two
>> checkpoints one after another is almost as fast as doing a single
>> checkpoint; the second checkpoint has very little work to do because the
>> first checkpoint already flushed out everything.
>>
>> The second checkpoint, after copying but before commit, is done because
>> (from the comments in createdb function):
>>
>>> * #1: When PITR is off, we don't XLOG the contents of newly created
>>> * indexes; therefore the drop-and-recreate-whole-directory behavior
>>> * of DBASE_CREATE replay would lose such indexes.
>>>
>>> * #2: Since we have to recopy the source database during DBASE_CREATE
>>> * replay, we run the risk of copying changes in it that were
>>> * committed after the original CREATE DATABASE command but before the
>>> * system crash that led to the replay. This is at least unexpected
>>> * and at worst could lead to inconsistencies, eg duplicate table
>>> * names.
>>
>> Doing only FlushDatabaseBuffers would not prevent these issues - you
>> need a full checkpoint. These issues are better explained here:
>> http://www.postgresql.org/message-id/28884.1119727671@sss.pgh.pa.us
>
> Thinking about this a bit more, do we really need a full checkpoint? That
> is, a checkpoint of all the databases in the cluster? Why is checkpointing
> the source database not enough?
>
> I mean, when we use database A as a template, why do we need to checkpoint
> B, C, D and F too? (Apologies if this is somehow obvious, I'm way out of
> my comfort zone in this part of the code.)

A full checkpoint ensures that you always begin recovery *after* the
DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
crash recovery (except when you crash before the transaction commits, in
which case it doesn't matter if the new database's directory is borked).

- Heikki
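
The argument above can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: the `Cluster` class, its methods, and the `simulate` helper are hypothetical names invented for illustration, WAL is modelled as a plain list, and every change is assumed to have reached disk before the crash. It shows that when recovery starts before the DBASE_CREATE record, replay recopies the template as it looks at crash time (issue #2 in the createdb comments), whereas a full checkpoint after the copy moves the recovery start point past that record.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster: on-disk databases, a WAL list, and a redo pointer."""
    disk: dict = field(default_factory=dict)  # database name -> set of table names
    wal: list = field(default_factory=list)   # (record_type, payload) tuples
    redo_ptr: int = 0                          # WAL index where crash recovery starts

    def checkpoint(self):
        # A full checkpoint moves the recovery start point to the end of WAL.
        self.redo_ptr = len(self.wal)

    def create_table(self, db, table):
        self.wal.append(("CREATE_TABLE", (db, table)))
        self.disk[db].add(table)

    def create_database(self, template, new, checkpoint_after_copy):
        self.wal.append(("DBASE_CREATE", (template, new)))
        self.disk[new] = set(self.disk[template])
        if checkpoint_after_copy:
            self.checkpoint()

    def crash_recover(self):
        # Replay every record from the redo pointer onwards.
        replayed = []
        for rtype, payload in self.wal[self.redo_ptr:]:
            replayed.append(rtype)
            if rtype == "DBASE_CREATE":
                template, new = payload
                # Replay drops and recopies the directory from the template
                # as it exists on disk now, not as it was at CREATE DATABASE.
                self.disk[new] = set(self.disk[template])
            elif rtype == "CREATE_TABLE":
                db, table = payload
                self.disk[db].add(table)
        return replayed


def simulate(checkpoint_after_copy):
    c = Cluster(disk={"template": {"t1"}})
    c.create_database("template", "newdb", checkpoint_after_copy)
    # A change committed in the template after CREATE DATABASE, before the crash.
    c.create_table("template", "t2")
    replayed = c.crash_recover()
    return replayed, sorted(c.disk["newdb"])


if __name__ == "__main__":
    print(simulate(False))  # DBASE_CREATE is replayed; newdb picks up t2
    print(simulate(True))   # recovery starts after DBASE_CREATE; newdb keeps only t1
```

In this model, checkpointing only the template database would not help, because what matters is where recovery starts in the WAL, not which buffers were flushed.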

In response to

Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT at 2014-10-27 13:21:58 from Tomas Vondra

Responses

Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT at 2014-10-27 13:46:41 from Tom Lane
