From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date: 2014-10-28 00:01:24
Message-ID: 544EDCD4.1030005@fuzzy.cz
Lists: pgsql-hackers

On 28.10.2014 00:06, Andrew Dunstan wrote:
>
> On 10/27/2014 07:01 PM, Andres Freund wrote:
>> On 2014-10-27 18:57:27 -0400, Andrew Dunstan wrote:
>>> On 10/27/2014 05:58 PM, Tomas Vondra wrote:
>>>> On 27.10.2014 17:24, Heikki Linnakangas wrote:
>>>> I'm also thinking that for wal_level=archive and large databases, this
>>>> won't really eliminate the checkpoint as it will likely generate enough
>>>> WAL to hit checkpoint_segments and trigger a checkpoint anyway. No?
>>>>
>>>> That being said, our CREATE DATABASE docs currently say this
>>>>
>>>> Although it is possible to copy a database other than template1 by
>>>> specifying its name as the template, this is not (yet) intended as
>>>> a general-purpose "COPY DATABASE" facility. The principal
>>>> limitation is that no other sessions can be connected to the
>>>> template database while it is being copied. CREATE DATABASE will
>>>> fail if any other connection exists when it starts; otherwise, new
>>>> connections to the template database are locked out until CREATE
>>>> DATABASE completes. See Section 21.3 for more information.
>>>>
>>>> I think that this limitation pretty much means no one should use CREATE
>>>> DATABASE for cloning live databases in a production environment (because
>>>> of the locking).
>>>>
>>>> It also seems to me the "general-purpose COPY DATABASE" described
>>>> in the docs is what we're discussing in this thread.
>>>>
>>>
>>> Notwithstanding what the docs say, I have seen CREATE DATABASE used
>>> plenty of times, and quite effectively, to clone databases. I don't
>>> think making it do twice the IO in the general case is going to go
>>> down well.
>> I think they're actually more likely to be happy that we wouldn't need
>> to do an immediate checkpoint anymore. The performance penalty from
>> that is likely to be much more severe than the actual IO.
>>
>
>
> At the very least that needs to be benchmarked.

The question is what workload we are going to benchmark. It's unlikely
that either approach will be a clear winner in all cases, so we'll have
to decide which ones are more common / important, or somehow combine
both approaches (and thus lose some of the WAL-only benefits).

I'm pretty sure we'll see roughly three main cases (a rough sketch of
how I'd measure this follows the list):

(1) read-mostly workloads

    Current approach wins, because the checkpoint is cheap and the
    WAL-based approach results in 2x the I/O.

The difference is proportional to template database size. For
small databases it's negligible, for large databases it's more
significant.

(2) write-heavy workloads / small template database

    WAL-based approach wins, because it does not require an explicit
    checkpoint, and for small databases the I/O generated by WAL-logging
    everything is lower than that of a checkpoint (which is more random).

This is the case of our PostgreSQL clusters.

(3) write-heavy workloads / large template database

Current approach wins, for two reasons: (a) for large databases the
WAL-logging overhead may generate much more I/O than a checkpoint,
and (b) it may generate so many WAL segments it eventually triggers
a checkpoint anyway (even repeatedly).
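
To give an idea of how I'd measure this - something like the following,
run while pgbench (or a similar write workload) keeps dirtying shared
buffers. The database names are of course made up, and the SELECTs are
only there to see how many checkpoints the copy caused:

    -- time the copy itself, and check the checkpoint counters before/after
    \timing on
    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
      FROM pg_stat_bgwriter;
    CREATE DATABASE clonedb TEMPLATE appdb;  -- appdb must have no other sessions
    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
      FROM pg_stat_bgwriter;

Comparing the wall-clock time and the bump in the checkpoint counters
between the current code and the WAL-logging patch, for the three cases
above, should show where the boundary actually is.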

The exact boundary between the cases really depends on multiple things:

 (a) shared_buffers size (the larger it is, the more expensive the checkpoint)
(b) read-write activity (more writes => more expensive checkpoint)
(c) hardware (especially how well it handles random I/O)
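
For instance, these are the knobs I have in mind - shown as psql
commands just to make it concrete, the actual values obviously depend
on the setup (and (c) is hardware, so there's no GUC for that):

    -- settings that shift the boundary between the three cases above
    SHOW shared_buffers;       -- larger => more dirty buffers => costlier checkpoint
    SHOW checkpoint_segments;  -- WAL from the copy may hit this and force checkpoints
    SHOW wal_level;            -- see the wal_level=archive note above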

I'm not sure how to decide which case is more important. I agree that
there are people using CREATE DATABASE to clone databases - maybe not
in production, but e.g. for testing purposes - and it'd be rather
unfortunate to make it considerably slower for them. Not sure how to
balance this :-/

So maybe we shouldn't cling to the WAL-logging approach too much. Maybe
we should go with Heikki's idea to abandon the full checkpoint and
instead assume that once the transaction commits, all the files were
fsynced OK. Of course, that will do nothing about the replay hazards.

regards
Tomas
