Re: Replication/backup defaults

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication/backup defaults
Date: 2017-01-05 20:50:05
Message-ID: c2d7db7e-7e05-51cf-206d-07a8cafe1c98@2ndquadrant.com
Lists: pgsql-hackers

On 01/05/2017 05:37 PM, Stephen Frost wrote:
> Tomas,
>
> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
>> On 01/05/2017 02:23 PM, Magnus Hagander wrote:
>>> It's easy enough to construct a benchmark specifically to show the
>>> difference, but hard to do so for any actual "normal workload".
>>> Typically the optimization applies to things like bulk loading, which
>>> is typically never done alone and does not lend itself to that type of
>>> benchmarking very easily.
>>
>> Not sure if I understand correctly what you're saying. You're saying
>> that although it'd be easy to construct a benchmark showing
>> significant performance impact, it won't represent a common
>> workload. Correct?
>
> I think he's saying that it's not very easy to construct a good example
> of typical bulk-loading workloads using just pgbench. Bulk loading
> certainly happens with PG and I don't think we'll make very many friends
> if we break optimizations when wal_level is set to minimal like those
> you get using:
>
> BEGIN;
> CREATE TABLE x (c1 int);
> COPY x FROM STDIN;
> COMMIT;
>
> or:
>
> BEGIN;
> TRUNCATE x;
> COPY x FROM STDIN;
> COMMIT;
>
> Changing the wal_level from 'minimal' to 'replica' or 'logical' with
> such a benchmark is going to make the WAL go from next-to-nothing to
> size-of-database.

Sure, I do know how to construct such workloads - and it's trivial even
with pgbench custom scripts. The question is whether such workloads are
common or not.
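
For instance, a trivial psql script can show the WAL volume difference
directly (a sketch using 9.6-era function names; the data file path is
illustrative):

    SELECT pg_current_xlog_location() AS before \gset
    BEGIN;
    CREATE TABLE x (c1 int);
    COPY x FROM '/tmp/data.csv' WITH (FORMAT csv);
    COMMIT;
    SELECT pg_xlog_location_diff(pg_current_xlog_location(), :'before')
           AS wal_bytes;

Run once with wal_level=minimal and once with wal_level=replica - at
minimal, COPY into the just-created table skips WAL entirely (only an
fsync at commit), so the difference is roughly the size of the data.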

Most importantly, no one is proposing to break the optimizations, only
to change the defaults - users relying on the optimizations are free to
switch back to wal_level=minimal if needed.
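
Reverting to the current behavior is a one-line change (well, two -
note that wal_level=minimal also requires max_wal_senders=0, or the
server won't start):

    # postgresql.conf
    wal_level = minimal
    max_wal_senders = 0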

>
> One doesn't typically *just* do bulk loads, however,
> often it's a bulk load into a table and then the contents of that table
> are merged with another table or perhaps joined to it to produce some
> report or something along those lines. In many of those cases, our
> more-recently added capability to have UNLOGGED tables will work, but
> not all (in particular, it can be very handy to load everything in using
> the above technique and then switch the wal_level to replica, which
> avoids having to have the bulk of the data sent through WAL, something
> you can't avoid if you want to turn an unlogged table into a logged
> one).
>

Ultimately, the question is whether the number of people running into
"Hey, I can't take pg_basebackup or set up a standby with the default
config!" is higher or lower than the number of people running into "Hey,
CREATE TABLE + COPY is slower now!"

I haven't seen many systems relying on such load optimizations, for a
number of reasons:

1) The important/critical systems usually have replicas, so are
inherently incompatible with wal_level=minimal.

2) The batch jobs usually don't truncate the main table, but load the
increment into a temporary/unlogged table first, then merge it into the
main one.
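
For illustration, pattern (2) typically looks something like this (a
sketch - table and column names are made up, and the merge step is
workload-specific):

    CREATE UNLOGGED TABLE staging (LIKE main_table INCLUDING DEFAULTS);
    COPY staging FROM '/path/to/increment.csv' WITH (FORMAT csv);
    INSERT INTO main_table SELECT * FROM staging;  -- or an upsert/anti-join merge
    DROP TABLE staging;

Unlogged tables skip WAL regardless of wal_level, so this pattern is
unaffected by the proposed default change.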

That is not to say there are no other cases benefiting from those
optimizations, but we're talking about the default value - we're not
removing wal_level=minimal.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
