Re: Disable WAL logging to speed up data loading

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disable WAL logging to speed up data loading
Date: 2020-12-25 07:09:06
Message-ID: X+WQEgRDGHyLdH2x@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 03, 2020 at 03:52:47AM +0000, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
> The code looks good, and the performance seems to be nice, so I
> marked this ready for committer.

FWIW, I am extremely afraid of this proposal because this is basically
a footgun able to corrupt customer instances, and I am ready to bet
that people would switch wal_level to none because they see a lot of
benefits in doing so at first sight, until the host running the server
is plugged off and they need to use pg_resetwal in urgency to bring an
instance online. Users have already the option to make things go bad,
just by disabling full page writes or fsync, but I really don't think
that we should put in their hands more options able to break
instances, nor should we try to spend more efforts in having more
"protections" that would trigger only once the instance is already
fried.

Perhaps this is something that Horiguchi-san pointed out upthread in
[1] (last sentence of first paragraph), but did you consider that it
is already possible to do bulk-loading with a minimal amount of WAL
generated as long as you do the COPY within the transaction that
created the table? Quoting the docs in [2]:
"COPY is fastest when used within the same transaction as an earlier
CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be
written, because in case of an error, the files containing the newly
loaded data will be removed anyway. However, this consideration only
applies when wal_level is minimal as all commands must write WAL
otherwise."

Upgrade scenarios have been mentioned in this case as being a pain
when it comes to take advantage of this optimization. Wouldn't it be
safer if we took a client approach instead, where restores are able to
load the data with a cheap succession of commands by loading the data
in a transaction done after a TRUNCATE?

[1]: https://www.postgresql.org/message-id/20201001.115136.288898409051085426.horikyota.ntt@gmail.com
[2]: https://www.postgresql.org/docs/current/populate.html#POPULATE-COPY-FROM
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-12-25 07:20:49 Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently
Previous Message Michael Paquier 2020-12-25 06:41:46 Re: create table like: ACCESS METHOD