Disable WAL logging to speed up data loading

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Disable WAL logging to speed up data loading
Date: 2020-09-29 08:28:03
Message-ID: TYAPR01MB29901EBE5A3ACCE55BA99186FE320@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

We'd like to propose a feature to disable WAL to speed up data loading. This was inspired by a feature added in the latest MySQL. I wish you won't fear this feature...

BACKGROUND
========================================

This branches off from [1] as mentioned therein. Briefly speaking, a customer wants to shorten the time for nightly loading of data into their data warehouse as much as possible to be prepared for using the data warehouse for new things.

Currently, they are using Oracle's SQL*Loader with its no-logging feature. They want a similar feature to migrate to Postgres. Other than the data loading performance, they don't want to be concerned about the storage for large volumes of WAL.

In [1], we thought about something like Oracle's per-table no-logging feature, but it seems difficult (or at least not easy.) Meanwhile, I found another feature added in the latest MySQL 8.0.21 [2]. This proposal follows it almost directly. That satisfies the customer request.

As an aside, it's also conceivable that in the near future, users could see the WAL bottleneck (WAL buffer or disk) when they utilize the parallel COPY that is being developed in the community.

FUNCTIONAL SPECIFICATION
========================================

Add a new value 'none' to the server configuration parameter wal_level. With this setting:

* No WAL is emitted.

* The server refuses to start (pg_ctl start fails) after an abnormal shutdown due to power outage, pg_ctl's immediate shutdown, etc, showing a straightforward message like MySQL.

* Features like continuous archiving, pg_basebackup, and streaming/logical replication that requires wal_level >= replica are not available.

* The user can use all features again if you shut down the server successfully after data loading and reset wal_level to a value other than none. He needs to take a base backup or rebuild the replication standby after restarting the server.

In addition to the cosmetic modifications to the manual articles that refer to wal_level, add a clause or paragraphs to the following sections to let users know the availability of this feature.

14.4. Populating a Database
18.6.1. Upgrading Data via pg_dumpall

PROGRAM DESIGN (main point only)
========================================

As in the bootstrap mode (during initdb), when wal_level = none, XLogInsert() does nothing and just returns a fixed value, which is the tail of the last shutdown checkpoint WAL record. As a result, the value is set to the relation page header's LSN field.

In addition, it might be worth having XLogBeginInsert() and XLogRec...() to check wal_level and just return. I don't expect much from this, but it may be interesting to give it a try to see the squeezed performance.

StartupXLOG() checks the wal_level setting in pg_control and quits the startup with ereport(FATAL) accordingly.

[1]
Implement UNLOGGED clause for COPY FROM
https://www.postgresql.org/message-id/OSBPR01MB488887C0BDC5129C65DFC5E5ED640@OSBPR01MB4888.jpnprd01.prod.outlook.com

[2]
Disabling Redo Logging
https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging

Regards
Takayuki Tsunakawa

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hamid Akhtar 2020-09-29 08:49:23 Re: Improved Cost Calculation for IndexOnlyScan
Previous Message Thomas Munro 2020-09-29 08:12:13 Re: Assertion failure with barriers in parallel hash join