RE: Implement UNLOGGED clause for COPY FROM

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Implement UNLOGGED clause for COPY FROM
Date: 2020-08-26 07:24:43
Message-ID: TYAPR01MB29906AA32A73523EF0802264FE540@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I think it's worth thinking about a sophisticated feature like Oracle's UNRECOVERABLE data loading (because SQL Server's BCP load utility also has such a feature, but for an empty table), how about an easier approach like MySQL? I expect this won't complicate Postgres code much.

The customer is using Oracle RAC for high availability of a data warehouse. Then, I think they can use the traditional shared disk-based HA clustering, not the streaming replication when they migrate to Postgres.

They load data into the data warehouse with the nightly ETL or ELT. The loading window is limited, so they run multiple concurrent loading sessions, with the transaction logging off. They probably use all resources for the data loading during that period.

Then, you might think "How about turning fsync and full_page_writes to off?" But the customer doesn't like to be worried about the massive amount of WAL generated during the loading.

OTOH, the latest MySQL 8.0.21 introduced the following feature. This is for the initial data loading into a new database instance, though.

https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging
--------------------------------------------------
Disabling Redo Logging
As of MySQL 8.0.21, you can disable redo logging using the ALTER INSTANCE DISABLE INNODB REDO_LOG statement. This functionality is intended for loading data into a new MySQL instance. Disabling redo logging speeds up data loading by avoiding redo log writes and doublewrite buffering.

Warning
This feature is intended only for loading data into a new MySQL instance. Do not disable redo logging on a production system. It is permitted to shutdown and restart the server while redo logging is disabled, but an unexpected server stoppage while redo logging is disabled can cause data loss and instance corruption.

Attempting to restart the server after an unexpected server stoppage while redo logging is disabled is refused with the following error:

[ERROR] [MY-013578] [InnoDB] Server was killed when Innodb Redo
logging was disabled. Data files could be corrupt. You can try
to restart the database with innodb_force_recovery=6
In this case, initialize a new MySQL instance and start the data loading procedure again.
--------------------------------------------------

Following this idea, what do you think about adding a new value "none" to wal_level, where no WAL is generated? The setting of wal_level is recorded in pg_control. The startup process can see the value and reject recovery after abnormal shutdown, emitting a message similar to MySQL's.

Just a quick idea. I hope no devil will appear in the details.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-08-26 07:56:59 Re: Move OpenSSL random under USE_OPENSSL_RANDOM
Previous Message Masahiko Sawada 2020-08-26 06:54:58 Re: display offset along with block number in vacuum errors