RE: Disable WAL logging to speed up data loading

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Stephen Frost' <sfrost(at)snowman(dot)net>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, 'David Steele' <david(at)pgmasters(dot)net>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Disable WAL logging to speed up data loading
Date: 2021-03-23 03:11:25
Message-ID: TYAPR01MB2990DA89E6ABF107B3AC3B89FE649@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Stephen Frost <sfrost(at)snowman(dot)net>
> First- what are you expecting would actually happen during crash recovery in
> this specific case with your proposed new WAL level?
...
> I'm not suggesting it's somehow more crash safe- but it's at least very clear
> what happens in such a case, to wit: the entire table is cleared on crash
> recovery.

As Laurenz-san kindly replied, the database server refuses to start with a clear message. So, it's similarly very clear what happens. The user will never unknowingly resume operation with possibly corrupt data.

> We're talking about two different ways to accomplish essentially the same
> thing- one which introduces a new WAL level, vs. one which adds an
> optimization for a WAL level we already have. That the second is more elegant
> is more-or-less entirely the point I'm making here, so it seems pretty relevant.

So, I understood the point boils down to elegance. Could I ask what makes you feel ALTER TABLE UNLOGGED/LOGGED is (more) elegant? I'm purely asking as a user.

(I don't want to digress, but if we consider the number of options for wal_level as an issue, I feel it's not elegant to have separate "replica" and "logical".)

> Under the proposed 'none', you basically have to throw out the entire cluster on
> a crash, all because you don't want to use 'UNLOGGED' when you created the
> tables you want to load data into, or 'TRUNCATE' them in the transaction where
> you start the data load, either of which gives us enough indication and which
> we have infrastructure around dealing with in the event of a crash during the
> load without everything else having to be tossed and everything restored from a
> backup. That's both a better user experience from the perspective of having
> fewer WAL levels to understand and from just a general administration
> perspective so you don't have to go all the way back to a backup to bring the
> system back up.

The elegance of wal_level = none is that the user doesn't have to remember to add ALTER TABLE to the data loading job when they add load target tables/partitions. If they build and use their own (shell) scripts to load data, that won't be a burden or easily forgotten. But what would they have to do when they use ETL tools like Talend, Pentaho, and Informatica Power Center? Do those tools allow users to add custom processing like ALTER TABLE to the data loading job steps for each table? (AFAIK, they don't.)
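
For illustration only, the per-table step being discussed might look like the sketch below. The table name load_target and the file path are hypothetical, and the exact loading command would depend on the job; the point is that the two ALTER TABLE statements have to be repeated for every load target table or partition.

```sql
-- Hypothetical load target; repeat for each table/partition being loaded.
ALTER TABLE load_target SET UNLOGGED;   -- stop WAL-logging changes to this table

-- The load itself (path and format are placeholders).
COPY load_target FROM '/path/to/data.csv' WITH (FORMAT csv);

ALTER TABLE load_target SET LOGGED;     -- make the table crash-safe again
```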

wal_level = none is convenient and attractive for users who can back up and restore the entire database instantly with a storage or filesystem snapshot feature.

Regards
Takayuki Tsunakawa
