RE: Disable WAL logging to speed up data loading

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Fujii Masao' <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Disable WAL logging to speed up data loading
Date: 2020-10-02 05:03:37
Message-ID: TYAPR01MB2990DA2A5FDA5B094CD708EBFE310@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
> > The speedup has already been achieved with higher durability by
> > wal_level=minimal in that case.
>
> I was thinking the same, i.e., wal_level=minimal + wal_skip_threshold would
> speed up that initial data loading.

First of all, thank you Horiguchi-san for trying to improve ALTER TABLE SET UNLOGGED/LOGGED. That should also be appealing.

At the same time, as I said before, both features have good points. TBH, as a user, I'm kind of attracted by MySQL's approach because of its simplicity for users (although DBMS developers may be worried about this and that.) What tempts me is that I can just switch on the feature with a single configuration parameter, and continue to use existing SQL scripts and other data integration software without knowing what tables those load data into. In the same context, I don't have to add or delete ALTER TABLE statements when I have to change the set of tables to be loaded. For the same reason, I'm also interested in Oracle's another feature ALTER TABLESPACE LOGGING/NOLOGGING.

BTW, does ALTER TABLE LOGGED/UNLOGGED on a partitioned table get the change to its all partitions? It would be a bit tedious to add/delete ALTER TABLE LOGGED/UNLOGGED when I add/drop a partition.

Regarding data migration, data movement is not limited only to major upgrades. It will be convenient to speed up the migration of the entire database cluster into a new instance for testing and new deployment. (I'm not sure about recent pg_upgrade, but pg_upgrade sometimes cannot upgrade too older versions.)

To conclude, I hope both features will be realized, and wish we won't fall in a situation where the words fly such as "Mine is enough. Yours is risky and not necessary."

With that said, I think we may as well separate the thread some time later for CF entry. Otherwise, we will have trouble in finding the latest patch from the CF entry.

> No, I have no strong objection against your trial. But I was thinking
> that it's not so easy to design and implement wal_level=none.
> For example, there are some functions and commands depending on
> the existence of WAL, like pg_switch_wal(), PREPARE TRANSACTION
> and COMMIT PREPARED. Probably you need to define how they should
> work in wal_level=none, e.g., emit an error.

Yeah, we thought pg_switch_wal() may need some treatment. We'll check PREPARE and COMMIT PREPARED as well. I'd appreciate it if you share what you notice at any time. It is possible that we should emit WAL records of some resource managers, like the bootstrap mode emits WAL only for RM_XLOG_ID.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-10-02 05:07:52 Re: Disable WAL logging to speed up data loading
Previous Message Kyotaro Horiguchi 2020-10-02 04:51:35 Re: Disable WAL logging to speed up data loading