Re: Disable WAL logging to speed up data loading

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: sfrost(at)snowman(dot)net
Cc: osumi(dot)takamichi(at)fujitsu(dot)com, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, laurenz(dot)albe(at)cybertec(dot)at, tsunakawa(dot)takay(at)fujitsu(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Disable WAL logging to speed up data loading
Date: 2020-11-10 00:59:41
Message-ID: 20201110.095941.1584254872856462674.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 9 Nov 2020 10:18:08 -0500, Stephen Frost <sfrost(at)snowman(dot)net> wrote in
> Greetings,
>
> * osumi(dot)takamichi(at)fujitsu(dot)com (osumi(dot)takamichi(at)fujitsu(dot)com) wrote:
> > When I consider the use case is the system of data warehouse
> > as described upthread, the size of each table can be large.
> > Thus, changing the status from unlogged to logged (recoverable)
> > takes much time under the current circumstances, which was discussed before.
>
> Ok- so the issue is that, today, we dump all of the table into the WAL
> when we go from unlogged to logged, but as I outlined previously,
> perhaps that's because we're missing a trick there when
> wal_level=minimal. If wal_level=minimal, then it would seem like we
> could lock the table, then sync it and then mark is as logged, which is

FWIW, the following is that, I think it works not only when
wal_level=minimal for SET UNLOGGED, and only works when minimal for
SET LOGGED.

https://www.postgresql.org/message-id/20201002.100621.1668918756520136893.horikyota.ntt@gmail.com

| For fuel(?) of the discussion, I tried a very-quick PoC for in-place
| ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
| trials of several ways, I drifted to the following way after poking
| several ways.
|
| 1. Flip BM_PERMANENT of active buffers
| 2. adding/removing init fork
| 3. sync files,
| 4. Flip pg_class.relpersistence.
|
| It always skips table copy in the SET UNLOGGED case, and only when
| wal_level=minimal in the SET LOGGED case. Crash recovery seems
| working by some brief testing by hand.

> more-or-less what you're asking to have be effectively done with the
> proposed wal_level=none, but this would be an optimization for all
> existing users of wal_level=minimal who have unlogged tables that they
> want to change to logged, and this works on a per-table basis instead,
> which seems like a better approach than a cluster-wide setting.
>
> > By having the limited window of time,
> > during wal_level=none, I'd like to make wal_level=none work to
> > localize and minimize the burden to guarantee all commands are
> > repeatable. To achieve this, after switching wal_level from none to higher ones,
> > the patch must ensure crash recovery, though.
>
> Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
> to marked a bunch of unlogged tables over to being logged would be good
> to add too.

I agree to this aspect of the in-place flipping of UNLOGGED.

> > Sorry that my current patch doesn't complete this aspect fully at present
> > but, may I have your opinion about this ?
>
> Presently, my feeling is that we could address this use-case without
> having to introduce a new cluster-wide WAL level, and that's the
> direction I'd want to see this going. Perhaps I'm missing something
> about why the approach I've set forth above wouldn't work, and
> wal_level=none would, but I've not seen it yet.

Couldn't we have something like the following?

ALTER TABLE table1, table2, table3 SET UNLOGGED;

That is, multiple target object specification in ALTER TABLE sttatement.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-11-10 01:10:23 Adding an aminsert() hint that triggers bottom-up index deletion for UPDATEs that can't use HOT
Previous Message tsunakawa.takay@fujitsu.com 2020-11-10 00:45:50 RE: POC: postgres_fdw insert batching