Re: Disable WAL logging to speed up data loading

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: osumi(dot)takamichi(at)fujitsu(dot)com, magnus(at)hagander(dot)net, robertmhaas(at)gmail(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, laurenz(dot)albe(at)cybertec(dot)at, tsunakawa(dot)takay(at)fujitsu(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Disable WAL logging to speed up data loading
Date: 2020-11-10 14:16:28
Message-ID: 20201110141628.GR16415@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Kyotaro Horiguchi (horikyota(dot)ntt(at)gmail(dot)com) wrote:
> At Mon, 9 Nov 2020 10:18:08 -0500, Stephen Frost <sfrost(at)snowman(dot)net> wrote in
> > Greetings,
> >
> > * osumi(dot)takamichi(at)fujitsu(dot)com (osumi(dot)takamichi(at)fujitsu(dot)com) wrote:
> > > When I consider the use case is the system of data warehouse
> > > as described upthread, the size of each table can be large.
> > > Thus, changing the status from unlogged to logged (recoverable)
> > > takes much time under the current circumstances, which was discussed before.
> >
> > Ok- so the issue is that, today, we dump all of the table into the WAL
> > when we go from unlogged to logged, but as I outlined previously,
> > perhaps that's because we're missing a trick there when
> > wal_level=minimal. If wal_level=minimal, then it would seem like we
> > could lock the table, then sync it and then mark is as logged, which is
>
> FWIW, the following is that, I think it works not only when
> wal_level=minimal for SET UNLOGGED, and only works when minimal for
> SET LOGGED.
>
> https://www.postgresql.org/message-id/20201002.100621.1668918756520136893.horikyota.ntt@gmail.com

Oh, nice! Very cool that this was already being worked on, apologies
for not seeing that thread.

> | For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> | ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> | trials of several ways, I drifted to the following way after poking
> | several ways.
> |
> | 1. Flip BM_PERMANENT of active buffers
> | 2. adding/removing init fork
> | 3. sync files,
> | 4. Flip pg_class.relpersistence.
> |
> | It always skips table copy in the SET UNLOGGED case, and only when
> | wal_level=minimal in the SET LOGGED case. Crash recovery seems
> | working by some brief testing by hand.

Looking over that patch- isn't it missing a necessary FlushBuffers(), to
make sure that any dirty buffers have been written out before the sync
is being done?

Will try to find that thread and review/comment on it there too, if I
can, buf figured I'd point it out here while I have your attention too.

:)

> > more-or-less what you're asking to have be effectively done with the
> > proposed wal_level=none, but this would be an optimization for all
> > existing users of wal_level=minimal who have unlogged tables that they
> > want to change to logged, and this works on a per-table basis instead,
> > which seems like a better approach than a cluster-wide setting.
> >
> > > By having the limited window of time,
> > > during wal_level=none, I'd like to make wal_level=none work to
> > > localize and minimize the burden to guarantee all commands are
> > > repeatable. To achieve this, after switching wal_level from none to higher ones,
> > > the patch must ensure crash recovery, though.
> >
> > Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
> > to marked a bunch of unlogged tables over to being logged would be good
> > to add too.
>
> I agree to this aspect of the in-place flipping of UNLOGGED.

Yeah, seems like it'd be useful and probably not hard to implement.

> > > Sorry that my current patch doesn't complete this aspect fully at present
> > > but, may I have your opinion about this ?
> >
> > Presently, my feeling is that we could address this use-case without
> > having to introduce a new cluster-wide WAL level, and that's the
> > direction I'd want to see this going. Perhaps I'm missing something
> > about why the approach I've set forth above wouldn't work, and
> > wal_level=none would, but I've not seen it yet.
>
> Couldn't we have something like the following?
>
> ALTER TABLE table1, table2, table3 SET UNLOGGED;
>
> That is, multiple target object specification in ALTER TABLE sttatement.

That requires knowing all the tables ahead of time though, or writing
some pl/pgsql to build up that list. Not hard in general, but having
the 'all in tablespace' would make it a bit easier if you wanted to do
this for effectively all tables in a given database. Having ALTER TABLE
support being given an explicit list of tables does seem like it could
be useful though, so certainly not against that if someone wanted to
implement it.

Thanks,

Stephen

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message gkokolatos 2020-11-10 14:19:06 Re: PATCH: Attempt to make dbsize a bit more consistent
Previous Message Amit Kapila 2020-11-10 13:57:58 Re: Parallel copy