Re: Disable WAL logging to speed up data loading

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: osumi(dot)takamichi(at)fujitsu(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, tsunakawa(dot)takay(at)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Disable WAL logging to speed up data loading
Date: 2020-10-02 04:38:22
Message-ID: 20201002.133822.2093448776724403352.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Fri, 2 Oct 2020 10:56:21 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
> On 2020/10/02 10:06, Kyotaro Horiguchi wrote:
> > At Thu, 1 Oct 2020 08:14:42 +0000, "osumi(dot)takamichi(at)fujitsu(dot)com"
> > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote in
> >> When I compare the 2 ideas,
> >> one of the benefits of this ALTER TABLE 's improvement
> >> is that we can't avoid the downtime
> >> while that of wal_level='none' provides an easy and faster
> >> major version up via output file of pg_dumpall.
> > The speedup has already been achieved with higher durability by
> > wal_level=minimal in that case.
>
> I was thinking the same, i.e., wal_level=minimal + wal_skip_threshold
> would
> speed up that initial data loading.

Yeah.

> > Or maybe you should consider using
> > pg_upgrade instead. Even inducing the time to take a backup copy of
> > the whole cluster, running pg_upgrade would be far faster than
> > pg_dumpall then loading.
> >
> >> Both ideas have good points.
> >> However, actually to modify ALTER TABLE's copy
> >> looks far more difficult than wal_level='none' and
> >> beyond my current ability.
> >> So, I'd like to go forward with the direction of wal_level='none'.
> >> Did you have strong objections for this direction ?
>
> No, I have no strong objection against your trial. But I was thinking
> that it's not so easy to design and implement wal_level=none.
> For example, there are some functions and commands depending on
> the existence of WAL, like pg_switch_wal(), PREPARE TRANSACTION
> and COMMIT PREPARED. Probably you need to define how they should
> work in wal_level=none, e.g., emit an error.
>
>
> > For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> > ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> > trials of several ways, I drifted to the following way after poking
> > several ways.
>
> Nice!
>
>
> > 1. Flip BM_PERMANENT of active buffers
> > 2. adding/removing init fork
> > 3. sync files,
> > 4. Flip pg_class.relpersistence.
> > It always skips table copy in the SET UNLOGGED case,
>
> Even in wal_level != minimal?
> What happens in the standby side when SET UNLOGGED is executed without
> the table rewrite in the primary? The table data should be truncated
> in the standby?

A table turned into unlogged on the primary is also turned into
unlogged on the standby and it is inaccessible on the standby. Maybe
the storage is dropped on both patched and unpatched versoins.

After the table is again turned into logged, the content is
transferred via WAL records generated from the insertions into the new
storage and it rebuilds the same storage on the standby on both
patched and unpatched.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-10-02 04:47:37 Re: [Patch] Optimize dropping of relation buffers using dlist
Previous Message Masahiro Ikeda 2020-10-02 03:40:58 Re: New statistics for tuning WAL buffer size