Re: In-placre persistance change of a relation

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: Jakub(dot)Wartak(at)tomtom(dot)com
Cc: tsunakawa(dot)takay(at)fujitsu(dot)com, osumi(dot)takamichi(at)fujitsu(dot)com, sfrost(at)snowman(dot)net, masao(dot)fujii(at)oss(dot)nttdata(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: In-placre persistance change of a relation
Date: 2021-12-20 06:28:23
Message-ID: 20211220.152823.1386247767464518843.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Jakub.

At Fri, 17 Dec 2021 09:10:30 +0000, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com> wrote in
> the patch didn't apply clean (it's from March; some hunks were failing), so I've fixed it and the combined git format-patch is attached. It did conflict with the following:

Thanks for looking this. Also thanks for Justin for finding the silly
use-after-free bug. (Now I see the regression test fails and I'm not
sure how come I didn't find this before.)

> b0483263dda - Add support for SET ACCESS METHOD in ALTER TABLE
> 7b565843a94 - Add call to object access hook at the end of table rewrite in ALTER TABLE
> 9ce346eabf3 - Report progress of startup operations that take a long time.
> f10f0ae420 - Replace RelationOpenSmgr() with RelationGetSmgr().
>
> I'm especially worried if I didn't screw up something/forgot something related to the last one (rd->rd_smgr changes), but I'm getting "All 210 tests passed".

About the last one, all rel->rd_smgr acesses need to be repalced with
RelationGetSmgr(). On the other hand we can simply remove
RelationOpenSmgr() calls since the target smgrrelation is guaranteed
to be loaded by RelationGetSmgr().

The fix you made for RelationCreate/DropInitFork is correct and
changes you made would work, but I prefer that the code not being too
permissive for unknown (or unexpected) states.

> Basic demonstration of this patch (with wal_level=minimal):
> create unlogged table t6 (id bigint, t text);
> -- produces 110GB table, takes ~5mins
> insert into t6 select nextval('s1'), repeat('A', 1000) from generate_series(1, 100000000);
> alter table t6 set logged;
> on baseline SET LOGGED takes: ~7min10s
> on patched SET LOGGED takes: 25s
>
> So basically one can - thanks to this patch - use his application (performing classic INSERTs/UPDATEs/DELETEs, so without the need to rewrite to use COPY) and perform literally batch upload and then just switch the tables to LOGGED.

This result is significant. That operation finally requires WAL writes
but I was not sure how much gain FPIs (or bulk WAL logging) gives in
comparison to operational WALs.

> Some more intensive testing also looks good, assuming table prepared to put pressure on WAL:
> create unlogged table t_unlogged (id bigint, t text) partition by hash (id);
> create unlogged table t_unlogged_h0 partition of t_unlogged FOR VALUES WITH (modulus 4, remainder 0);
> [..]
> create unlogged table t_unlogged_h3 partition of t_unlogged FOR VALUES WITH (modulus 4, remainder 3);
>
> Workload would still be pretty heavy on LWLock/BufferContent,WALInsert and Lock/extend .
> t_logged.sql = insert into t_logged select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000); # according to pg_wal_stats.wal_bytes generates ~1MB of WAL
> t_unlogged.sql = insert into t_unlogged select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000); # according to pg_wal_stats.wal_bytes generates ~3kB of WAL
>
> so using: pgbench -f <tabletypetest>.sql -T 30 -P 1 -c 32 -j 3 t
> with synchronous_commit =ON(default):
> with t_logged.sql: tps = 229 (lat avg = 138ms)
> with t_unlogged.sql tps = 283 (lat avg = 112ms) # almost all on LWLock/WALWrite
> with synchronous_commit =OFF:
> with t_logged.sql: tps = 413 (lat avg = 77ms)
> with t_unloged.sql: tps = 782 (lat avg = 40ms)
> Afterwards switching the unlogged ~16GB partitions takes 5s per partition.
>
> As the thread didn't get a lot of traction, I've registered it into current commitfest https://commitfest.postgresql.org/36/3461/ with You as the author and in 'Ready for review' state.
>
> I think it behaves as almost finished one and apparently after reading all those discussions that go back over 10years+ time span about this feature, and lot of failed effort towards wal_level=noWAL I think it would be nice to finally start getting some of that of it into the core.

Thanks for taking the performance benchmark.

I didn't register for some reasons.

1. I'm not sure that we want to have the new mark files.

2. Aside of possible bugs, I'm not confident that the crash-safety of
this patch is actually water-tight. At least we need tests for some
failure cases.

3. As mentioned in transam/README, failure in removing smgr mark files
leads to immediate shut down. I'm not sure this behavior is acceptable.

4. Including the reasons above, this is not fully functionally.
For example, if we execute the following commands on primary,
replica dones't work correctly. (boom!)

=# CREATE UNLOGGED TABLE t (a int);
=# ALTER TABLE t SET LOGGED;

The following fixes are done in the attched v8.

- Rebased. Referring to Jakub and Justin's work, I replaced direct
access to ->rd_smgr with RelationGetSmgr() and removed calls to
RelationOpenSmgr(). I still separate the "ALTER TABLE ALL IN
TABLESPACE SET LOGGED/UNLOGGED" statement part.

- Fixed RelationCreate/DropInitFork's behavior for non-target
relations. (From Jakub's work).

- Fixed wording of some comments.

- As revisited, I found a bug around recovery. If the logged-ness of a
relation gets flipped repeatedly in a transaction, duplicate
pending-delete entries are accumulated during recovery and work in a
wrong way. sgmr_redo now adds up to one entry for a action.

- The issue 4 above is not fixed (yet).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v8-0001-In-place-table-persistence-change.patch text/x-patch 74.9 KB
v8-0002-New-command-ALTER-TABLE-ALL-IN-TABLESPACE-SET-LOG.patch text/x-patch 11.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-12-20 06:45:33 relcache not invalidated when ADD PRIMARY KEY USING INDEX
Previous Message Amit Langote 2021-12-20 06:20:28 Re: simplifying foreign key/RI checks