Re: In-place persistance change of a relation

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: osumi(dot)takamichi(at)fujitsu(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, tsunakawa(dot)takay(at)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: In-place persistance change of a relation
Date: 2020-11-11 14:56:44
Message-ID: 20201111145644.GY16415@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Kyotaro Horiguchi (horikyota(dot)ntt(at)gmail(dot)com) wrote:
> At Tue, 10 Nov 2020 09:33:12 -0500, Stephen Frost <sfrost(at)snowman(dot)net> wrote in
> > * Kyotaro Horiguchi (horikyota(dot)ntt(at)gmail(dot)com) wrote:
> > > For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> > > ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> > > trials of several ways, I drifted to the following way after poking
> > > several ways.
> > >
> > > 1. Flip BM_PERMANENT of active buffers
> > > 2. adding/removing init fork
> > > 3. sync files,
> > > 4. Flip pg_class.relpersistence.
> > >
> > > It always skips table copy in the SET UNLOGGED case, and only when
> > > wal_level=minimal in the SET LOGGED case. Crash recovery seems
> > > working by some brief testing by hand.
> >
> > Somehow missed that this patch more-or-less does what I was referring to
> > down-thread, but I did want to mention that it looks like it's missing a
> > necessary FlushRelationBuffers() call before the sync, otherwise there
> > could be dirty buffers for the relation that's being set to LOGGED (with
> > wal_level=minimal), which wouldn't be good. See the comments above
> > smgrimmedsync().
>
> Right. Thanks. However, since SetRelFileNodeBuffersPersistence()
> called just above scans shared buffers so I don't want to just call
> FlushRelationBuffers() separately. Instead, I added buffer-flush to
> SetRelFileNodeBuffersPersistence().

Maybe I'm missing something, but it sure looks like in the patch that
SetRelFileNodeBuffersPersistence() is being called after the
smgrimmedsync() call, and I don't think you get to just switch the order
of those- the sync is telling the kernel to make sure it's written to
disk, while the FlushBuffer() is just writing it into the kernel but
doesn't provide any guarantee that the data has actually made it to
disk. We have to FlushBuffer() first, and then call smgrimmedsync().
Perhaps there's a way to avoid having to go through shared buffers
twice, and I generally agreed it'd be good if we could avoid doing so,
but this approach doesn't look like it actually works.

> FWIW this is a revised version of the PoC, which has some known
> problems.
>
> - Flipping of Buffer persistence is not WAL-logged nor even be able to
> be safely roll-backed. (It might be better to drop buffers).

Not sure if it'd be better to drop buffers or not, but figuring out how
to deal with rollback seems pretty important. How is the persistence
change in the catalog not WAL-logged though..?

> - This version handles indexes but not yet handle toast relatins.

Would need to be fixed, of course.

> - tableAMs are supposed to support this feature. (but I'm not sure
> it's worth allowing them not to do so).

Seems like they should.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-11-11 15:17:36 Re: proposal: possibility to read dumped table's name from file
Previous Message Dilip Kumar 2020-11-11 14:39:37 Re: [HACKERS] Custom compression methods