Re: Unlogged relations and WAL-logging

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged relations and WAL-logging
Date: 2022-01-28 13:57:42
Message-ID: CA+Tgmoa9SEbDAL4bn2Yrp9J1-M2Ce0uF3m_4iZE++yLpfRgoKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 27, 2022 at 2:32 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Unlogged relations are not WAL-logged, but creating the init-fork is.
> There are a few things around that seem sloppy:
>
> 1. In index_build(), we do this:
>
> > */
> > if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
> > !smgrexists(RelationGetSmgr(indexRelation), INIT_FORKNUM))
> > {
> > smgrcreate(RelationGetSmgr(indexRelation), INIT_FORKNUM, false);
> > indexRelation->rd_indam->ambuildempty(indexRelation);
> > }
>
> Shouldn't we call log_smgrcreate() here? Creating the init fork is
> otherwise not WAL-logged at all.

Yes, that's a bug.

> 2. Some implementations of ambuildempty() use the buffer cache (hash,
> gist, gin, brin), while others bypass it and call smgrimmedsync()
> instead (btree, spgist, bloom). I don't see any particular reason for
> those decisions, it seems to be based purely on which example the author
> happened to copy-paste.

I thought that this inconsistency was odd when I was developing the
unlogged feature, but I tried to keep each routine's ambuildempty()
consistent with whatever ambuild() was doing. I don't mind if you want
to change it, though.

> 3. Those ambuildempty implementations that bypass the buffer cache use
> smgrwrite() to write the pages. That doesn't make any difference in
> practice, but in principle it's wrong: You are supposed to use
> smgrextend() when extending a relation.

That's a mistake on my part.

> 4. Also, the smgrwrite() calls are performed before WAL-logging the
> pages, so the page that's written to disk has 0/0 as the LSN, not the
> LSN of the WAL record. That's harmless too, but seems a bit sloppy.

That is also a mistake on my part.

> 5. In heapam_relation_set_new_filenode(), we do this:
>
> >
> > /*
> > * If required, set up an init fork for an unlogged table so that it can
> > * be correctly reinitialized on restart. An immediate sync is required
> > * even if the page has been logged, because the write did not go through
> > * shared_buffers and therefore a concurrent checkpoint may have moved the
> > * redo pointer past our xlog record. Recovery may as well remove it
> > * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
> > * record. Therefore, logging is necessary even if wal_level=minimal.
> > */
> > if (persistence == RELPERSISTENCE_UNLOGGED)
> > {
> > Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
> > rel->rd_rel->relkind == RELKIND_MATVIEW ||
> > rel->rd_rel->relkind == RELKIND_TOASTVALUE);
> > smgrcreate(srel, INIT_FORKNUM, false);
> > log_smgrcreate(newrnode, INIT_FORKNUM);
> > smgrimmedsync(srel, INIT_FORKNUM);
> > }
>
> The comment doesn't make much sense, we haven't written nor WAL-logged
> any page here, with nor without the buffer cache. It made more sense
> before commit fa0f466d53.

Well, it seems to me (and perhaps I am just confused) that complaining
that there's no page written here might be a technicality. The point
is that there's no synchronization between the work we're doing here
-- which is creating a fork, not writing a page -- and any concurrent
checkpoint. So we both need to log it, and also sync it immediately.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Denis Laxalde 2022-01-28 14:06:57 Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Previous Message Julien Rouhaud 2022-01-28 13:56:36 Re: [PATCH] Disable bgworkers during servers start in pg_upgrade