Re: Unlogged relation copy is not fsync'd

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged relation copy is not fsync'd
Date: 2023-09-04 13:59:14
Message-ID: CAAKRu_b=mvoY+mAnVgK4FDy+BquLMTFxhcyAa8sd9FmPnfHxGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 25, 2023 at 8:47 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I noticed another missing fsync() with unlogged tables, similar to the
> one at [1].
>
> RelationCopyStorage does this:
>
> > /*
> > * When we WAL-logged rel pages, we must nonetheless fsync them. The
> > * reason is that since we're copying outside shared buffers, a CHECKPOINT
> > * occurring during the copy has no way to flush the previously written
> > * data to disk (indeed it won't know the new rel even exists). A crash
> > * later on would replay WAL from the checkpoint, therefore it wouldn't
> > * replay our earlier WAL entries. If we do not fsync those pages here,
> > * they might still not be on disk when the crash occurs.
> > */
> > if (use_wal || copying_initfork)
> > smgrimmedsync(dst, forkNum);
>
> That 'copying_initfork' condition is wrong. The first hint of that is
> that 'use_wal' is always true for an init fork. I believe this was meant
> to check for 'relpersistence == RELPERSISTENCE_UNLOGGED'. Otherwise,
> this bad thing can happen:
>
> 1. Create an unlogged table
> 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls
> RelationCopyStorage
> 3. a checkpoint happens while the command is running
> 4. After the ALTER TABLE has finished, shut down postgres cleanly.
> 5. Lose power
>
> When you recover, the unlogged table is not reset, because it was a
> clean postgres shutdown. But the part of the file that was copied after
> the checkpoint at step 3 was never fsync'd, so part of the file is lost.
> I was able to reproduce with a VM that I kill to simulate step 5.
>
> This is the same scenario that the smgrimmedsync() call above protects
> from for WAL-logged relations. But we got the condition wrong: instead
> of worrying about the init fork, we need to call smgrimmedsync() for all
> *other* forks of an unlogged relation.
>
> Fortunately the fix is trivial, see attached. I suspect we have similar
> problems in a few other places, though. end_heap_rewrite(), _bt_load(),
> and gist_indexsortbuild have a similar-looking smgrimmedsync() calls,
> with no consideration for unlogged relations at all. I haven't tested
> those, but they look wrong to me.

The patch looks reasonable to me. Is this [1] case in hash index build
that I reported but didn't take the time to reproduce similar?

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_bPc81M121pOEU7W%3D%2BwSWEebiLnrie4NpaFC%2BkWATFtSA%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-09-04 14:19:37 Re: Commitfest 2023-09 starts soon
Previous Message Aleksander Alekseev 2023-09-04 13:22:40 Re: Commitfest 2023-09 starts soon