Re: subscriptionCheck failures on nightjar

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 17:11:01
Message-ID: 20190213171101.6wpz7tardp3t3uvk@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> I've managed to reproduce this locally, and obtained this PANIC:

Cool. How exactly?

Nice catch.

> Anyway, I think we might be able to fix this along the lines of
>
> CloseTransientFile(fd);
>
> + /* ensure snapshot file is down to stable storage */
> + fsync_fname(tmppath, false);
> fsync_fname("pg_logical/snapshots", true);
>
> /*
> * We may overwrite the work from some other backend, but that's ok, our
> * snapshot is valid as well, we'll just have done some superfluous work.
> */
> if (rename(tmppath, path) != 0)
> {
> ereport(ERROR,
> (errcode_for_file_access(),
> errmsg("could not rename file \"%s\" to \"%s\": %m",
> tmppath, path)));
> }
>
> - /* make sure we persist */
> - fsync_fname(path, false);
> + /* ensure we persist the file rename */
> fsync_fname("pg_logical/snapshots", true);

Hm, but that's not the same? On some filesystems one needs the directory
fsync, on some the file fsync, and I think both in some cases. ISTM we
should just open the file before the rename, and then use fsync() on the
filehandle rather than the filename. Then a concurrent rename ought not
to hurt us?

> The existing code here seems simply wacky/unsafe to me regardless
> of this race condition: couldn't it potentially result in a corrupt
> snapshot file appearing with a valid name, if the system crashes
> after persisting the rename but before it's pushed the data out?

What do you mean precisely with "before it's pushed the data out"?

> I also wonder why bother with the directory sync just before the
> rename.

Because on some FS/OS combinations the size of the renamed-into-place
file isn't guaranteed to be durable unless the directory was
fsynced. Otherwise there's the possibility to have incomplete data if we
crash after renaming, but before fsyncing.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-02-13 17:37:35 Re: subscriptionCheck failures on nightjar
Previous Message Magnus Hagander 2019-02-13 17:09:40 Re: Reaping Temp tables to avoid XID wraparound