Re: Tighten error control for OpenTransientFile/CloseTransientFile

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Tighten error control for OpenTransientFile/CloseTransientFile
Date: 2019-10-04 20:39:38
Message-ID: 20191004203938.7a34jv4lnrkb56qu@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-03-07 10:56:25 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> index f5cf9ffc9c..bce4274362 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
> errmsg("could not fsync file \"%s\": %m", path)));
> pgstat_report_wait_end();
>
> - CloseTransientFile(fd);
> + if (CloseTransientFile(fd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not close file \"%s\": %m", path)));
> }
...
> diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
> index 64679dd2de..21986e48fe 100644
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
> }
>
> pgstat_report_wait_end();
> - CloseTransientFile(fd);
> +
> + if (CloseTransientFile(fd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not close file \"%s\": %m", path)));
>
> hdr = (TwoPhaseFileHeader *) buf;
> if (hdr->magic != TWOPHASE_MAGIC)
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 0fdd82a287..c7047738b6 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -3469,7 +3469,10 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
> (errcode_for_file_access(),
> errmsg("could not close file \"%s\": %m", tmppath)));
>
> - CloseTransientFile(srcfd);
> + if (CloseTransientFile(srcfd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not close file \"%s\": %m", path)));
>
> /*
> * Now move the segment into place with its final name.
...

This seems like an odd set of changes to me. What is this supposed to
buy us? The commit message says:
2) When opening transient files, it is up to the caller to close the
file descriptors opened. In error code paths, CloseTransientFile() gets
called to clean up things before issuing an error. However in normal
exit paths, a lot of callers of CloseTransientFile() never actually
reported errors, which could leave a file descriptor open without
knowing about it. This is an issue I complained about a couple of
times, but never had the courage to write and submit a patch, so here we
go.

but that reasoning seems bogus to me. For one, on just about any
platform close always closes the fd, even when returning an error
(unless you pass in a bad fd, in which case it obviously doesn't). So
the reasoning that this fixes unnoticed fd leaks doesn't really seem to
make sense. For another, even if it did, throwing an ERROR seems to
achieve very little: We continue with a leaked fd *AND* we cause the
operation to error out.

I can see reasoning for:
- LOG, so it can be noticed, but operations continue to work
- FATAL, to fix the leak
- PANIC, so we recover from the problem, in case of the close indicating
a durability issue

commit 9ccdd7f66e3324d2b6d3dec282cfa9ff084083f1
Author: Thomas Munro <tmunro(at)postgresql(dot)org>
Date: 2018-11-19 13:31:10 +1300

PANIC on fsync() failure.

but ERROR seems to have very little going for it.

The durability argument doesn't seem to apply for the cases where we
previously fsynced the file, a significant fraction of the locations you
touched.

And if your goal was just to achieve consistency, I also don't
understand, because you left plenty close()'s unchecked? E.g. you added
an error check in get_controlfile(), but not one in
ReadControlFile(). alterSystemSetConfigFile() writes, but you didn't add
one.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-10-04 20:43:38 Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays
Previous Message Bruce Momjian 2019-10-04 20:31:29 Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays