Re: md: measure just FileSync() for pgstat_io without FileClose()

From: ZizhuanLiu X-MAN <44973863(at)qq(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: 我自己的邮箱 <44973863(at)qq(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: md: measure just FileSync() for pgstat_io without FileClose()
Date: 2026-06-09 03:01:52
Message-ID: tencent_D4BE9F2C49A8F71A85E1FFDA9A6971269009@qq.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>Original
>From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
>Date: 2026-04-30 19:14
>To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
>Subject: md: measure just FileSync() for pgstat_io without FileClose()
>
>
>Hi,
>
>As?per?the?attachment,?shouldn't?we?exclude?the?timing?of?close()?from
>fsync()'s?duration?
>
>-J.

Hi Jakub,

Thanks a lot for your patch!

I agree that ideally the call to `pgstat_count_io_op_time(..., IOOP_FSYNC, ...)` inside `mdsyncfiletag()`
should only measure and accurately reflect the actual execution time of `FileSync()`.

However, I haven’t figured out a clean way to capture and retain the required timing metrics within
the existing logic of `mdsyncfiletag()`, so that we can feed them into `pgstat_count_io_op_time()` (or
its variants) for proper IO statistics tracking. Conceptually, the ideal flow would look like this:
```c
io_start = pgstat_prepare_io_time(track_io_timing);
result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
io_end = pgstat_get_io_time???();

if (need_to_close)
FileClose(file);
pgstat_count_io_op_time_v2(..., io_start, io_end, ...);
```

I’ve been thinking through the logic of `PathNameOpenFile()`, the subsequent `FileClose()` call,
and how this sequence impacts the fsync timing measurement inside `mdsyncfiletag()`.

Let’s break down all heavy or complex work done inside `FileClose()` for temporary file handles
opened and closed within `mdsyncfiletag()`:

1. AIO-related logic
Files opened via `PathNameOpenFile()` get flags `O_RDWR | PG_BINARY` (potentially combined
with `PG_O_DIRECT`). When we call `FileClose()`, it invokes `pgaio_closing_fd(vfdP->fd)`.

I’ll admit I haven’t fully wrapped my head around the full AIO subsystem, which is quite complex.
For these data file handles, we only pass `PG_O_DIRECT` at open time and do not leverage AIO
or io_uring for the I/O path. That leads me to suspect `pgaio_closing_fd()` will trigger no meaningful
AIO cleanup work here. This is especially relevant given the `while` loop inside `pgaio_closing_fd()` that
may invoke the potentially slow blocking wait `pgaio_io_wait()`.

My analysis on this part is tentative, and I welcome any corrections or further insights from the community.

2. Logic tied to `vfdP->fdstate`
When opening a file via `PathNameOpenFile()`, `vfdP->fdstate` is initialized to `0x0`.
During `FileClose()`, the checks `(vfdP->fdstate & FD_TEMP_FILE_LIMIT)` and `(vfdP->fdstate & FD_DELETE_AT_CLOSE)`
will both evaluate to false, so none of the associated cleanup routines for files will run.

3. Logic tied to `vfdP->resowner`
`PathNameOpenFile()` sets `vfdP->resowner = NULL`.
In turn, the conditional `if (vfdP->resowner)` inside `FileClose()` will not be satisfied,
meaning no resource owner cleanup logic will execute either.

To sum up: the file handles we temporarily open and close locally within `mdsyncfiletag()`
do not trigger any expensive cleanup paths in `FileClose()`. The overhead introduced by
closing these file descriptors should be negligible compared to the latency of the preceding `FileSync()` call.
I should note I have not yet run comprehensive benchmark tests to quantify this overhead
, so this is only a qualitative assessment.

From a broader system design perspective, closing the file descriptor ahead of emitting
IO statistics via `pgstat_count_io_op_time()` is also a reasonable performance optimization:
we release unused file resources as early as possible.

Happy to discuss this further, and I’m open to any feedback, alternative ideas or corrections you might have.

regards,
--
ZizhuanLiu (X-MAN)
44973863(at)qq(dot)com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2026-06-09 03:48:19 Re: Proposal: Conflict log history table for Logical Replication
Previous Message Yugo Nagata 2026-06-09 02:23:32 Re: Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?