| From: | ZizhuanLiu X-MAN <44973863(at)qq(dot)com> |
|---|---|
| To: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
| Cc: | 我自己的邮箱 <44973863(at)qq(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: md: measure just FileSync() for pgstat_io without FileClose() |
| Date: | 2026-06-09 03:01:52 |
| Message-ID: | tencent_D4BE9F2C49A8F71A85E1FFDA9A6971269009@qq.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
>Original
>From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
>Date: 2026-04-30 19:14
>To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
>Subject: md: measure just FileSync() for pgstat_io without FileClose()
>
>
>Hi,
>
>As?per?the?attachment,?shouldn't?we?exclude?the?timing?of?close()?from
>fsync()'s?duration?
>
>-J.
Hi Jakub,
Thanks a lot for your patch!
I agree that ideally the call to `pgstat_count_io_op_time(..., IOOP_FSYNC, ...)` inside `mdsyncfiletag()`
should only measure and accurately reflect the actual execution time of `FileSync()`.
However, I haven’t figured out a clean way to capture and retain the required timing metrics within
the existing logic of `mdsyncfiletag()`, so that we can feed them into `pgstat_count_io_op_time()` (or
its variants) for proper IO statistics tracking. Conceptually, the ideal flow would look like this:
```c
io_start = pgstat_prepare_io_time(track_io_timing);
result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
io_end = pgstat_get_io_time???();
if (need_to_close)
FileClose(file);
pgstat_count_io_op_time_v2(..., io_start, io_end, ...);
```
I’ve been thinking through the logic of `PathNameOpenFile()`, the subsequent `FileClose()` call,
and how this sequence impacts the fsync timing measurement inside `mdsyncfiletag()`.
Let’s break down all heavy or complex work done inside `FileClose()` for temporary file handles
opened and closed within `mdsyncfiletag()`:
1. AIO-related logic
Files opened via `PathNameOpenFile()` get flags `O_RDWR | PG_BINARY` (potentially combined
with `PG_O_DIRECT`). When we call `FileClose()`, it invokes `pgaio_closing_fd(vfdP->fd)`.
I’ll admit I haven’t fully wrapped my head around the full AIO subsystem, which is quite complex.
For these data file handles, we only pass `PG_O_DIRECT` at open time and do not leverage AIO
or io_uring for the I/O path. That leads me to suspect `pgaio_closing_fd()` will trigger no meaningful
AIO cleanup work here. This is especially relevant given the `while` loop inside `pgaio_closing_fd()` that
may invoke the potentially slow blocking wait `pgaio_io_wait()`.
My analysis on this part is tentative, and I welcome any corrections or further insights from the community.
2. Logic tied to `vfdP->fdstate`
When opening a file via `PathNameOpenFile()`, `vfdP->fdstate` is initialized to `0x0`.
During `FileClose()`, the checks `(vfdP->fdstate & FD_TEMP_FILE_LIMIT)` and `(vfdP->fdstate & FD_DELETE_AT_CLOSE)`
will both evaluate to false, so none of the associated cleanup routines for files will run.
3. Logic tied to `vfdP->resowner`
`PathNameOpenFile()` sets `vfdP->resowner = NULL`.
In turn, the conditional `if (vfdP->resowner)` inside `FileClose()` will not be satisfied,
meaning no resource owner cleanup logic will execute either.
To sum up: the file handles we temporarily open and close locally within `mdsyncfiletag()`
do not trigger any expensive cleanup paths in `FileClose()`. The overhead introduced by
closing these file descriptors should be negligible compared to the latency of the preceding `FileSync()` call.
I should note I have not yet run comprehensive benchmark tests to quantify this overhead
, so this is only a qualitative assessment.
From a broader system design perspective, closing the file descriptor ahead of emitting
IO statistics via `pgstat_count_io_op_time()` is also a reasonable performance optimization:
we release unused file resources as early as possible.
Happy to discuss this further, and I’m open to any feedback, alternative ideas or corrections you might have.
regards,
--
ZizhuanLiu (X-MAN)
44973863(at)qq(dot)com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | shveta malik | 2026-06-09 03:48:19 | Re: Proposal: Conflict log history table for Logical Replication |
| Previous Message | Yugo Nagata | 2026-06-09 02:23:32 | Re: Why is the LSN reported for pg_logical_emit_message() different from other decoded operations? |