Re: Monitoring time of fsyncing WALs

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Monitoring time of fsyncing WALs
Date: 2018-07-01 03:29:27
Message-ID: 20180701032927.GA2109@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 29, 2018 at 10:39:00PM +0900, Michael Paquier wrote:
> On Fri, Jun 29, 2018 at 08:48:33AM -0400, Robert Haas wrote:
>> Are there other instances of fsync() that also need to be covered?
>
> Yeah, I double-checked the calls to pg_fsync back when I looked at this
> patch, but now that I look again I see at least two more of them:
> - fsync_fname_ext.

This one is used internally by things like durable_rename or such, on
which I am not sure that we actually can pass down correctly a wait
event as multiple syncs may happen within each call, so that would
require a weird API layer.

> - write_auto_conf_file, where a wait event for a write() is missing as
> well.

I have been looking at the archives and this has been left out on
purpose:
https://postgr.es/m/CAGPqQf0bzVfTTZdxcr4qHb3WDbn=+iH6sWchbN4HGjhwtcbPYQ@mail.gmail.com

> Hm. I am wondering if it would not be worth extending pg_fsync() with
> an argument for a wait event and introduce a sort of pg_write() which
> wraps write() with an extra wait event argument, and similarly for
> read() (warning, conflict with Windows ahead!). That's somehow related
> to the feeling I had when working with transient file's writes and reads
> a couple of days back. Those are most likely going to be forgotten
> again and again in the future. In both cases the caller would still be
> responsible for looking at errno and decide the error handling, but I
> got no feedback about the idea on transient files.

That would finish by being sort of ugly as durable_rename or such would
also need some treatment. That would be quite invasive.

> There are also a couple of places where the wait events are longer than
> they should. For example in writeTimeLineHistory, there is a wait event
> for write() which is still switched on even within an error code path.
> And on top of it I think that a call to pgstat_report_wait_end() is
> missing in the error code path as on ERROR the session still exists.
> That's a bug. Those need an extra lookup and visibly some cleanup back
> to v10.

On ERROR a backend would switch back quickly on ClientRead. Perhaps
we'd want to call pgstat_report_wait_end() when an error or higher is
thrown? That's a different discussion.

So at the end, I would like to use the proposed patch and call it a
day. Thoughts?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-07-01 04:28:21 Re: Explain buffers wrong counter with parallel plans
Previous Message Tom Lane 2018-06-30 21:52:46 Re: Postgres 11 release notes