Re: pg_stat_io not tracking smgrwriteback() is confusing

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: pg_stat_io not tracking smgrwriteback() is confusing
Date: 2023-04-24 21:37:48
Message-ID: 20230424213748.k6rpddvtjfsn5bfk@liskov
Lists: pgsql-hackers

On Mon, Apr 24, 2023 at 02:14:32PM -0700, Andres Freund wrote:
> Hi,
>
> On 2023-04-24 16:39:36 -0400, Melanie Plageman wrote:
> > On Wed, Apr 19, 2023 at 10:23:26AM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > I noticed that the numbers in pg_stat_io don't quite add up to what I
> > > expected in write heavy workloads. Particularly for checkpointer, the numbers
> > > for "write" in log_checkpoints output are larger than what is visible in
> > > pg_stat_io.
> > >
> > > That partially is because log_checkpoints' "write" covers way too many things,
> > > but there's an issue with pg_stat_io as well:
> > >
> > > Checkpoints, and some other sources of writes, will often end up doing a lot
> > > of smgrwriteback() calls - which pg_stat_io doesn't track. Nor do any
> > > pre-existing forms of IO statistics.
> > >
> > > It seems pretty clear that we should track writeback as well. I wonder if it's
> > > worth doing so for 16? It'd give a more complete picture that way. The
> > > counter-argument I see is that we didn't track the time for it in existing
> > > stats either, and that nobody complained - but I suspect that's mostly because
> > > nobody knew to look.
> >
> > Not complaining about making pg_stat_io more accurate, but what exactly
> > would we be tracking for smgrwriteback()? I assume you are talking about
> > IO timing. AFAICT, on Linux, it does sync_file_range() with
> > SYNC_FILE_RANGE_WRITE, which is asynchronous. Wouldn't we just be
> > tracking the system call overhead time?
>
> It starts blocking once "enough" IO is in flight. For things like an immediate
> checkpoint, that can happen fairly quickly, unless you have a very fast IO
> subsystem. So often it'll not matter whether we track smgrwriteback(), but
> when it matters, it can matter a lot.

I see. So, it sounds like this is most likely to happen for checkpointer
and not likely to happen for other backends that call
ScheduleBufferTagForWriteback(). Also, it seems like this (given the
current code) is only reachable for permanent relations (i.e. not for
the "temp relation" IO object). If backend types other than checkpointer
may call smgrwriteback(), we likely have to consider the IO context. I
would imagine that we want to add smgrwriteback() timing to writes/write
time for the relevant IO context and backend type.
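
To make that last point concrete, here is a rough sketch of the
accounting I have in mind. It is not actual PostgreSQL code: every name
in it (the Sketch* types, sketch_stats, sketch_timed_writeback(), and
the fake callback standing in for smgrwriteback()) is hypothetical, and
it assumes the elapsed time is simply added to the write time of the
matching backend type and IO context, leaving the write count alone.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the pg_stat_io dimensions. */
typedef enum
{
    SK_BACKEND_CHECKPOINTER,
    SK_BACKEND_BGWRITER,
    SK_BACKEND_CLIENT,
    SK_NUM_BACKEND_TYPES
} SketchBackendType;

typedef enum
{
    SK_CONTEXT_NORMAL,
    SK_CONTEXT_VACUUM,
    SK_CONTEXT_BULKWRITE,
    SK_NUM_CONTEXTS
} SketchIOContext;

typedef struct
{
    uint64_t writes;         /* number of writes (untouched by this sketch) */
    uint64_t write_time_us;  /* accumulated write time, in microseconds */
} SketchWriteStats;

/* One cell per (backend type, IO context); permanent relations only. */
static SketchWriteStats sketch_stats[SK_NUM_BACKEND_TYPES][SK_NUM_CONTEXTS];

/* Monotonic clock reading in microseconds. */
static uint64_t
sketch_now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000u + (uint64_t) ts.tv_nsec / 1000u;
}

/*
 * Issue a writeback request through the supplied callback and charge the
 * elapsed time to the write time of the given backend type and IO context.
 */
static void
sketch_timed_writeback(SketchBackendType bt, SketchIOContext ctx,
                       void (*issue_writeback) (void *), void *arg)
{
    uint64_t start;

    assert(bt < SK_NUM_BACKEND_TYPES);
    assert(ctx < SK_NUM_CONTEXTS);

    start = sketch_now_us();
    issue_writeback(arg);   /* may block once "enough" IO is in flight */
    sketch_stats[bt][ctx].write_time_us += sketch_now_us() - start;
}

/* Placeholder for smgrwriteback(): only counts how often it was called. */
static void
sketch_fake_writeback(void *arg)
{
    (*(int *) arg)++;
}
```

A checkpointer-side caller would then look something like
sketch_timed_writeback(SK_BACKEND_CHECKPOINTER, SK_CONTEXT_NORMAL,
sketch_fake_writeback, &ncalls). Whether the time should instead go into
a separate counter is left open here.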

- Melanie
