Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2023-01-17 22:00:34
Message-ID: CAAKRu_YJpZ_fV_n5PdOLWp9NcDZP2n=OZyLSUQRwBCz0iYpK_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

v49 attached

On Tue, Jan 17, 2023 at 2:12 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-01-17 12:22:14 -0500, Melanie Plageman wrote:
>
> > > > +typedef struct PgStat_BackendIO
> > > > +{
> > > > + PgStat_Counter data[IOCONTEXT_NUM_TYPES][IOOBJECT_NUM_TYPES][IOOP_NUM_TYPES];
> > > > +} PgStat_BackendIO;
> > >
> > > Would it bother you if we swapped the order of iocontext and iobject here and
> > > related places? It makes more sense to me semantically, and should now be
> > > pretty easy, code wise.
> >
> > So, thinking about this I started noticing inconsistencies in other
> > areas around this order:
> > For example: ordering of objects mentioned in commit messages and comments,
> > ordering of parameters (like in pgstat_count_io_op() [currently in
> > reverse order]).
> >
> > I think we should make a final decision about this ordering and then
> > make everywhere consistent (including ordering in the view).
> >
> > Currently the order is:
> > BackendType
> > IOContext
> > IOObject
> > IOOp
> >
> > You are suggesting this order:
> > BackendType
> > IOObject
> > IOContext
> > IOOp
> >
> > Could you explain what you find more natural about this ordering (as I
> > find the other more natural)?
>
> The object we're performing IO on determines more things than the context. So
> it just seems like the natural hierarchical fit. The context is a sub-category
> of the object. Consider how it'll look like if we also have objects for 'wal',
> 'temp files'. It'll make sense to group by just the object, but it won't make
> sense to group by just the context.
>
> If it were trivial to do I'd use a different IOContext for each IOObject. But
> it'd make it much harder. So there'll just be a bunch of values of IOContext
> that'll only be used for one or a subset of the IOObjects.
>
>
> The reason to put BackendType at the top is pragmatic - one backend is of a
> single type, but can do IO for all kinds of objects/contexts. So any other
> hierarchy would make the locking etc much harder.
>
>
> > This is one possible natural sentence with these objects:
> >
> > During COPY, a client backend may read in data from a permanent
> > relation.
> > This order is:
> > IOContext
> > BackendType
> > IOOp
> > IOObject
> >
> > I think English sentences are often structured subject, verb, object --
> > but in our case, we have an extra thing that doesn't fit neatly
> > (IOContext).
>
> "..., to avoid polluting the buffer cache it uses the bulk (read|write)
> strategy".
>
>
> > Also, IOOp in a sentence would be in the middle (as the
> > verb). I made it last because a) it feels like the smallest unit b) it
> > would make the code a lot more annoying if it wasn't last.
>
> Yea, I think pragmatically that is the right choice.

I have changed the order and updated all the places using
PgStat_BktypeIO as well as in all locations in which it should be
ordered for consistency (that I could find in the pass I did) -- e.g.
the view definition, function signatures, comments, commit messages,
etc.

> > > > +-- Change the tablespace so that the table is rewritten directly, then SELECT
> > > > +-- from it to cause it to be read back into shared buffers.
> > > > +SET allow_in_place_tablespaces = true;
> > > > +CREATE TABLESPACE regress_io_stats_tblspc LOCATION '';
> > >
> > > Perhaps worth doing this in tablespace.sql, to avoid the additional
> > > checkpoints done as part of CREATE/DROP TABLESPACE?
> > >
> > > Or, at least combine this with the CHECKPOINTs above?
> >
> > I see a checkpoint is requested when dropping the tablespace if not all
> > the files in it are deleted. It seems like if the DROP TABLE for the
> > permanent table is before the explicit checkpoints in the test, then the
> > DROP TABLESPACE will not cause an additional checkpoint.
>
> Unfortunately, that's not how it works :(. See the comment above mdunlink():
>
> > * For regular relations, we don't unlink the first segment file of the rel,
> > * but just truncate it to zero length, and record a request to unlink it after
> > * the next checkpoint. Additional segments can be unlinked immediately,
> > * however. Leaving the empty file in place prevents that relfilenumber
> > * from being reused. The scenario this protects us from is:
> > ...
>
>
> > Is this what you are suggesting? Dropping the temporary table should not
> > have an effect on this.
>
> I was wondering about simply moving that portion of the test to
> tablespace.sql, where we already created a tablespace.
>
>
> An alternative would be to propose splitting tablespace.sql into one portion
> running at the start of parallel_schedule, and one at the end. Historically,
> we needed tablespace.sql to be optional due to causing problems when
> replicating to another instance on the same machine, but now we have
> allow_in_place_tablespaces.

It seems like the best way would be to split up the tablespace test file
as you suggested and drop the tablespace at the end of the regression
test suite. There could be other tests that could use a tablespace.
Though what I wrote is kind of tablespace test coverage, if this
rewriting behavior no longer happened when doing alter table set
tablespace, we would want to come up with a new test which exercised
that code to count those IO stats, not simply delete it from the
tablespace tests.

> > > SELECT pg_relation_size('test_io_local') / current_setting('block_size')::int8 > 100;
> > >
> > > Better toast compression or such could easily make test_io_local smaller than
> > > it's today. Seeing that it's too small would make it easier to understand the
> > > failure.
> >
> > Good idea. So, I used pg_table_size() because it seems like
> > pg_relation_size() does not include the toast relations. However, I'm
> > not sure this is a good idea, because pg_table_size() includes FSM and
> > visibility map. Should I write a query to get the toast relation name
> > and add pg_relation_size() of that relation and the main relation?
>
> I think it's the right thing to just include the relation size. Your queries
> IIRC won't use the toast table or other forks. So I'd leave it at just
> pg_relation_size().

I did notice that this test wasn't using the toast table for the
toastable column -- but you mentioned better toast compression affecting
the future test stability, so I'm confused.

- Melanie

Attachment Content-Type Size
v49-0001-pgstat-Infrastructure-to-track-IO-operations.patch text/x-patch 28.6 KB
v49-0004-pg_stat_io-documentation.patch text/x-patch 13.6 KB
v49-0002-pgstat-Count-IO-for-relations.patch text/x-patch 22.2 KB
v49-0003-Add-system-view-tracking-IO-ops-per-backend-type.patch text/x-patch 32.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-01-17 22:03:54 Re: Update comments in multixact.c
Previous Message Bruce Momjian 2023-01-17 21:16:20 Re: document the need to analyze partitioned tables