Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date: 2022-11-25 22:46:11
Message-ID: 20221125224611.efs6mtrijedkjhzs@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-11-22 23:43:29 -0600, Justin Pryzby wrote:
> I think there may be a problem/deficiency with hint bits:
>
> |postgres=# DROP TABLE u2; CREATE TABLE u2 AS SELECT generate_series(1,999999)a; SELECT pg_stat_reset_shared('io'); explain (analyze,buffers) SELECT * FROM u2;
> |...
> | Seq Scan on u2 (cost=0.00..15708.75 rows=1128375 width=4) (actual time=0.111..458.239 rows=999999 loops=1)
> | Buffers: shared hit=2048 read=2377 dirtied=2377 written=2345
>
> |postgres=# SELECT COUNT(1), relname, COUNT(1) FILTER(WHERE isdirty) FROM pg_buffercache b LEFT JOIN pg_class c ON pg_relation_filenode(c.oid)=b.relfilenode GROUP BY 2 ORDER BY 1 DESC LIMIT 11;
> | count | relname | count
> |-------+---------------------------------+-------
> | 13619 | | 0
> | 2080 | u2 | 2080
> | 104 | pg_attribute | 4
> | 71 | pg_statistic | 1
> | 51 | pg_class | 1
>
> It says that SELECT caused 2377 buffers to be dirtied, of which 2080 are
> associated with the new table in pg_buffercache.

Note that there's 2048 dirty buffers for u2 in shared_buffers before the
SELECT, despite the relation being 4425 blocks long, due to the CTAS using
BAS_BULKWRITE.

> |postgres=# SELECT * FROM pg_stat_io WHERE backend_type!~'autovac|archiver|logger|standalone|startup|^wal|background worker' or true ORDER BY 2;
> | backend_type | io_context | io_object | read | written | extended | op_bytes | evicted | reused | files_synced | stats_reset
> |...
> | client backend | bulkread | relation | 2377 | 2345 | | 8192 | 0 | 2345 | | 2022-11-22 22:32:33.044552-06
>
> I think it's a known behavior that hint bits do not use the strategy
> ring buffer. For BAS_BULKREAD, ring_size = 256kB (32, 8kB pages), but
> there's 2080 dirty pages in the buffercache (~16MB).

I don't think there's any "circumvention" of the ringbuffer here. There's 2048
buffers for u2 in s_b before, all dirty, there's 2080 after, also all
dirty. So the ringbuffer restricted the increase in shared buffers used for u2
to 2080-2048=32 additional buffers.

The reason hint bits don't prevent pages from being written out here is that a
BAS_BULKREAD strategy doesn't cause all buffer writes to be rejected, it just
causes buffer writes to be rejected when the page LSN would require a WAL
flush. And that's not typically the case when you just set a hint bit, unless
you use wal_log_hint_bits = true.

If I turn on wal_log_hints=true and add a CHECKPOINT after the CTAS I see 0
reuses (and 4425 dirty buffers), which is what I'd expect.

> But the IO view says that 2345 of the pages were "reused", which seems
> misleading to me. Maybe that just follows from the behavior and the view is
> fine. If the view is fine, maybe this case should still be specifically
> mentioned in the docs.

I think that's just confusing due to the reset. 2048 + 2345 = 4393, but we
only have 2080 buffers for u2 in s_b.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-11-25 22:47:57 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Previous Message Zheng Li 2022-11-25 22:23:09 Re: Support logical replication of DDLs