rebased background worker reimplementation prototype

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: rebased background worker reimplementation prototype
Date: 2019-06-11 03:22:49
Message-ID: 20190611032249.kfi7pgqu2ipmlqca@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

I've talked a few times about a bgwriter replacement prototype I'd
written a few years back. That happened somewhere deep in another thread
[1], and is thus not easy to find.

Tomas Vondra asked me for a link, but the patch had bitrotted
considerably since then. Attached is a rebased and slightly improved
version. It's also available at [2][3].

The basic observation is that there are some fairly fundamental issues
with the current bgwriter implementation:

1) The pacing logic is complicated, but doesn't work well
2) If most/all buffers have a usagecount, it cannot do anything, because
it doesn't participate in the clock-sweep
3) Backends have to re-discover the now clean buffers.

The prototype is much simpler - in my opinion of course. It has a
ringbuffer of buffers it thinks are clean (which might be reused
concurrently though). It fills that ringbuffer by performing
clock-sweep, and if necessary cleaning, usagecount=pincount=0
buffers. Backends can then pop buffers from that ringbuffer.
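
As a rough illustration of the scheme described above, here is a toy,
single-threaded sketch in C. It is not the code from the attached
patches: the names (ToyBuffer, ToyRing, toy_fill_ring), the pool size and
the ring size are all made up for the example, and writing out a dirty
buffer is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUFFERS  8   /* toy pool size (assumption) */
#define RING_SIZE 4   /* toy ring capacity (assumption) */

typedef struct ToyBuffer
{
    int  usagecount;
    int  pincount;
    bool dirty;
} ToyBuffer;

typedef struct ToyRing
{
    int    slots[RING_SIZE];
    size_t head;            /* next position to pop */
    size_t tail;            /* next position to push */
} ToyRing;

static bool
toy_ring_full(const ToyRing *ring)
{
    return ring->tail - ring->head == RING_SIZE;
}

static bool
toy_ring_push(ToyRing *ring, int buf_id)
{
    if (toy_ring_full(ring))
        return false;
    ring->slots[ring->tail++ % RING_SIZE] = buf_id;
    return true;
}

static bool
toy_ring_pop(ToyRing *ring, int *buf_id)
{
    if (ring->head == ring->tail)
        return false;
    *buf_id = ring->slots[ring->head++ % RING_SIZE];
    return true;
}

/*
 * One bgwriter pass: advance the clock hand until the ring is full or
 * max_ticks buffers have been looked at. Pinned buffers are skipped,
 * buffers with a usagecount are decremented, and usagecount=pincount=0
 * buffers are cleaned if necessary and pushed. Returns the number of
 * buffers pushed; *writes counts the simulated write-outs.
 */
static int
toy_fill_ring(ToyBuffer *pool, int *clock_hand, ToyRing *ring,
              int max_ticks, int *writes)
{
    int pushed = 0;

    assert(*clock_hand >= 0 && *clock_hand < NBUFFERS);

    for (int tick = 0; tick < max_ticks && !toy_ring_full(ring); tick++)
    {
        int        id = *clock_hand;
        ToyBuffer *buf = &pool[id];

        *clock_hand = (*clock_hand + 1) % NBUFFERS;

        if (buf->pincount > 0)
            continue;
        if (buf->usagecount > 0)
        {
            buf->usagecount--;
            continue;
        }
        if (buf->dirty)
        {
            buf->dirty = false;     /* stands in for the actual write */
            (*writes)++;
        }
        if (toy_ring_push(ring, id))
            pushed++;
    }
    return pushed;
}
```

A backend would call toy_ring_pop() to get a candidate. Since a buffer
in the ring may have been reused in the meantime, a real implementation
has to re-check the buffer's state after popping it; the sketch leaves
that out.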

Pacing works by bgwriter trying to keep the ringbuffer full, and
backends emptying the ringbuffer. If the ringbuffer is less than 1/4
full, backends wake up bgwriter using the existing latch mechanism.
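
The pacing rule can be modelled in a few lines. This is only a sketch
under assumptions: the ring is reduced to a counter, the capacity of 16
is arbitrary, and a plain boolean stands in for the latch (in the server
that would be a SetLatch() on bgwriter's latch). All names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 16    /* arbitrary toy capacity (assumption) */

typedef struct ToyPacing
{
    size_t nentries;            /* clean buffers currently in the ring */
    bool   bgwriter_latch_set;  /* stand-in for the real latch */
} ToyPacing;

/* True if the ring is less than 1/4 full. */
static bool
toy_needs_wakeup(size_t nentries)
{
    return nentries * 4 < RING_CAPACITY;
}

/*
 * Backend side: take one buffer if available, and wake bgwriter if the
 * ring has dropped below the 1/4 threshold. Returns false if the ring
 * was empty, in which case the backend would have to fall back to
 * running the clock sweep itself.
 */
static bool
toy_backend_pop(ToyPacing *p)
{
    bool got = false;

    assert(p->nentries <= RING_CAPACITY);

    if (p->nentries > 0)
    {
        p->nentries--;
        got = true;
    }
    if (toy_needs_wakeup(p->nentries))
        p->bgwriter_latch_set = true;
    return got;
}

/*
 * Bgwriter side: reset the latch and refill the ring completely.
 * Returns how many buffers had to be added.
 */
static size_t
toy_bgwriter_cycle(ToyPacing *p)
{
    size_t added = RING_CAPACITY - p->nentries;

    p->bgwriter_latch_set = false;
    p->nentries = RING_CAPACITY;
    return added;
}
```

With a capacity of 16 the wakeup fires once the ring drops to 3 entries,
so bgwriter normally starts refilling before backends run dry.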

The ringbuffer is a pretty simplistic lockless (but only
obstruction-free, not lock-free) implementation, with a lot of
unnecessary constraints.
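
To give an idea of what a single-producer, multiple-consumer ring looks
like, here is a minimal sketch using C11 atomics. It is explicitly not
the algorithm from the attached 0001 patch: it uses a plain
compare-and-swap loop on the read position, default (sequentially
consistent) ordering everywhere, and hypothetical names and sizes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SPMC_SIZE 8     /* toy capacity (assumption), power of two */

static_assert((SPMC_SIZE & (SPMC_SIZE - 1)) == 0,
              "SPMC_SIZE must be a power of two");

typedef struct ToySpmcRing
{
    _Atomic uint64_t read_pos;   /* advanced by consumers, never wraps */
    _Atomic uint64_t write_pos;  /* advanced by the single producer */
    _Atomic uint32_t slots[SPMC_SIZE];
} ToySpmcRing;

static void
toy_spmc_init(ToySpmcRing *ring)
{
    atomic_init(&ring->read_pos, 0);
    atomic_init(&ring->write_pos, 0);
    for (int i = 0; i < SPMC_SIZE; i++)
        atomic_init(&ring->slots[i], 0);
}

/* Producer side; must only ever be called by one thread. */
static bool
toy_spmc_push(ToySpmcRing *ring, uint32_t value)
{
    uint64_t w = atomic_load(&ring->write_pos);
    uint64_t r = atomic_load(&ring->read_pos);

    if (w - r >= SPMC_SIZE)
        return false;           /* full */

    atomic_store(&ring->slots[w % SPMC_SIZE], value);
    atomic_store(&ring->write_pos, w + 1);  /* publish the entry */
    return true;
}

/* Consumer side; may be called concurrently by many threads. */
static bool
toy_spmc_pop(ToySpmcRing *ring, uint32_t *value)
{
    uint64_t r = atomic_load(&ring->read_pos);

    for (;;)
    {
        uint64_t w = atomic_load(&ring->write_pos);
        uint32_t v;

        if (r == w)
            return false;       /* empty */

        /*
         * Read the slot first, then try to claim it. The producer can
         * only overwrite this slot after read_pos has moved past r, in
         * which case the compare-and-swap below fails and we retry
         * with the updated r.
         */
        v = atomic_load(&ring->slots[r % SPMC_SIZE]);
        if (atomic_compare_exchange_weak(&ring->read_pos, &r, r + 1))
        {
            *value = v;
            return true;
        }
    }
}
```

Because the positions are 64-bit counters that only ever increase, a
successful compare-and-swap implies the slot value read just before it
was still the one belonging to position r.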

I've had to improve the current instrumentation for bgwriter
(i.e. pg_stat_bgwriter) considerably - the details in there imo are not
even remotely good enough to actually understand the system (nor are the
names understandable). That needs to be split into a separate commit,
and the half dozen different implementations of the counters need to be
unified.

Obviously this is very prototype-stage code. But I think it's a good
starting point for going forward.

To enable it, one currently has to set the bgwriter_legacy = false GUC.

Some early benchmarks show that in IO heavy cases the result is
somewhere between a very mild regression (close to noise) and a pretty
considerable improvement. To see a benefit one - fairly obviously -
needs a workload that is bigger than shared buffers, because otherwise
checkpointer is going to do all writes (and should, it can sort them
perfectly!).

It's quite possible to saturate what a single bgwriter can write out (as
was already the case before the replacement). I'm inclined to think the
next solution for that is asynchronous IO, and write-combining, rather
than multiple bgwriters.

Here's an example pg_stat_bgwriter from the middle of a pgbench run
(after resetting it a short while before):

┌─[ RECORD 1 ]───────────────┬───────────────────────────────┐
│ checkpoints_timed          │ 1                             │
│ checkpoints_req            │ 0                             │
│ checkpoint_write_time      │ 179491                        │
│ checkpoint_sync_time       │ 266                           │
│ buffers_written_checkpoint │ 172414                        │
│ buffers_written_bgwriter   │ 475802                        │
│ buffers_written_backend    │ 7140                          │
│ buffers_written_ring       │ 0                             │
│ buffers_fsync_checkpointer │ 137                           │
│ buffers_fsync_bgwriter     │ 0                             │
│ buffers_fsync_backend      │ 0                             │
│ buffers_bgwriter_clean     │ 832616                        │
│ buffers_alloc_preclean     │ 1306572                       │
│ buffers_alloc_free         │ 0                             │
│ buffers_alloc_sweep        │ 4639                          │
│ buffers_alloc_ring         │ 767                           │
│ buffers_ticks_bgwriter     │ 4398290                       │
│ buffers_ticks_backend      │ 17098                         │
│ maxwritten_clean           │ 17                            │
│ stats_reset                │ 2019-06-10 20:17:56.087704-07 │
└────────────────────────────┴───────────────────────────────┘

Note that buffers_written_backend (like buffers_backend before) accounts
for file extensions too - which bgwriter can't offload. We should
replace that with a non-write (i.e. fallocate) anyway.
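
For illustration, extending a file without writing data could look
roughly like the sketch below. It uses posix_fallocate() as the portable
relative of Linux's fallocate(); the helper name and the 8192-byte block
size constant are assumptions made for the example, not anything from
the patches. Note that on filesystems without native support the C
library may fall back to writing zeroes, so this is not a guaranteed
non-write everywhere.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ 8192     /* assumed block size for the example */

/*
 * Extend the file at 'path' by nblocks blocks, creating it if needed,
 * without issuing write() calls for the new space. Returns 0 on
 * success or an errno-style error number on failure.
 */
static int
toy_extend_path(const char *path, int nblocks)
{
    struct stat st;
    int         fd;
    int         rc;

    assert(nblocks > 0);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return errno;

    if (fstat(fd, &st) != 0)
        rc = errno;
    else
        /* posix_fallocate() returns the error number directly */
        rc = posix_fallocate(fd, st.st_size,
                             (off_t) nblocks * TOY_BLCKSZ);

    close(fd);
    return rc;
}
```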

Greetings,

Andres Freund

[1] https://postgr.es/m/20160204155458.jrw3crmyscusdqf6%40alap3.anarazel.de
[2] https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/bgwriter-rewrite
[3] https://github.com/anarazel/postgres/tree/bgwriter-rewrite

Attachment                                                       Content-Type  Size
v7-0001-Basic-obstruction-free-single-producer-multiple-c.patch  text/x-diff   7.1 KB
v7-0002-Rewrite-background-writer.patch                          text/x-diff   52.6 KB
