vacuum_cost_page_miss default value and modern hardware

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: vacuum_cost_page_miss default value and modern hardware
Date: 2020-12-27 20:16:50
Message-ID: CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
Lists: pgsql-hackers

vacuum_cost_page_miss has a default value of 10, while
vacuum_cost_page_dirty has a default value of 20. This has been the
case since cost-based delays were introduced by commit f425b605f4e
back in 2004. The obvious implication is that dirtying a page is on
average only twice as expensive as a single shared_buffers miss (that
doesn't dirty the page). While that might have made sense back in
2004, when magnetic disks were the norm, it seems questionable now.
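
(For anyone who wants to check the values in play on a particular
installation, a query along these lines works -- the defaults cited
above are the stock values as of v13:

  SELECT name, boot_val, unit
  FROM pg_settings
  WHERE name LIKE '%vacuum_cost%'
  ORDER BY name;

boot_val shows the compiled-in default: vacuum_cost_page_hit = 1,
vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20,
vacuum_cost_limit = 200, vacuum_cost_delay = 0,
autovacuum_vacuum_cost_delay = 2ms, and autovacuum_vacuum_cost_limit =
-1, which makes autovacuum fall back on vacuum_cost_limit.)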

The trend these days is that the total number of dirty pages is the
limiting factor for OLTP workloads. This is a broad trend among all
disk-based RDBMSs with an ARIES style design. It is more or less a
result of long term trends in main memory size scaling and flash
storage. It's rare for OLTP workloads to be truly I/O bound, and yet
the backpressure from page cleaning/checkpointing becomes a
bottleneck. In short, writes are way more expensive than reads -- the
*relative* cost of writes has increased significantly (our use of the
OS FS cache makes this even worse).

I suspect that this trend will become even more important for Postgres
in the coming years, but that's not what I want to talk about right
now. I just want to talk about vacuum_cost_page_miss on this thread.

Simply decreasing vacuum_cost_page_miss seems like a low risk way of
making the VACUUM costing more useful within autovacuum workers.
Halving vacuum_cost_page_miss to 5 would be a good start, though I
think that a value as low as 2 would be better. That would make it
only 2x vacuum_cost_page_hit's default (i.e. 2x the cost of processing
a page that is in shared_buffers but did not need to be dirtied),
which seems sensible to me when considered in the context in which the
value is actually applied (and not some abstract theoretical context).
There are a few reasons why this seems like a good idea now:

* Throttling/delaying VACUUM is only useful as a way of smoothing the
impact on production queries, which is an important goal, but
currently we don't do a very good job of discriminating against the
cost that we really should keep under control (dirtying new pages
within VACUUM).

This is due to the aforementioned trends, the use of a strategy ring
buffer by VACUUM, the fact that indexes are usually vacuumed in
sequential physical order these days, and many other things that were
not a factor in 2004.

* There is a real downside to throttling VACUUM unnecessarily, and the
effects are *non-linear*. On a large table, the oldest xmin cutoff may
become very old by the time we're only (say) half way through the
initial table scan in lazy_scan_heap(). There may be relatively little
work to do because most of the dead tuples won't be before the oldest
xmin cutoff by that time (VACUUM just cannot keep up). Excessive
throttling for simple page misses may actually *increase* the amount
of I/O that VACUUM has to do over time -- we should try to get to the
pages that actually need to be vacuumed quickly, which are probably
already dirty anyway (and thus are probably going to add little to the
cost delay limit in practice). Everything is connected to everything
else.

* vacuum_cost_page_miss is very much not like random_page_cost, and
the similar names confuse the issue -- this is not an optimization
problem. Thinking about VACUUM as unrelated to the workload itself is
obviously wrong. Changing the default is also an opportunity to clear
that up.

Even if I am wrong to suggest that a miss within VACUUM should only be
thought of as 2x as expensive as a hit in some *general* sense, I am
concerned about *specific* consequences. There is no question about
picking the best access path here -- we're still going to have to
access the same blocks in the same way sooner or later. In general I
think that we should move in the direction of more frequent, cheaper
VACUUM operations [1], though we've already done a lot of work in that
direction (e.g. freeze map work).

* Some impact from VACUUM on query performance may in fact be a good thing.

Deferring the cost of vacuuming can only make sense if we'll
eventually be able to catch up because we're experiencing a surge in
demand, which seems kind of optimistic -- it seems more likely that
the GC debt will just grow and grow. Why should the DBA not expect to
experience some kind of impact, which could be viewed as a natural
kind of backpressure? The most important thing is consistent
performance.

* Other recent work such as the vacuum_cleanup_index_scale_factor
patch has increased the relative cost of index vacuuming in some
important cases: we don't have a visibility/freeze map for indexes,
but index vacuuming that doesn't dirty any pages and has a TID kill
list that's concentrated at the end of the heap/table is pretty cheap
(the TID binary search is cache efficient/cheap). This change will
help these workloads by better reflecting the fact that index
vacuuming can be cheap for append-only tables with a small amount of
garbage from recently inserted tuples that subsequently got
updated/deleted.

* Lowering vacuum_cost_page_miss's default (as opposed to changing
something else) is a simple and less disruptive way of achieving these
goals.

This approach seems unlikely to break existing VACUUM-related custom
settings from current versions that get reused on upgrade. I expect
little impact on small installations.
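
To put rough numbers on the proposal (a back-of-the-envelope sketch
that assumes the stock vacuum_cost_limit of 200 and
autovacuum_vacuum_cost_delay of 2ms, and ignores the time actually
spent processing pages): an autovacuum worker accumulates cost points
as it touches pages, and sleeps for roughly 2ms each time its balance
reaches 200. If every page is a miss, the current default of 10 caps
the worker at about 20 misses per sleep interval, or roughly 10,000
pages/second (~80MB/s at 8KB pages). A value of 2 raises that ceiling
to roughly 50,000 pages/second (~400MB/s), while leaving the
throttling of dirtied pages as it is. Anyone who wants to try the
proposed value ahead of any change to the default can do so with
something like:

  -- illustrative only; sets the proposed value cluster-wide
  ALTER SYSTEM SET vacuum_cost_page_miss = 2;
  SELECT pg_reload_conf();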

[1] https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
--
Peter Geoghegan
