Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Date: 2023-04-02 20:11:47
Message-ID: CAAKRu_ZLRuzkM3zKogiZAz2hUony37yLY4aaLb8fPf8fgqs5Og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 1, 2023 at 1:57 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Sat, Apr 01, 2023 at 01:29:13PM -0400, Melanie Plageman wrote:
> > Hi,
> >
> > I was just doing some cleanup on the main patch in this set and realized
> > that it was missing a few things. One of which is forbidding the
> > BUFFER_USAGE_LIMIT with VACUUM FULL since VACUUM FULL does not use a
> > BAS_VACUUM strategy.
> >
> > VACUUM FULL technically uses a bulkread buffer access strategy for
> > reading the original relation if its number of blocks is > number of
> > shared buffers / 4 (see initscan()). The new rel writing is done using
> > smgrextend/write directly and doesn't go through shared buffers. I
> > think it is a stretch to try and use the size passed in to VACUUM by
> > BUFFER_USAGE_LIMIT for the bulkread strategy ring.
>
> When you say that it's a stretch, do you mean that it'd be a pain to add
> arguments to handful of functions to pass down the setting ? Or that
> it's unclear if doing so would be the desirable/needed/intended/expected
> behavior ?

More that I don't think it makes sense. VACUUM FULL only uses a buffer
access strategy (BAS_BULKREAD) for reading the original relation in and
not for writing the new one. It has different concerns because its
behavior is totally different from regular vacuum. It is not modifying
the original buffers (AFAIK) and the amount of WAL it is generating is
different. Also, no matter what, the new relation won't be in shared
buffers because of VACUUM FULL using the smgr functions directly. So, I
think that allowing the two options together is confusing for the user
because it seems to imply we can give them some benefit that we cannot.

> I wonder if maybe strategy should be configurable in some more generic
> way, like a GUC. At one point I had a patch to allow INSERT to use
> strategy buffers (not just INSERT SELECT). And that's still pretty
> desirable. Also COPY. I've seen load spikes caused by pg_dumping
> tables which are just below 25% of shared_buffers. Which is exacerbated
> because pg_dump deliberately orders tables by size, so those tables are
> dumped one after another, each causing eviction of ~20% of shared
> buffers. And exacerbated some more because TOAST don't seem to use a
> ring buffer in that case.

Yes, it is probably worth exploring how configurable or dynamic Buffer
Access Strategies should be for other users (e.g. not just VACUUM).
However, since the ring sizes wouldn't be the same for all the different
operations, it is probably easier to start with a single kind of
operation and go from there.

> > I somehow feel like VACUUM (FULL, BUFFER_USAGE_LIMIT 'x') should error
> > out instead of silently not using the buffer usage limit, though.
> >
> > I am looking for others' opinions.
>
> Sorry, no opinion here :)
>
> One thing is that it's fine to take something that previously throw an
> error and change it to not throw an error anymore. But it's undesirable
> to do the opposite. For that reason, there's may be a tendency to add
> errors for cases like this.

So, I have made it error out when you specify BUFFER_USAGE_LIMIT with
VACUUM FULL or VACUUM ONLY_DATABASE_STATS. However, if you specify
buffer_usage_limit -1 with either of these options, it will not error
out. I don't love this, but I noticed that VACUUM (FULL, PARALLEL 0)
does not error out, while VACUUM (FULL, PARALLEL X) where X > 0 does.

If I want to error out when BUFFER_USAGE_LIMIT specified at all but
still do so at the bottom of ExecVacuum() with the rest of the vacuum
option sanity checking, I will probably need to add a flag bit for
VacuumParams->options.

I was wondering why some "sanity checking" of vacuum options is done in
ExecVacuum() and some in vacuum() (it isn't just split by what is
applicable to autovacuum and what isn't).

I noticed that even in cases where we don't use the strategy object we
still made it, which I thought seemed like a bit of a waste and easy to
fix. I've added a commit which does not make the BufferAccessStrategy
object when VACUUM FULL or VACUUM ONLY_DATABASE_STATS are specified. I
noticed that we also don't use the strategy for VACUUM (PROCESS_MAIN
false, PROCESS_TOAST false), but it didn't seem worth handling this very
specific case, so I didn't.

v8 attached has the prohibitions specified above (including for
vacuumdb, as relevant) as well as some cleanup, added test cases, and
updated documentation.

0001 is essentially unmodified (i.e. I didn't do anything with the other
global variable David mentioned).

I still have a few open questions:
- what the initial value of ring_size for autovacuum should be (see the
one remaining TODO in the code)
- should ANALYZE allow specifying BUFFER_USAGE_LIMIT since it uses the guc
value when that is set?
- should INDEX_CLEANUP off cause VACUUM to use shared buffers and
disable use of a strategy (like failsafe vacuum)
- should we add anything to VACUUM VERBOSE output about the number of
reuses of strategy buffers?
- Should we make BufferAccessStrategyData non-opaque so that we don't
have to add a getter for nbuffers. I could have implemented this in
another way, but I don't really see why BufferAccessStrategyData
should be opaque

- Melanie

Attachment Content-Type Size
v8-0001-remove-global-variable-vac_strategy.patch text/x-patch 3.6 KB
v8-0003-use-shared-buffers-when-failsafe-active.patch text/x-patch 2.1 KB
v8-0005-Add-VACUUM-BUFFER_USAGE_LIMIT-option-and-GUC.patch text/x-patch 24.7 KB
v8-0002-Don-t-make-vacuum-strategy-ring-when-unused.patch text/x-patch 1.2 KB
v8-0004-Rename-Buffer-Access-Strategy-ring_size-nbuffers.patch text/x-patch 2.5 KB
v8-0006-Add-buffer-usage-limit-option-to-vacuumdb.patch text/x-patch 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-04-02 20:13:38 Re: [EXTERNAL] Re: [PATCH] Report the query string that caused a memory error under Valgrind
Previous Message Andres Freund 2023-04-02 20:10:35 Re: Minimal logical decoding on standbys