Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Date: 2023-03-18 18:30:38
Message-ID: ZBYDTrD1kyGg+HkS@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Subject: [PATCH v4 3/3] add vacuum[db] option to specify ring_size and guc

> + Specifies the ring buffer size to be used for a given invocation of
> + <command>VACUUM</command> or instance of autovacuum. This size is
> + converted to a number of shared buffers which will be reused as part of

I'd say "specifies the size of shared_buffers to be reused as .."

> + a <literal>Buffer Access Strategy</literal>. <literal>0</literal> will
> + disable use of a <literal>Buffer Access Strategy</literal>.
> + <literal>-1</literal> will set the size to a default of <literal>256
> + kB</literal>. The maximum ring buffer size is <literal>16 GB</literal>.
> + Though you may set <varname>vacuum_buffer_usage_limit</varname> below
> + <literal>128 kB</literal>, it will be clamped to <literal>128
> + kB</literal> at runtime. The default value is <literal>-1</literal>.
> + This parameter can be set at any time.

GUC docs usually also say something like
"If this value is specified without units, it is taken as .."

> + is used to calculate a number of shared buffers which will be reused as

*the* number?

> + <command>VACUUM</command>. The analyze stage and parallel vacuum workers
> + do not use this size.

I think what you mean is that vacuum's heap scan stage uses the
strategy, but the index scan/cleanup phases doesn't?

> + The size in kB of the ring buffer used for vacuuming. This size is used
> + to calculate a number of shared buffers which will be reused as part of

*the* number

> +++ b/doc/src/sgml/ref/vacuumdb.sgml

The docs here duplicate the sql-vacuum docs. It seems better to refer
to the vacuum page for details, like --parallel does.

Unrelated: it would be nice if the client-side options were documented
separately from the server-side options. Especially due to --jobs and
--parallel.

> + if (!parse_int(vac_buffer_size, &result, GUC_UNIT_KB, NULL))
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("buffer_usage_limit for a vacuum must be between -1 and %d. %s is invalid.",
> + MAX_BAS_RING_SIZE_KB, vac_buffer_size),
> + parser_errposition(pstate, opt->location)));
> + }
> +
> + /* check for out-of-bounds */
> + if (result < -1 || result > MAX_BAS_RING_SIZE_KB)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("buffer_usage_limit for a vacuum must be between -1 and %d",
> + MAX_BAS_RING_SIZE_KB),
> + parser_errposition(pstate, opt->location)));
> + }

I think these checks could be collapsed into a single ereport().

if !parse_int() || (result < -1 || result > MAX_BAS_RINGSIZE_KB):
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
errmsg("buffer_usage_limit for a vacuum must be an integer between -1 and %d",
MAX_BAS_RING_SIZE_KB),

There was a recent, similar, and unrelated suggestion here:
https://www.postgresql.org/message-id/20230314.135859.260879647537075548.horikyota.ntt%40gmail.com

> +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring.
> + # -1 to use default,
> + # 0 to disable vacuum buffer access strategy and use shared buffers

I think it's confusing to say "and use shared buffers", since
"strategies" also use shared_buffers. It seems better to remove those 4
words.

> @@ -550,6 +563,13 @@ vacuum_one_database(ConnParams *cparams,
> pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
> "--parallel", "13");
>
> + // TODO: this is a problem: if the user specifies this option with -1 in a
> + // version before 16, it will not produce an error message. it also won't
> + // do anything, but that still doesn't seem right.

Actually, that seems fine to me. If someone installs v16 vacuumdb, they
can run it against old servers and specify the option as -1 without it
failing with an error. I don't know if anyone will find that useful,
but it doesn't seem unreasonable.

I still think adding something to the glossary would be good.

Buffer Access Strategy:
A circular/ring buffer used for reading or writing data pages from/to
the operating system. Ring buffers are used for sequential scans of
large tables, VACUUM, COPY FROM, CREATE TABLE AS SELECT, ALTER TABLE,
and CLUSTER. By using only a limited portion of >shared_buffers<, the
ring buffer avoids avoids evicting large amounts of data whenever a
backend performs bulk I/O operations. Use of a ring buffer also forces
the backend to write out its own dirty pages, rather than leaving them
behind to be cleaned up by other backends.

If there's a larger section added than a glossary entry, the text could
be promoted from src/backend/storage/buffer/README to doc/.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joseph Koshakow 2023-03-18 18:48:33 Re: Infinite Interval
Previous Message Tom Lane 2023-03-18 18:18:04 Re: generate_series for timestamptz and time zone problem