| From: | Daniil Davydov <3danissimo(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: POC: Parallel processing of indexes in autovacuum |
| Date: | 2026-02-10 15:03:45 |
| Message-ID: | CAJDiXggY1QzNde6_HhpzneLc9dYqmWZ+PY39cuBXYdcCTuoJBA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
Thanks everyone for the review!
**Comments on the 0001 patch**
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
>
> ```
> + AutoVacuumShmem->av_freeParallelWorkers = Max(nfree_workers, 0);
> +
> + /*
> + * Don't allow number of free workers to become less than zero if the
> + * patameter was decreased.
> + */
> + AutoVacuumShmem->av_freeParallelWorkers =
> + Max(AutoVacuumShmem->av_freeParallelWorkers, 0);
> ```
>
> This can probably be simplified to:
>
> ```
> AutoVacuumShmem->av_freeParallelWorkers = Max(nfree_workers, 0);
> ```
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> + /*
> + * Cap or increase number of free parallel workers according to the
> + * parameter change.
> + */
> + AutoVacuumShmem->av_freeParallelWorkers = Max(nfree_workers, 0);
> +
> + /*
> + * Don't allow number of free workers to become less than zero if the
> + * patameter was decreased.
> + */
> + AutoVacuumShmem->av_freeParallelWorkers =
> + Max(AutoVacuumShmem->av_freeParallelWorkers, 0);
>
> Why does it do Max(x, 0) twice?
Agreed, I missed this one. It can certainly be simplified.
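The simplification can be sketched in isolation like this (a hypothetical stand-alone model, not the actual autovacuum code; Max() is redefined locally):

```c
#include <assert.h>

/* Local stand-in for PostgreSQL's Max() macro. */
#define Max(x, y) ((x) > (y) ? (x) : (y))

/*
 * Hypothetical model of the simplified logic: a single Max() is enough
 * to keep the free-worker count from going negative when the parameter
 * is decreased, so the second clamp in the quoted patch is redundant.
 */
static int
clamp_free_workers(int nfree_workers)
{
	return Max(nfree_workers, 0);
}
```

Applying Max() a second time to an already-clamped value can never change it, which is why the two assignments collapse into one.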
--
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sat, Jan 17, 2026 at 6:52 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
> >
> > I will keep the 'max_worker_processes' limit, so autovacuum will not
> > waste time initializing a parallel context if there is no chance that
> > the request will succeed.
> > But it's worth remembering that actually the
> > 'autovacuum_max_parallel_workers' parameter will always be implicitly
> > capped by 'max_parallel_workers'.
>
> It doesn't make sense to me that we limit
> autovacuum_max_parallel_workers by max_worker_processes TBH. When
> users want to have more parallel vacuum workers for autovacuum and the
> VACUUM command, they would have to consider max_worker_processes,
> max_parallel_workers, and max_parallel_maintenance_workers separately.
> Given that max_parallel_workers is controlling the number of
> max_worker_processes that can be used in parallel operations, I
> believe that parallel vacuum workers for autovacuum should also be
> taken from that pool.
Maybe I don't quite understand the meaning of "limited by". For example,
the max_parallel_workers_per_gather parameter is said to be limited by
max_parallel_workers, but we can actually set it to a value higher than
max_parallel_workers. The real limitation is that a Gather node cannot
request more workers than are available in the bgworkers pool (where the
number of free workers is always <= max_parallel_workers). Thus, the
limitation really applies only to the bgworkers pool, on which all other
parallel operations depend. In particular, whatever value we set for the
autovacuum_max_parallel_workers parameter, it will always depend only
on the bgworkers pool.
I'll defer to your opinion and add a limitation by max_parallel_workers.
But I still don't see the point of an explicit limitation by
max_parallel_workers when we already have it implicitly; it seems a bit
redundant to me. I hope I've conveyed my point correctly.
**Comments on the 0002 patch**
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> I don't think showing "reserved" in the logging is needed and could be
> confusing.
>
The rationale for this is in Masahiko-san's previous email, and I agree
with it. Do you think it could be confusing because users aren't
familiar with the notion of "reserved workers" in Postgres terms?
I think we can describe it in the documentation, so users will
be prepared for it.
--
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> I think it's better
> to show separate lines for index vacuuming and index cleanup, just
> like VACUUM VERBOSE.
>
> ```
> INFO: launched 2 parallel vacuum workers for index vacuuming (planned: 2)
> INFO: launched 1 parallel vacuum worker for index cleanup (planned: 1)
> ```
>
Actually, we already have such logging (see the
parallel_vacuum_process_all_indexes function) for both VACUUM
(PARALLEL) and parallel autovacuum. I think that in addition we can
split the final log message (with the total parallel vacuum stats) into
two lines: one for vacuuming and one for cleanup. Please see these
changes in the 0002 patch.
--
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Can we refactoring these codes to:
>
> if (vacrel->workers_usage.nplanned > 0)
> {
> if (AmAutoVacuumWorkerProcess())
> appendStringInfo(...);
> else
> appendStringInfo(...);
I agree.
**Comments on the 0003 patch**
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sat, Jan 17, 2026 at 6:52 AM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
> >
> > If behavior will be changed, then all comments for this function will need to
> > be changed, actually. Don't get me wrong - I just think that this Note is
> > important for the readers. But if you doubt its usefulness, I don't
> > mind deleting it.
>
> I still could not figure out why it should be mentioned here instead
> of at the comment of parallel_vacuum_update_shared_delay_params().
> Readers can notice that calling
> parallel_vacuum_update_shared_delay_params() for parallel vacuum
> worker for the VACUUM command has no effect when reading the function.
> In my opinion, we should mention here why we call
> parallel_vacuum_update_shared_delay_params() but should not mention
> what the called function does because it should have been described in
> that function.
>
OK, I agree.
> BTW can we expose pv_shared_cost_params so that we can check it in
> vacuum_delay_point() before trying to call
> parallel_vacuum_update_shared_delay_params()?
>
I would prefer not to. IMO it would be better to encapsulate the
shared delay parameter logic inside a single file.
> + if (!AmAutoVacuumWorkerProcess() && IsParallelWorker())
> + {
>
> We can just check IsParallelWorker() here.
I agree.
--
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> +extern void parallel_vacuum_update_shared_delay_params(void);
> +extern void parallel_vacuum_propagate_cost_based_params(void);
>
> I think it's better to have similar names to these functions for
> consistency and readability. How about the following?
>
> parallel_vacuum_update_delay_params();
> parallel_vacuum_propagate_delay_params();
>
Yep, 100% agreed; I just forgot to do it. If you don't mind, I would keep
the word "shared" in the function names.
--
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> + params_generation = pg_atomic_read_u32(&pv_shared_cost_params->generation);
> +
> + SpinLockAcquire(&pv_shared_cost_params->spinlock);
> +
> + pv_shared_cost_params->cost_delay = vacuum_cost_delay;
> + pv_shared_cost_params->cost_limit = vacuum_cost_limit;
> + pv_shared_cost_params->cost_page_dirty = VacuumCostPageDirty;
> + pv_shared_cost_params->cost_page_hit = VacuumCostPageHit;
> + pv_shared_cost_params->cost_page_miss = VacuumCostPageMiss;
>
> I think we can check if the new cost-based delay parameters are really
> changed before changing the shared values. If users don't change
> cost-based delay parameters, we don't need to increment the generation
> at all.
>
I agree.
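A minimal sketch of that check, using a hypothetical mirror of the shared struct (the field names follow the quoted patch, but this is not the real PVSharedCostParams and not PostgreSQL code):

```c
#include <stdbool.h>

/* Hypothetical mirror of the shared cost-based delay parameters. */
typedef struct CostParams
{
	double		cost_delay;
	int			cost_limit;
	int			cost_page_dirty;
	int			cost_page_hit;
	int			cost_page_miss;
} CostParams;

/*
 * Return true only if at least one parameter actually differs; the
 * caller would update the shared values and bump the generation only
 * in that case, leaving the generation untouched otherwise.
 */
static bool
cost_params_changed(const CostParams *shared, const CostParams *cur)
{
	return shared->cost_delay != cur->cost_delay ||
		shared->cost_limit != cur->cost_limit ||
		shared->cost_page_dirty != cur->cost_page_dirty ||
		shared->cost_page_hit != cur->cost_page_hit ||
		shared->cost_page_miss != cur->cost_page_miss;
}
```

With this guard, a config reload that doesn't touch the delay parameters never forces parallel workers to re-read the shared values.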
--
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> +/*
> + * Only autovacuum leader can reload config file. We use this structure in
> + * parallel autovacuum for keeping worker's parameters in sync with leader's
> + * parameters.
> + */
> +typedef struct PVSharedCostParams
>
> I'd suggest writing the overall description first (e.g., what the
> struct holds and what the function does etc), and then describing the
> details and notes etc. For instance, readers might be confused when
> reading the first sentence "Only autovacuum leader can reload config
> file" as the struct definition is not related to the autovacuum
> implementation fact that autovacuum workers can reload the config file
> during the work. We would need to mention such detail somewhere in the
> comments but I think it should not be the first sentence. How about
> rewriting it to something like:
>
> +/*
> + * Struct for cost-based vacuum delay related parameters to share among an
> + * autovacuum worker and its parallel vacuum workers.
> + */
>
Yep, you are right.
> + slock_t spinlock; /* protects all fields below */
>
> It's convention to name 'mutex' as a field name.
>
OK.
--
> +static PVSharedCostParams * pv_shared_cost_params = NULL;
> +
> +/* See comments for structure above for the explanation. */
> +static uint32 shared_params_generation_local = 0;
>
> I think it's preferable to move these definitions of static variables
> right before the function prototypes.
>
I agree.
--
> + /*
> + * If 'true' then we are running parallel autovacuum. Otherwise, we are
> + * running parallel maintenence VACUUM.
> + */
> + bool am_parallel_autovacuum;
>
> How about renaming it to use_shared_delay_params? I think it conveys
> better what the field is used for.
I think we should keep this name, because other behavioral differences
between manual VACUUM and autovacuum may appear in the future. If so,
we will already have an "am_autovacuum" field that we can use in
the code.
The existing logic also reads fine with the "am_autovacuum" name: we
use shared delay params precisely because we are running parallel autovacuum.
--
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> inside vacuum_delay_point, I would re-organize the checks to
> first run the code block for the a/v worker:
>
> ```
> if (ConfigReloadPending && AmAutoVacuumWorkerProcess())
> ```
>
> then the a/v/ parallel worker:
>
> ```
> if (!AmAutoVacuumWorkerProcess() && IsParallelWorker())
> ```
>
Besides ConfigReloadPending, we also must check VacuumCostActive.
I placed the call to the update_shared_delay_params function *before* the
VacuumCostActive check, because a parallel worker can change the value of
this variable inside that function. Also, we should call the functions
related to the a/v worker only *after* checking VacuumCostActive. Thus,
the parallel a/v worker logic should run before the leader a/v worker logic.
Am I missing something?
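As an illustration only, the ordering argued for above can be modeled with a toy function; all names here are hypothetical stand-ins for IsParallelWorker(), AmAutoVacuumWorkerProcess(), ConfigReloadPending and VacuumCostActive, not the real vacuum_delay_point():

```c
#include <stdbool.h>

/* Toy stand-ins for the state checked in vacuum_delay_point(). */
static bool is_parallel_worker;
static bool am_av_worker;
static bool config_reload_pending;
static bool vacuum_cost_active;

/*
 * Model of the ordering: the parallel-worker branch must run before the
 * VacuumCostActive check (it may change that flag while re-reading the
 * shared params), while the a/v leader branch runs only afterwards.
 * Returns 'p', 'l', or 0 to show which branch would fire.
 */
static char
delay_point_branch(void)
{
	if (is_parallel_worker)
		return 'p';				/* may flip vacuum_cost_active inside */

	if (!vacuum_cost_active)
		return 0;

	if (config_reload_pending && am_av_worker)
		return 'l';

	return 0;
}
```

Reversing the first two checks would let a stale vacuum_cost_active value skip the parallel worker's parameter refresh entirely.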
--
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> But I am also wondering if we should have a specific backend_type
> for "autovacuum parallel worker" to differentiate that from the
> existing "autovacuum worker".
>
> and also we can have a helper macro like:
> ```
> #define AmAutoVacuumParallelWorkerProcess() (MyBackendType ==
> B_AUTOVAC_PARALLEL_WORKER)
> ```
>
> What do you think?
>
I don't think we should do that, because the workers launched by an a/v
worker are technically no different from other bgworkers launched for
other purposes. Since we can easily distinguish an a/v parallel worker
from the others, I suggest we leave it as it is.
--
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> Add
> ```
> +typedef struct PVSharedCostParams
> ```
>
> to src/tools/pgindent/typedefs.list
>
I agree. I'll also add all new structures to the typedefs.list.
--
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
>
> + pg_atomic_init_u32(&shared->cost_params.generation, 0);
> + SpinLockInit(&shared->cost_params.spinlock);
> + pv_shared_cost_params = &(shared->cost_params);
>
> NIT: move SpinLockInit last
I think we should set the pointer to shared->cost_params only after all of
the structure's fields have been initialized. Thus, SpinLockInit should be
placed before the "pv_shared_cost_params = ..." assignment.
It doesn't actually make any difference here, but I think it's a little
cleaner.
--
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
>
> Instead of:
>
> ```
> + params_generation =
> pg_atomic_read_u32(&pv_shared_cost_params->generation);
> +
> ```
> and then later on:
> ```
> + /*
> + * Increase generation of the parameters, i.e. let parallel workers know
> + * that they should re-read shared cost params.
> + */
> + pg_atomic_write_u32(&pv_shared_cost_params->generation,
> + params_generation + 1);
> +
> + SpinLockRelease(&pv_shared_cost_params->spinlock);
> ```
>
> why can't we just do:
>
> pg_atomic_fetch_add_u32(&pv_shared_cost_params->generation, 1);
>
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> + pg_atomic_write_u32(&pv_shared_cost_params->generation,
> + params_generation + 1);
>
> We can use pg_atomic_add_fetch_u32() instead.
>
Yep, agreed.
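The agreed change, sketched with C11 atomics as an analogy (PostgreSQL itself would use pg_atomic_add_fetch_u32() on a pg_atomic_uint32; the names here are illustrative):

```c
#include <stdatomic.h>

/* Analog of pv_shared_cost_params->generation. */
static atomic_uint generation;

/*
 * One atomic add-and-fetch replaces the separate read and write, so the
 * increment no longer needs to sit between SpinLockAcquire/Release.
 */
static unsigned int
bump_generation(void)
{
	/* atomic_fetch_add returns the old value, hence the + 1 */
	return atomic_fetch_add(&generation, 1) + 1;
}
```

Because the add is a single atomic read-modify-write, no concurrent reader can observe a torn or skipped generation even without the spinlock.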
--
On Thu, Jan 22, 2026 at 5:22 AM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> Also, do the pg_atomic_fetch_add_u32 outside of the spinlock. right?
>
Sure. Somehow I missed it.
**Comments on the 0004 patch**
On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> The patch introduces 5 injection points, which seems overkill to me
> for implementing the tests. IIUC we can implement the test2 with two
> injection points: 'autovacuum-start-parallel-vacuum' (set right before
> lazy_scan_heap() call) and
> 'autovacuum-leader-before-indexes-processing'.
>
> 1. stop the autovacuum worker at 'autovacuum-start-parallel-vacuum'.
> 2. change delay params and reload the conf.
> 3. let the autovacuum worker process tables (vacuum_delay_point() is
> called during the heap scan).
> 4. stop the autovacuum worker at 'autovacuum-leader-before-indexes-processing'.
> 5. let parallel workers process indexes (vacuum_delay_point() is
> called during index vacuuming).
OK, I'll do it.
--
> For test3, I think we can write a DEBUG2 log in
> adjust_free_parallel_workers() and check it during the test instead of
> introducing the test-only function.
>
> Truncating all logs every after test would decrease the debuggability
> much. We can pass the offset as the start point to wait for the
> contents.
>
I've combined two of your comments above on purpose. I agree that truncating
all logs is a bad decision and that we need to solve this differently. But a
problem arises if we want to 1) use logging instead of a test-only function
and 2) use offsets as the start point to wait for contents in the logfile.
Imagine that (using the described approach) we need to wait until the end of
parallel index processing and then determine the current number of free
parallel workers.
IIUC, it'll look like this:
wait_for_av_log("autovacuum processing finished");
wait_for_av_log("number of free workers = N");
But when we call wait_for_av_log the first time, we advance "offset" to the
end of the logfile, so we will miss "number of free workers = N". The only
way to avoid that is to write a function that determines the exact position
of "autovacuum processing finished" in the logfile. Isn't that too much?
I think that using wait_for_av_log("autovacuum processing finished"); +
SELECT get_parallel_autovacuum_free_workers(); will be much more
demonstrative and simple.
Moreover, the AutoVacuumGetFreeParallelWorkers function doesn't seem
completely useless outside of the tests either. I suggest keeping this
function and its usage in the tests. I can remove the "For testing
purpose only!" comment, so everyone will be free to use this function
in the future.
> For test4 and test5, we check the number of free workers using
> get_parallel_av_free_workers(). However, since autovacuum
> could retry to vacuum the table again, the test could fail.
Yep, good catch.
1)
Test 5 can be stabilized as follows:
We can attach to the "autovacuum-start-parallel-vacuum" injection point in
"wait" mode. Thereby, when we terminate the first a/v leader, we are
guaranteed that no other a/v leader will reach the release/reserve
functions, and then we are free to call the
get_parallel_autovacuum_free_workers function. I'll additionally describe
this logic in the test.
2)
In test 4 I found another problem: when the a/v leader errors out, it will
exit() pretty soon, and during exit() it will call the before_shmem_exit
hook. Thus, we cannot be sure that the parallel workers have been released
exactly in the try/catch block. To guarantee it, I think we should log
something inside the try/catch block. I added some pretty controversial
logging code for it, but it is the best I came up with.
In test 4, the above idea will look something like this:
$log_start = $node->wait_for_log(
qr/error triggered for injection point / .
qr/autovacuum-leader-before-indexes-processing/,
$log_start
);
$log_start = $node->wait_for_log(
qr/2 parallel autovacuum workers has been released after occured error/,
$log_start
);
Above I described a problem that may occur when we advance the "logfile
offset" too far after the first wait_for_log call. Here, that problem
doesn't occur, because the autovacuum launcher retries vacuuming the table
indefinitely, so further "N workers released" messages will appear.
--
> And here are some general comments and suggestions:
>
> +use warnings FATAL => 'all';
> +use PostgreSQL::Test::Cluster;
> +use PostgreSQL::Test::Utils;
> +use Test::More;
>
> We need comments to explain what we test with this test file.
>
OK, I'll add it. I suppose I can limit myself to a simple
"Test parallel autovacuum behavior", because the specific test scenarios
are described below.
--
> + $node->safe_psql('postgres', qq{
> + UPDATE test_autovac SET col_1 = $test_number;
> + ANALYZE test_autovac;
> + });
>
> Why do we need to execute ANALYZE as well?
I added ANALYZE just in case. But I see that the statistics of deleted and
updated tuples are accumulated at the end of the transaction, so I agree
that we can get rid of ANALYZE here.
--
> +# Insert specified tuples num into the table
> +$node->safe_psql('postgres', qq{
> + DO \$\$
> + DECLARE
> + i INTEGER;
> + BEGIN
> + FOR i IN 1..$initial_rows_num LOOP
> + INSERT INTO test_autovac VALUES (i, i + 1, i + 2, i + 3);
> + END LOOP;
> + END \$\$;
> +});
>
> We can use generate_series() here. And it's faster to load the data
> and then create indexes.
OK, I'll fix it.
--
> +$node->psql('postgres',
> + "SELECT get_parallel_autovacuum_free_workers();",
> + stdout => \$psql_out,
> +);
>
> Please use pgsql_safe() instead.
Sure!
--
Again, thanks everyone for the review!
I hope I didn't miss anything.
Please see the updated sets of patches.
This time I'll try something experimental: besides the patches, I'll also
post the differences between the corresponding patches from v20 and v21.
I.e., you can apply v20--v21-diff-for-0001 on top of the v20-0001 patch and
get the v21-0001 patch. There are a lot of changes, so I hope this will
help you during review. Please let me know whether it is useful for you.
--
Best regards,
Daniil Davydov
| Attachment | Content-Type | Size |
|---|---|---|
| v21-0005-Documentation-for-parallel-autovacuum.patch | text/x-patch | 4.4 KB |
| v21-0001-Parallel-autovacuum.patch | text/x-patch | 19.4 KB |
| v21-0002-Logging-for-parallel-autovacuum.patch | text/x-patch | 10.2 KB |
| v21-0004-Tests-for-parallel-autovacuum.patch | text/x-patch | 23.3 KB |
| v21-0003-Cost-based-parameters-propagation-for-parallel-a.patch | text/x-patch | 9.8 KB |
| v20--v21-diff-for-0003.patch | text/x-patch | 9.9 KB |
| v20--v21-diff-for-0001.patch | text/x-patch | 3.5 KB |
| v20--v21-diff-for-0002.patch | text/x-patch | 8.3 KB |
| v20--v21-diff-for-0004.patch | text/x-patch | 18.3 KB |