Re: Add Index-level REINDEX with multiple jobs

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add Index-level REINDEX with multiple jobs
Date: 2024-02-06 06:21:48
Lists: pgsql-hackers

On Fri, Dec 29, 2023 at 04:15:35PM +0300, Maxim Orlov wrote:
> Recently, one of our customers came to us with the question: why does the
> reindex utility not support multiple jobs for indexes (-i option)?
> And, of course, it is because we cannot control the concurrent processing
> of multiple indexes on the same relation. This was
> discussed somewhere in [0], I believe. So the customer has to write a shell
> script to do the job, and so on.

Yep, that should be the correct thread. As far as I recall, one major
reason was code simplicity because dealing with parallel jobs at table
level is a no-brainer on the client side (see 0003): we know that
relations with physical storage will never interact with each other.

> But it does not seem that complicated to split indexes by parent
> table and reindex them in multiple jobs. Or am I missing something?
> PFA a patch implementing this.
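
For illustration only, the grouping idea in the quoted proposal can be
sketched roughly as below. This is not the code from the patch: the
IndexItem struct, assign_jobs() and the static round-robin assignment are
all hypothetical names and choices made for this sketch. It only shows the
invariant that matters, namely that all indexes of one table land in the
same job, so two jobs never work on the same relation at once. It does not
order anything by size.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: one index plus the OID of its parent table. */
typedef struct IndexItem
{
    unsigned int table_oid;     /* parent table of the index */
    const char *index_name;     /* name of the index to rebuild */
    int         job;            /* output: job slot assigned to this index */
} IndexItem;

/* qsort comparator: order items by parent table OID. */
static int
cmp_by_table(const void *a, const void *b)
{
    const IndexItem *x = a;
    const IndexItem *y = b;

    if (x->table_oid < y->table_oid)
        return -1;
    if (x->table_oid > y->table_oid)
        return 1;
    return 0;
}

/*
 * Sort the items so that indexes of the same table are adjacent, then
 * hand each table's group to a job slot in round-robin order.  Every
 * index of a given table gets the same slot.  Returns the number of
 * distinct parent tables seen.
 */
int
assign_jobs(IndexItem *items, int nitems, int njobs)
{
    int         ngroups = 0;

    assert(nitems >= 0 && njobs > 0);
    if (nitems == 0)
        return 0;
    assert(items != NULL);

    qsort(items, (size_t) nitems, sizeof(IndexItem), cmp_by_table);

    for (int i = 0; i < nitems; i++)
    {
        /* A change in table_oid starts a new group. */
        if (i == 0 || items[i].table_oid != items[i - 1].table_oid)
            ngroups++;
        items[i].job = (ngroups - 1) % njobs;
    }
    return ngroups;
}
```

A static assignment like this is the simplest way to show the constraint;
a real client would more likely hand out one table's group at a time to
whichever connection becomes free.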

+ appendPQExpBufferStr(&catalog_query,
+ "WITH idx as (\n"
+ " SELECT c.relname, ns.nspname\n"
+ " FROM pg_catalog.pg_class c,\n"
+ " pg_catalog.pg_namespace ns\n"
+ " WHERE c.relnamespace OPERATOR(pg_catalog.=) ns.oid AND\n"
+ " c.oid OPERATOR(pg_catalog.=) ANY(ARRAY['\n");

The problem may actually be trickier than that, no? Could there be
other factors to take into account for their classification, like
their sizes (typically, we'd want to process the biggest one first, I
