Re: Add parallelism and glibc dependent only options to reindexdb

From: Julien Rouhaud <julien(dot)rouhaud(at)free(dot)fr>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Date: 2019-07-25 11:00:34
Message-ID: CAOBaU_Y4zP=kYiXkh2sBgV9=k0XmTb4DwY9hS4H=AnZrDTWXuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 25, 2019 at 12:03 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Jul 25, 2019 at 12:12:40PM +0300, Sergei Kornilov wrote:
> > Thank you, v9 code and behavior looks good for me. Builds cleanly,
> > works with older releases. I'll mark Ready to Commiter.
>
> The restriction with --jobs and --system is not documented, and that's
> good to have from the start.

That's documented in the patch:
+ Note that this option is ignored with the <option>--index</option>
+ option to prevent deadlocks when processing multiple indexes from
+ the same relation, and incompatible with the <option>--system</option>
+ option.

The restriction with --jobs and --concurrently is indeed not
specifically documented in reindexdb.sgml, there's only a mention in
reindex.sgml:

<command>REINDEX SYSTEM</command> does not support
<command>CONCURRENTLY</command> since system catalogs cannot be reindexed

The behavior doesn't really change with this patch, though we could
enhance the documentation.

> This actually makes the parallel job
> handling with --index inconsistent as we mask parallelism in this case
> by enforcing the use of one connection. I think that we should
> revisit the interactions with --index and --jobs actually, because,
> assuming that users never read the documentation, they may actually be
> surprised to see that something like --index idx1 .. --index idxN
> --jobs=N does not lead to any improvements at all, until they find out
> the reason why.

The problem is that a user doing something like:

reindexdb -j48 -i some_index -S s1 -S s2 -S s3....

will probably be disappointed to learn that he has to run a specific
command for the index(es) that should be reindexed. Maybe we can
issue a warning that parallelism isn't used when an index list is
processed and user asked for multiple jobs?

> Thinking deeper about it, there is also a point of gathering first all
> the relations if one associates --schemas and --tables in the same
> call of reindexdb and then pass down a list of decomposed relations
> which are processed in parallel. The code as currently presented is
> rather straight-forward, and I don't think that this is worth the
> extra complication, but this was not mentioned until now on this
> thread :)

+1

> For the non-parallel case in reindex_one_database(), I would add an
> Assert on REINDEX_DATABASE and REINDEX_SYSTEM with a comment to
> mention that a NULL list of objects can just be passed down only in
> those two cases when the single-object list with the database name is
> built.

Something like that?

if (!parallel)
{
- if (user_list == NULL)
+ /*
+ * Database-wide and system catalogs processing should omit the list
+ * of objects to process.
+ */
+ if (process_type == REINDEX_DATABASE || process_type == REINDEX_SYSTEM)
{
+ Assert(user_list == NULL);
+
process_list = pg_malloc0(sizeof(SimpleStringList));
simple_string_list_append(process_list, PQdb(conn));
}

There's another assert after the else-parallel that checks that a
list is present, so there's no need to also check it here.

I don't send a new patch since the --index wanted behavior is not clear yet.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2019-07-25 11:17:37 Re: Index Skip Scan
Previous Message Amit Kapila 2019-07-25 10:37:08 Re: POC: Cleaning up orphaned files using undo logs