From: | Julien Rouhaud <julien(dot)rouhaud(at)free(dot)fr> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add parallelism and glibc dependent only options to reindexdb |
Date: | 2019-07-25 11:00:34 |
Message-ID: | CAOBaU_Y4zP=kYiXkh2sBgV9=k0XmTb4DwY9hS4H=AnZrDTWXuA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jul 25, 2019 at 12:03 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Jul 25, 2019 at 12:12:40PM +0300, Sergei Kornilov wrote:
> > Thank you, v9 code and behavior looks good for me. Builds cleanly,
> > works with older releases. I'll mark Ready to Commiter.
>
> The restriction with --jobs and --system is not documented, and that's
> good to have from the start.
That's documented in the patch:
+ Note that this option is ignored with the <option>--index</option>
+ option to prevent deadlocks when processing multiple indexes from
+ the same relation, and incompatible with the <option>--system</option>
+ option.
The restriction with --jobs and --concurrently is indeed not
specifically documented in reindexdb.sgml, there's only a mention in
reindex.sgml:
<command>REINDEX SYSTEM</command> does not support
<command>CONCURRENTLY</command> since system catalogs cannot be reindexed
The behavior doesn't really change with this patch, though we could
enhance the documentation.
> This actually makes the parallel job
> handling with --index inconsistent as we mask parallelism in this case
> by enforcing the use of one connection. I think that we should
> revisit the interactions with --index and --jobs actually, because,
> assuming that users never read the documentation, they may actually be
> surprised to see that something like --index idx1 .. --index idxN
> --jobs=N does not lead to any improvements at all, until they find out
> the reason why.
The problem is that a user doing something like:
reindexdb -j48 -i some_index -S s1 -S s2 -S s3....
will probably be disappointed to learn that he has to run a specific
command for the index(es) that should be reindexed. Maybe we can
issue a warning that parallelism isn't used when an index list is
processed and user asked for multiple jobs?
> Thinking deeper about it, there is also a point of gathering first all
> the relations if one associates --schemas and --tables in the same
> call of reindexdb and then pass down a list of decomposed relations
> which are processed in parallel. The code as currently presented is
> rather straight-forward, and I don't think that this is worth the
> extra complication, but this was not mentioned until now on this
> thread :)
+1
> For the non-parallel case in reindex_one_database(), I would add an
> Assert on REINDEX_DATABASE and REINDEX_SYSTEM with a comment to
> mention that a NULL list of objects can just be passed down only in
> those two cases when the single-object list with the database name is
> built.
Something like that?
if (!parallel)
{
- if (user_list == NULL)
+ /*
+ * Database-wide and system catalogs processing should omit the list
+ * of objects to process.
+ */
+ if (process_type == REINDEX_DATABASE || process_type == REINDEX_SYSTEM)
{
+ Assert(user_list == NULL);
+
process_list = pg_malloc0(sizeof(SimpleStringList));
simple_string_list_append(process_list, PQdb(conn));
}
There's another assert after the else-parallel that checks that a
list is present, so there's no need to also check it here.
I don't send a new patch since the --index wanted behavior is not clear yet.
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2019-07-25 11:17:37 | Re: Index Skip Scan |
Previous Message | Amit Kapila | 2019-07-25 10:37:08 | Re: POC: Cleaning up orphaned files using undo logs |