Re: Add parallelism and glibc dependent only options to reindexdb

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Date: 2019-07-23 07:38:01
Message-ID: 20190723073801.GD2059@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 22, 2019 at 02:40:19PM +0200, Julien Rouhaud wrote:
> Right, so I kept the long option. Also this comment was outdated, as
> the --jobs is now just ignored with a list of indexes, so I fixed that
> too.

+ if (!parallel)
+ {
+ if (user_list == NULL)
+ {
+ /*
+ * Create a dummy list with an empty string, as user requires an
+ * element.
+ */
+ process_list = pg_malloc0(sizeof(SimpleStringList));
+ simple_string_list_append(process_list, "");
+ }
+ }
This part looks a bit hacky and this is needed because we don't have a
list of objects when doing a non-parallel system or database reindex.
The deal is that we just want a list with one element: the database of
the connection. Wouldn't it be more natural to assign the database
name here using PQdb(conn)? Then add an assertion at the top of
run_reindex_command() checking for a non-NULL name?

> I considered this, but it would require to adapt all code that declare
> SimpleStringList stack variable (vacuumdb.c, clusterdb.c,
> createuser.c, pg_dumpall.c and pg_dump.c), so it looked like too much
> trouble to avoid 2 local variables here and 1 in vacuumdb.c. I don't
> have a strong opinion here, so I can go for it if you prefer.

Okay. This is a tad wider than the original patch proposal, and this
adds two lines. So let's discard that for now and keep it simple.

>> +$node->issues_sql_like([qw(reindexdb -j2)],
>> + qr/statement: REINDEX TABLE public.test1/,
>> + 'Global and parallel reindex will issue per-table REINDEX');
>> Would it make sense to have some tests for schemas here?
>>
>> One of my comments in [1] has not been answered. What about
>> the decomposition of a list of schemas into a list of tables when
>> using the parallel mode?
>
> I did that in attached v6, and added suitable regression tests.

The two tests for "reindexdb -j2" can be combined into a single call,
checking for both commands to be generated in the same output. The
second command triggering a reindex on two schemas cannot be used to
check for the generation of both s1.t1 and s2.t2 as the ordering may
not be guaranteed. The commands arrays also looked inconsistent with
the rest. Attached is an updated patch with some format modifications
and the fixes for the tests.
--
Michael

Attachment Content-Type Size
reindex_parallel_v7.patch text/x-diff 18.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-07-23 07:44:09 Re: pgbench - add pseudo-random permutation function
Previous Message Fabien COELHO 2019-07-23 07:19:00 Re: make \d pg_toast.foo show its indices ; and, \d toast show its main table ; and \d relkind=I show its partitions