Re: Add parallelism and glibc dependent only options to reindexdb

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Date: 2019-07-01 16:14:20
Message-ID: CAOBaU_bg3VheGYkjjvPd6Buw2Uk7yqAAGwXy7BXm6DwytWAdwA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> Please don't reuse a file name as generic as "parallel.c" -- it's
> annoying when navigating source. Maybe conn_parallel.c multiconn.c
> connscripts.c admconnection.c ...?

I could use scripts_parallel.[ch] as I've already used it in the #define part?

> If your server crashes or is stopped midway during the reindex, you
> would have to start again from scratch, and it's tedious (if it's
> possible at all) to determine which indexes were missed. I think it
> would be useful to have a two-phase mode: in the initial phase reindexdb
> computes the list of indexes to be reindexed and saves them into a work
> table somewhere. In the second phase, it reads indexes from that table
> and processes them, marking them as done in the work table. If the
> second phase crashes or is stopped, it can be restarted and consults the
> work table. I would keep the work table, as it provides a bit of an
> audit trail. It may be important to be able to run even if unable to
> create such a work table (because of the <ironic>numerous</> users that
> DROP DATABASE postgres).

Or we could create a table locally in each database. That would fix
this problem and probably make the code simpler?

It also raises some additional concerns about data expiration. For
instance, someone could launch the tool by mistake, kill reindexdb,
and run it again 2 months later, after a lot of new objects have been
added.
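
To make the two-phase idea and the expiration concern concrete, here is
a rough sketch, not what the patch does. It uses Python's sqlite3 as a
stand-in for a per-database work table; the table name reindex_work, its
columns, the MAX_AGE threshold and the reindex callback are all
hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=7)  # hypothetical expiration threshold


def phase_one(conn, index_names, now):
    """Phase 1: save the list of indexes to reindex into the work table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reindex_work ("
        " index_name TEXT PRIMARY KEY,"
        " created_at TEXT NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO reindex_work (index_name, created_at)"
        " VALUES (?, ?)",
        [(name, now.isoformat()) for name in index_names],
    )
    conn.commit()


def work_list_is_stale(conn, now):
    """Return True if the oldest entry is older than MAX_AGE."""
    row = conn.execute("SELECT MIN(created_at) FROM reindex_work").fetchone()
    if row[0] is None:
        return False
    return now - datetime.fromisoformat(row[0]) > MAX_AGE


def phase_two(conn, reindex, now):
    """Phase 2: process pending entries, marking each one done as we go.

    If this is interrupted, calling it again resumes with the entries
    that are still marked as not done.
    """
    if work_list_is_stale(conn, now):
        raise RuntimeError("work list is too old; recompute it first")
    pending = conn.execute(
        "SELECT index_name FROM reindex_work WHERE done = 0"
        " ORDER BY index_name"
    ).fetchall()
    processed = []
    for (name,) in pending:
        reindex(name)  # stand-in for issuing REINDEX INDEX on the server
        conn.execute(
            "UPDATE reindex_work SET done = 1 WHERE index_name = ?", (name,)
        )
        conn.commit()
        processed.append(name)
    return processed
```

Rows are kept after being marked done, which matches the audit-trail
suggestion. Refusing to resume from an old list is only one possible
answer to the expiration question.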

> The "glibc filter" thing (which I take to mean "indexes that depend on
> collations") would apply to the first phase: it just skips adding other
> indexes to the work table. I suppose ICU collations are not affected,
> so the filter would be for glibc collations only?

Indeed, ICU shouldn't need such a filter. Indexes based on
xxx_pattern_ops operator classes are also excluded.
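
As an illustration only, the rule could be sketched as below. The
IndexColumn type and its fields are hypothetical, and the sketch assumes
the collation provider of each column has already been resolved to
"libc", "icu" or None (no collation); it covers just the two exclusions
mentioned here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Operator classes that compare byte-wise and ignore the collation.
PATTERN_OPCLASSES = {
    "text_pattern_ops",
    "varchar_pattern_ops",
    "bpchar_pattern_ops",
}


@dataclass
class IndexColumn:
    provider: Optional[str]  # assumed values: "libc", "icu", or None
    opclass: str


def column_depends_on_glibc(col: IndexColumn) -> bool:
    """A column matters only if it is libc-collated and not pattern_ops."""
    if col.opclass in PATTERN_OPCLASSES:
        return False
    return col.provider == "libc"


def index_depends_on_glibc(columns: List[IndexColumn]) -> bool:
    """An index is kept by the filter if any of its columns qualifies."""
    return any(column_depends_on_glibc(col) for col in columns)
```

With this rule, an index mixing an ICU column and a libc column is still
selected, since one affected column is enough to make the whole index
suspect.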

> The --glibc-dependent
> switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can
> add other rules later. (Not "--exclude=foo" because we'll want to add
> the possibility to ignore specific indexes by name.)

That's a good point, I like the --exclude-rule switch.
