Re: Add parallelism and glibc dependent only options to reindexdb

From:	Julien Rouhaud <rjuju123(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject:	Re: Add parallelism and glibc dependent only options to reindexdb
Date:	2019-07-01 16:14:20
Message-ID:	CAOBaU_bg3VheGYkjjvPd6Buw2Uk7yqAAGwXy7BXm6DwytWAdwA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> Please don't reuse a file name as generic as "parallel.c" -- it's
> annoying when navigating source. Maybe conn_parallel.c multiconn.c
> connscripts.c admconnection.c ...?

I could use scripts_parallel.[ch] as I've already used it in the #define part?

> If your server crashes or is stopped midway during the reindex, you
> would have to start again from scratch, and it's tedious (if it's
> possible at all) to determine which indexes were missed. I think it
> would be useful to have a two-phase mode: in the initial phase reindexdb
> computes the list of indexes to be reindexed and saves them into a work
> table somewhere. In the second phase, it reads indexes from that table
> and processes them, marking them as done in the work table. If the
> second phase crashes or is stopped, it can be restarted and consults the
> work table. I would keep the work table, as it provides a bit of an
> audit trail. It may be important to be able to run even if unable to
> create such a work table (because of the <ironic>numerous</> users that
> DROP DATABASE postgres).

Or we could create a table locally in each database? That would fix
this problem and probably make the code simpler.

It also raises some additional concerns about data expiration. For
instance, someone could launch the tool by mistake, kill reindexdb,
and run it again 2 months later, after a lot of new objects have been
added.
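
To make the two-phase idea concrete, here is a minimal sketch of a restartable work table. It is for illustration only and is not code from the patch: it uses Python with an in-memory SQLite database as a stand-in for a table in the target PostgreSQL database, and the names reindex_work, plan_phase and run_phase are hypothetical.

```python
import sqlite3


def plan_phase(conn, index_names):
    """Phase 1: record every index to process in a work table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reindex_work ("
        " index_name TEXT PRIMARY KEY,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    # INSERT OR IGNORE keeps existing rows, so re-planning does not reset progress.
    conn.executemany(
        "INSERT OR IGNORE INTO reindex_work (index_name) VALUES (?)",
        [(name,) for name in index_names],
    )
    conn.commit()


def run_phase(conn, reindex, limit=None):
    """Phase 2: process pending entries, marking each one done.

    `limit` exists only to simulate an interrupted run in this sketch.
    Returns the names processed by this call.
    """
    pending = [
        row[0]
        for row in conn.execute(
            "SELECT index_name FROM reindex_work"
            " WHERE done = 0 ORDER BY index_name"
        )
    ]
    if limit is not None:
        pending = pending[:limit]
    for name in pending:
        reindex(name)  # stand-in for issuing REINDEX INDEX on the server
        conn.execute(
            "UPDATE reindex_work SET done = 1 WHERE index_name = ?", (name,)
        )
        conn.commit()  # commit per index so a crash loses at most one entry
    return pending


conn = sqlite3.connect(":memory:")
plan_phase(conn, ["idx_a", "idx_b", "idx_c"])
first_run = run_phase(conn, lambda name: None, limit=2)  # "crash" after two
second_run = run_phase(conn, lambda name: None)          # restart picks up the rest
```

The rows are kept after completion, which matches the audit-trail suggestion. A timestamp column recording when the list was planned would be one possible way to detect a stale work table, but that is an assumption on top of what was discussed.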

> The "glibc filter" thing (which I take to mean "indexes that depend on
> collations") would apply to the first phase: it just skips adding other
> indexes to the work table. I suppose ICU collations are not affected,
> so the filter would be for glibc collations only?

Indeed, ICU shouldn't need such a filter. Indexes using an
xxx_pattern_ops operator class are also excluded.

> The --glibc-dependent
> switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can
> add other rules later. (Not "--exclude=foo" because we'll want to add
> the possibility to ignore specific indexes by name.)

That's a good point, I like the --exclude-rule switch.
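
As a rough sketch of how a rule-based switch could stay extensible, the following Python example keeps named rules in a table and leaves a separate by-name option free. Everything here is hypothetical: reindexdb has no such options today, the index metadata is hand-written rather than read from the catalogs, and the reading of the "glibc" rule (skip indexes that a glibc collation change cannot affect) is an assumption.

```python
import argparse


def excluded_by_glibc_rule(index):
    """Assumed "glibc" rule: skip indexes that do not depend on a glibc collation.

    Simplification: only the collation provider and the operator class are
    checked; the "provider" and "opclass" keys are made up for this sketch.
    """
    return (
        index["provider"] != "libc"
        or index["opclass"].endswith("_pattern_ops")
    )


# New rules can be added here without introducing new switches.
EXCLUDE_RULES = {"glibc": excluded_by_glibc_rule}


def build_parser():
    parser = argparse.ArgumentParser(prog="reindex-sketch")
    parser.add_argument("--exclude-rule", action="append", default=[],
                        choices=sorted(EXCLUDE_RULES),
                        help="skip indexes matched by a named rule")
    parser.add_argument("--exclude", action="append", default=[],
                        metavar="INDEX",
                        help="skip a specific index by name")
    return parser


def select_indexes(indexes, args):
    """Return the names of the indexes that survive all exclusions."""
    rules = [EXCLUDE_RULES[name] for name in args.exclude_rule]
    return [
        index["name"]
        for index in indexes
        if index["name"] not in args.exclude
        and not any(rule(index) for rule in rules)
    ]


indexes = [
    {"name": "a_idx", "provider": "libc", "opclass": "text_ops"},
    {"name": "b_idx", "provider": "icu", "opclass": "text_ops"},
    {"name": "c_idx", "provider": "libc", "opclass": "text_pattern_ops"},
]
args = build_parser().parse_args(["--exclude-rule=glibc"])
selected = select_indexes(indexes, args)
```

Keeping rules and names in separate options is what lets both be combined on one command line, for example a rule plus a couple of individually skipped indexes.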

In response to

Re: Add parallelism and glibc dependent only options to reindexdb at 2019-07-01 13:51:12 from Alvaro Herrera

Responses

Re: Add parallelism and glibc dependent only options to reindexdb at 2019-07-02 02:55:07 from Michael Paquier
