Re: PATCH: Add REINDEX tag to event triggers

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Garrett Thornburg <film42(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: PATCH: Add REINDEX tag to event triggers
Date: 2023-11-24 00:00:00
Message-ID: CACJufxH+0KHWT19qfyP1VM3h8+CGH7t4LxaLrSA7Lhecs6ia1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 20, 2023 at 3:34 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Sat, Oct 28, 2023 at 12:15:22PM +0800, jian he wrote:
> > reindex_event_trigger_collect_relation called in
> > ReindexMultipleInternal, ReindexTable (line 2979).
> > Both are "under concurrent is false" branches.
> >
> > So it should be fine.
>
> Sorry for the delay in replying.
>
> Indeed, you're right. reindex_event_trigger_collect_relation() gets
> only called in two paths for the non-concurrent cases just after
> reindex_relation(), which is not a safe location to call it because
> this may be used in CLUSTER or TRUNCATE, and the intention of the
> patch is only to count for the objects in REINDEX.
>
> Anyway, I think that the current implementation is incorrect. The
> object is collected after the reindex is done, which is right.
> However, an object may be reindexed but not be reported if it was
> dropped between the reindex and its endwhen collecting all the objects
> of a single relation. Perhaps it does not matter because the object
> is gone, but that still looks incorrect to me because we finish by not
> reporting everything we should. I think that we should put the
> collection deeper into the stack, where we know that we are holding
> the locks on the objects we are collecting. Another side effect is
> that the patch seems to be missing toast table indexes, which are also
> reindexed for a REINDEX TABLE, for instance. That's not right.
>
> Actually, could there be an argument for pushing the collection down
> to reindex_relation() instead to count for all the commands that?
> Even better, reindex_index() would be a better candidate because it is
> itself called within reindex_relation() for each individual indexes?
> If a code path should not be covered for event triggers, we could use
> a new REINDEXOPT_ to control that, optionally. In short, it looks
> like we should try to reduce the number of paths calling
> reindex_event_trigger_collect(), while collect_relation() ought to be
> removed.
>
> Should REINDEX TABLE CONCURRENTLY's coverage be extended so as two new
> indexes are reported instead of just one?
>
> REINDEX SCHEMA is a command that perhaps should be tested to collect
> all the indexes rebuilt through it?
>
> A side-effect of this patch is that it impacts ddl_command_start and
> ddl_command_end with the addition of REINDEX. Mixing that with the
> addition of a new event is strange. I think that the addition of
> REINDEX to the existing events should be done first, as a separate
> patch. Then a second patch should deal with the collection and the
> support of the new event.
>
> I have also dug into the archives to note that control commands have
> been proposed in the past to be added as part of event triggers, and
> it happens that I've mentioned in a comment that that this was a
> concept perhaps contrary to what event triggers should do, as these
> are intended for DDLs:
> https://www.postgresql.org/message-id/CAB7nPqTqZ2-YcNzOQ5KVBUJYHQ4kDSd4Q55Mc-fBzM8GH0bV2Q@mail.gmail.com
>
> Philosophically, I'm open on all that and having more commands
> depending on the cases that are satisfied, FWIW, but let's organize
> the changes.
> --
> Michael

Hi.
Top level func is ExecReIndex. it will call {ReindexIndex,
ReindexTable, ReindexMultipleTables}
then trace deep down, it will call {ReindexRelationConcurrently, reindex_index}.

Imitate the CREATE INDEX logic, we need pass `const ReindexStmt *stmt`
to all the following functions:
ReindexIndex
ReindexTable
ReindexMultipleTables
ReindexPartitions
ReindexMultipleInternal
ReindexRelationConcurrently
reindex_relation
reindex_index

because the EventTriggerCollectSimpleCommand needs the `(ReindexStmt
*) parsetree`.
and we call EventTriggerCollectSimpleCommand at reindex_index or
ReindexRelationConcurrently.

The new test will cover the following case.
reindex (concurrently) schema;
reindex (concurrently) partitioned index;
reindex (concurrently) partitioned table;

not sure this is the right approach. But now the reindex event
collection is after we call index_open.
same static function (reindex_event_trigger_collect) in
src/backend/commands/indexcmds.c and src/backend/catalog/index.c for
now.
given that ProcessUtilitySlow supports the event triggers facility.
so probably no need for another REINDEXOPT_ option?

I also added a test for disabled event triggers.

Attachment Content-Type Size
v4-0001-Add-REINDEX-tag-to-event-triggers.patch text/x-patch 27.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Quan Zongliang 2023-11-24 01:11:59 Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
Previous Message Matthias van de Meent 2023-11-23 23:07:12 Re: Table AM Interface Enhancements