Re: [HACKERS] REINDEX CONCURRENTLY 2.0

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Date: 2019-01-17 05:11:01
Message-ID: 20190117051101.GD2036@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 16, 2019 at 05:56:15PM +0100, Andreas Karlsson wrote:
> On 1/16/19 9:27 AM, Michael Paquier wrote:
>> Does somebody mind if I jump into the ship after so long? I was the
>> original author of the monster after all...
>
> Fine by me. Peter?

Okay, I have begun digging into the patch, and extracted for now two
things which can be refactored first, giving a total of three patches:
- 0001, which creates WaitForOlderSnapshots to snapmgr.c. I think
that this can be useful for external extensions to have a process wait
for snapshots older than a minimum threshold hold by other
transactions.
- 0002, which moves the concurrent index build into its own routine,
index_build_concurrent(). At the same time, index_build() has a
isprimary argument which is not used, so let's remove it. This
simplifies a bit the refactoring as well.
- 0003 is the core patch, realigned with the rest, fixing some typos I
found on the way.

Here are also some notes for things I am planning to look after with a
second pass:
- The concurrent drop (phase 5) part, still shares a lot with DROP
INDEX CONCURRENTLY, and I think that we had better refactor more the
code so as REINDEX CONCURRENTLY shared more with DROP INDEX. One
thing which I think is incorrect is that we do not clear the invalid
flag of the drop index before marking it as dead. This looks like a
missing piece from another concurrent-related bug fix lost over the
rebases this patch had.
- set_dead could be refactored so as it is able to handle in input
multiple indexes, using WaitForLockersMultiple(). This way CREATE
INDEX CONCURRENTLY could also use it.
- There are no regression tests for partitioned tables.
- The NOTICE messages showing up when a table has no indexes should be
removed.
- index_create() does not really need a TupleDesc argument, as long as
the caller is able to provide a list of column names.
- At the end of the day, I think that it would be nice to reach a
state where we have a set of low-level routines like
index_build_concurrent, index_set_dead_concurrent which are used by
both paths of CONCURRENTLY and can be called for each phase within a
given transaction. Those pieces can also be helpful to implement for
example an extension able to do concurrent reindexing out of core.

I think that the refactorings in 0001 and 0002 are committable as-is,
and this shaves some code from the core patch.

Thoughts?
--
Michael

Attachment Content-Type Size
0001-Refactor-code-to-wait-for-older-snapshots-into-its-o.patch text/x-diff 8.2 KB
0002-Refactor-index-concurrent-build-into-its-own-routine.patch text/x-diff 7.1 KB
0003-Core-patch-for-REINDEX-CONCURRENTLY.patch text/x-diff 78.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2019-01-17 05:29:59 Re: Ryu floating point output patch
Previous Message Donald Dong 2019-01-17 05:10:33 Re: Ryu floating point output patch