From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Date: 2012-10-03 01:14:17
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

One of the outputs on the discussions about the integration of pg_reorg in
was that Postgres should provide some ways to do REINDEX, CLUSTER and ALTER
TABLE concurrently with low-level locks in a way similar to CREATE INDEX

The discussions done can be found on this thread:

Well, I spent some spare time working on the implementation of REINDEX
This basically allows to perform read and write operations on a table whose
index(es) are
reindexed at the same time. Pretty useful for a production environment. The
caveats of this
feature is that it is slower than normal reindex, and impacts other
backends with the extra CPU,
memory and IO it uses to process. The implementation is based on something
on the same ideas
as pg_reorg and on an idea of Andres.
Please find attached a version that I consider as a base for the next
discussions, perhaps
a version that could be submitted to the commit fest next month. Patch is
aligned with postgres
master at commit 09ac603.

With this feature, you can rebuild a table or an index with such commands:

The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- Shared tables cannot be reindexed concurrently
- indexes for exclusion constraints cannot be reindexed concurrently.
- toast relations are reindexed non-concurrently when table reindex is done
and that this table has toast relations

Here is a description of what happens when reorganizing an index
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) creation of a new index based on the same columns and restrictions as the
index that is rebuilt (called here old index). This new index has as name
$OLDINDEX_cct. So only a suffix _cct is added. It is marked as invalid and
not ready.
2) Take session locks on old and new index(es), and the parent table to
unfortunate drops.
3) Commit and start a new transaction
4) Wait until no running transactions could have the table open with the
old list of indexes.
5) Build the new indexes. All the new indexes are marked as indisready.
6) Commit and start a new transaction
7) Wait until no running transactions could have the table open with the
old list of indexes.
8) Take a reference snapshot and validate the new indexes
9) Wait for the old snapshots based on the reference snapshot
10) mark the new indexes as indisvalid
11) Commit and start a new transaction. At this point the old and new
indexes are both valid
12) Take a new reference snapshot and wait for the old snapshots to insure
that old
indexes are not corrupted,
13) Mark the old indexes as invalid
14) Swap new and old indexes, consisting here in switching their names.
15) Old indexes are marked as invalid.
16) Commit and start a new transaction
17) Wait for transactions that might use the old indexes
18) Old indexes are marked as not ready
19) Commit and start a new transaction
20) Drop the old indexes

The following process might be reducible, but I would like that to be
decided depending on
the community feedback and experience on such concurrent features.
For the time being I took an approach that looks slower, but secured to my
mind with multiple
waits (perhaps sometimes unnecessary?) and subtransactions.

If during the process an error occurs, the table will finish with either
the old or new index
as invalid. In this case the user will be in charge to drop the invalid
index himself.
The concurrent index can be easily identified with its suffix *_cct.

This patch has required some refactorisation effort as I noticed that the
code of index
for concurrent operations was not very generic. In order to do that, I
created some
new functions in index.c called index_concurrent_* which are used by CREATE
and REINDEX in my patch. Some refactoring has also been done regarding the
wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
in indexcmds.c). The patch structure is relying a maximum on the functions
of index.c
when creating, building and validating concurrent index.

Based on the comments of this thread, I would like to submit the patch at
the next
commit fest. Just let me know if the approach taken by the current
is OK ot if it needs some modifications. That would be really helpful.

The patch includes some regression tests for error checks and also some
Regressions are passing, code has no whitespaces and no compilation
I have also done tests checking for read and write operations on index scan
of parent table
at each step of the process (by using gdb to stop the reindex process at
precise places).

Thanks, and looking forward to your feedback,
Michael Paquier

Attachment Content-Type Size
20121003_reindex_concurrent.patch application/octet-stream 52.7 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-10-03 01:18:52 Re: pg_upgrade does not completely honor --new-port
Previous Message Stirling Newberry 2012-10-02 22:58:12 Re: CREATE SCHEMA IF NOT EXISTS