回复:回复:回复:回复:how to create index concurrently on partitioned table

From: 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>
To: 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>
Cc: "Justin Pryzby" <pryzby(at)telsasoft(dot)com>, "pgsql-hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>
Subject: 回复:回复:回复:回复:how to create index concurrently on partitioned table
Date: 2020-06-18 07:01:30
Message-ID: 7399de28-d865-4ab1-befe-f8fb6b9c07fd.adger.lj@alibaba-inc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Not sure I am following. In the case of REINDEX, it seems to me that
> the calls to validate_index() and index_concurrently_build() can
> happen in a separate transaction for each index, as long as all the
> calls to index_concurrently_swap() are grouped together in the same
> transaction to make sure that index partition trees are switched
> consistently when all entries are swapped from an invalid state to a
> valid state, because the swapping phase is also when we attach a fresh
> index to a partition tree. See also index_concurrently_create_copy()
> where we don't set parentIndexRelid for the lower call to
> index_create(). It would be good of course to check that when
> swapping we have the code to handle that for a lot of indexes at
> once.

Some errors in the last email were not clearly expressed.

Let's look at this example:
A partition table has five partitions,
parttable: part1, part2, part3, part3 ,part5
We simply abstract the following definitions:
phase 1: index_create(), it is only registered in catalogs.
phase 2: index_concurrently_build(), Build the indexes.
phase 3: validate_index(), insert any missing index entries, mark the index as valid.

(scheme 1)
```
StartTransaction one
parttable phase 1
part 1 phase 1
part 2 phase 1
part 3 phase 1
part 4 phase 1
part 5 phase 1
CommitTransaction

StartTransaction two
parttable phase 2part 1 phase 2
part 2 phase 2
part 3 phase 2 (error occurred )
part 4 phase 2
part 5 phase 2
CommitTransaction

StartTransaction three
parttable phase 3
part 1 phase 3
part 2 phase 3
part 3 phase 3
part 4 phase 4
part 5 phase 5 CommitTransaction
...
```
Now, the following steps cannot continue due to an error in Transaction two .
so, Transaction two roll back, Transaction three haven't started.
All of our indexes are invalid. In this way,
we ensure the strong consistency of indexes in the partition tree.
However, we need to rebuild all indexes when reindex.

(scheme 2)
```
StartTransaction one
parttable phase 1
part 1 phase 1
part 2 phase 1
part 3 phase 1
part 4 phase 1
part 5 phase 1
CommitTransaction

StartTransaction two part 1 phase 2
CommitTransaction
StartTransaction three
part 1 phase 3
CommitTransaction

StartTransaction fourpart 2 phase 2
CommitTransaction
StartTransaction five
part 2 phase 3
CommitTransaction

StartTransaction sixpart 3 phase 2 (error occurred )
CommitTransaction
StartTransaction seven
part 3 phase 3
CommitTransaction

StartTransaction xxpart 4 phase 2
CommitTransaction
StartTransaction xx
part 4 phase 3
CommitTransaction

StartTransaction xxpart 5 phase 2
CommitTransaction
StartTransaction xx
part 5 phase 3
CommitTransaction

StartTransaction xxparttable phase 2
CommitTransaction
StartTransaction xx
parttable phase 3
CommitTransaction
```

Now, the following steps cannot continue due to an error in Transaction six .
so, Transaction six roll back, Transactions behind Transaction six have not started
The indexes of the p1 and p2 partitions are available. Other indexes are invalid.
In reindex, we can ignore the rebuild of p1 and p2.
Especially when there are many partitions,
this can save the rebuild of some partition indexes,
This seems better, although it seems to be inconsistent.

Do you think that scheme is more suitable for CIC?

Thank you very much,
Regards, Adger

------------------------------------------------------------------
发件人:李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>
发送时间:2020年6月18日(星期四) 14:37
收件人:Michael Paquier <michael(at)paquier(dot)xyz>
抄 送:Justin Pryzby <pryzby(at)telsasoft(dot)com>; pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>; 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>; Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
主 题:回复:回复:回复:回复:how to create index concurrently on partitioned table

> Not sure I am following. In the case of REINDEX, it seems to me that
> the calls to validate_index() and index_concurrently_build() can
> happen in a separate transaction for each index, as long as all the
> calls to index_concurrently_swap() are grouped together in the same
> transaction to make sure that index partition trees are switched
> consistently when all entries are swapped from an invalid state to a
> valid state, because the swapping phase is also when we attach a fresh
> index to a partition tree. See also index_concurrently_create_copy()
> where we don't set parentIndexRelid for the lower call to
> index_create(). It would be good of course to check that when
> swapping we have the code to handle that for a lot of indexes at
> once.

Let's look at this example:
A partition table has five partitions,
parttable: part1, part2, part3, part3 ,part5
We simply abstract the following definitions:
phase 1: index_create(), it is only registered in catalogs.
phase 2: index_concurrently_build(), Build the indexes.
phase 3: validate_index(), insert any missing index entries, mark the index as valid.

(schema 1)
```
StartTransaction one
parttable phase 1
part 1 phase 1
part 2 phase 1
part 3 phase 1
part 4 phase 1
part 5 phase 1
CommitTransaction

StartTransaction two
parttable phase 2part 1 phase 2
part 2 phase 2
part 3 phase 2 (error occurred )
part 4 phase 2
part 5 phase 2
CommitTransaction

StartTransaction three
parttable phase 3
part 1 phase 3
part 2 phase 3
part 3 phase 3
part 4 phase 4
part 5 phase 5 CommitTransaction
...
```
Now, the following steps cannot continue due to an error in Transaction two .
so, Transaction two roll back, Transaction three haven't started.
All of our indexes are invalid. In this way,
we ensure the strong consistency of indexes in the partition tree.
However, we need to rebuild all indexes when reindex.

(schema 2)
```
StartTransaction one
parttable phase 1
part 1 phase 1
part 2 phase 1
part 3 phase 1
part 4 phase 1
part 5 phase 1
CommitTransaction

StartTransaction two part 1 phase 2
part 1 phase 3
CommitTransaction

StartTransaction three part 2 phase 2
part 2 phase 3
CommitTransaction

StartTransaction fourpart 3 phase 2 (error occurred )
part 3 phase 3
CommitTransaction

StartTransaction five part 4 phase 2
part 4 phase 3

StartTransaction sixpart 5 phase 2
part 5 phase 3
CommitTransaction

StartTransaction sevenparttable phase 2
parttable phase 3
CommitTransaction
```

Now, the following steps cannot continue due to an error in Transaction four .
so, Transaction four roll back, Transactions behind Transaction 3 have not started
The indexes of the p1 and p2 partitions are available. Other indexes are invalid.
In reindex, we can ignore the rebuild of p1 and p2.
This seems better, although it seems to be inconsistent.

Do you think that scheme is more suitable for CIC?

Thank you very much,
Regards, Adger

------------------------------------------------------------------
发件人:Michael Paquier <michael(at)paquier(dot)xyz>
发送时间:2020年6月18日(星期四) 10:41
收件人:李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>
抄 送:Justin Pryzby <pryzby(at)telsasoft(dot)com>; pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>; 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>; Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
主 题:Re: 回复:回复:回复:how to create index concurrently on partitioned table

On Wed, Jun 17, 2020 at 10:22:28PM +0800, 李杰(慎追) wrote:
> However, I found a problem. If there are many partitions,
> we may need to handle too many missing index entries when
> validate_index(). Especially for the first partition, the time may
> have been long and many entries are missing. In this case, why
> don't we put the second and third phase together into a transaction
> for each partition?

Not sure I am following. In the case of REINDEX, it seems to me that
the calls to validate_index() and index_concurrently_build() can
happen in a separate transaction for each index, as long as all the
calls to index_concurrently_swap() are grouped together in the same
transaction to make sure that index partition trees are switched
consistently when all entries are swapped from an invalid state to a
valid state, because the swapping phase is also when we attach a fresh
index to a partition tree. See also index_concurrently_create_copy()
where we don't set parentIndexRelid for the lower call to
index_create(). It would be good of course to check that when
swapping we have the code to handle that for a lot of indexes at
once.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-06-18 07:13:35 Re: factorial of negative numbers
Previous Message Amit Kapila 2020-06-18 06:51:17 Re: Resetting spilled txn statistics in pg_stat_replication