Re: Proposal: Global Index

From: 曾文旌 <wenjing(dot)zwj(at)alibaba-inc(dot)com>
To: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>, "heikki(dot)linnakangas" <heikki(dot)linnakangas(at)iki(dot)fi>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>
Subject: Re: Proposal: Global Index
Date: 2021-01-07 09:44:01
Message-ID: 78E4E097-EEFE-4755-AAE4-97B60AD53B5B@alibaba-inc.com
Lists: pgsql-hackers

I've been following this topic for a long time, and it has been a year since the last response.
Our customers clearly want this feature as well; a large number of them have asked for it.

So I would like to see the whole feature mature as soon as possible.
I have summarized the scheme discussed in this thread and completed a POC patch (based on PG 13).

Next, I ran into some difficulties implementing DDL on partitioned tables with a global index, and I hope to get some help from the community.

Here are some details of what has been implemented:
1. Definition of a global index
The INCLUDE mechanism is used to store the tableoid of the owning partition in each index tuple (a conceptual sketch follows this list).

2. Maintenance of the global index by partitioned-table DML
Both INSERT and UPDATE on a partitioned table maintain the global index.

3. Global index scan
Planner: processes predicate conditions on the parent partitioned table and generates paths and plans for the global index.
Executor: the index scan fetches an index tuple, extracts the tableoid from it, and checks the visibility of the data in the corresponding partition.

4. VACUUM of the partitioned table maintains the global index
VACUUM of each partition cleans that partition's own garbage data out of the global index.
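
To make item 1 concrete: each global index tuple conceptually stores the key column(s) plus the owning partition's tableoid as an included, non-key column, roughly as if the statement below had been written. Unpatched PostgreSQL rejects system columns in INCLUDE, so this is only an illustration of the index-tuple layout; the POC uses the GLOBAL keyword instead (see the demo below).

-- conceptual illustration only; not accepted by unpatched PostgreSQL
CREATE INDEX idx_pgbench_accounts_bid
    ON pgbench_accounts (bid) INCLUDE (tableoid);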

With the above pieces in place, the global index is already usable, as long as no DDL is performed on the partitioned table.

Demo:
-- Use pgbench to create the test partitioned table
pgbench -i -s 1000 --partitions=6 --partition-method=range

-- create a global index on bid; bid is not the partition key
CREATE INDEX idx_pgbench_accounts_bid on pgbench_accounts(bid) global;
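
As a quick sanity check that the index was created, the standard catalogs can be queried (this assumes the POC registers a global index in pg_class/pg_index like a regular index, so treat it as a sketch):

select c.relname, i.indisvalid
from pg_index i join pg_class c on c.oid = i.indexrelid
where c.relname = 'idx_pgbench_accounts_bid';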

-- check global index status
select *, sum(alivetup) over() as sum_alivetup, sum(deadtup) over() as sum_deadtup from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_2 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_3 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_4 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_5 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_6 | 16666665 |       0 |    100000000 |           0
(6 rows)

-- run pgbench for a while
pgbench -M prepared -j 32 -c 32 -T 60 -P1

-- check the global index; it has bloated
postgres=# select *, sum(alivetup) over() as sum_alivetup, sum(deadtup) over() as sum_deadtup from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 | 16717733 |       0 |    100306102 |           0
 pgbench_accounts_2 | 16717409 |       0 |    100306102 |           0
 pgbench_accounts_3 | 16717540 |       0 |    100306102 |           0
 pgbench_accounts_4 | 16717972 |       0 |    100306102 |           0
 pgbench_accounts_5 | 16717578 |       0 |    100306102 |           0
 pgbench_accounts_6 | 16717870 |       0 |    100306102 |           0
(6 rows)

-- vacuum the partitioned table
vacuum pgbench_accounts;

-- garbage is collected; the global index still looks correct and valid
postgres=# select *, sum(alivetup) over() as sum_alivetup, sum(deadtup) over() as sum_deadtup from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_2 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_3 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_4 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_5 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_6 | 16666665 |       0 |    100000000 |           0
(6 rows)


-- global index scan works well
postgres=# select tableoid ,count(*) from pgbench_accounts where bid = 834 group by tableoid;
 tableoid | count
----------+-------
    16455 | 33335
    16458 | 66665
(2 rows)

postgres=# explain select tableoid ,count(*) from pgbench_accounts where bid = 834 group by tableoid;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2945.23..2945.24 rows=1 width=12)
   Group Key: pgbench_accounts.tableoid
   ->  Global Index Scan using idx_pgbench_accounts_bid on pgbench_accounts  (cost=0.50..10.18 rows=587011 width=4)
         Index Cond: (bid = 834)
(4 rows)
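
A couple of additional sanity checks (sketch only; the UPDATE is wrapped in a transaction and rolled back so the demo data is left untouched, and the standard planner GUCs are assumed to behave as usual under the patch):

-- DML maintenance: change the indexed, non-partition-key column and look the
-- row up again through the global index (bid 1001 does not exist at scale 1000,
-- so only the updated row should come back)
begin;
update pgbench_accounts set bid = 1001 where aid = 1;
select tableoid::regclass, aid, bid from pgbench_accounts where bid = 1001;
rollback;

-- cross-check the global index scan result against a plan that does not use it
begin;
set local enable_indexscan = off;
set local enable_indexonlyscan = off;
set local enable_bitmapscan = off;
select tableoid::regclass, count(*) from pgbench_accounts where bid = 834 group by 1;
rollback;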

The next question is how to implement DDL for global indexes, i.e. how DDL on a partitioned table should maintain its global indexes.
This seems to be more difficult than the previous work.

As I understand it, there are four main parts:

1. Building a global index, or REINDEX, especially in concurrent mode.

2. DETACH PARTITION
Would it be a good idea to set a flag on the global index and let VACUUM clean up the index entries of the detached partition?

3. ATTACH PARTITION
Attaching a new empty partition is easy, but attaching one that already contains data is not (see the sketch after this list).
If there is a unique-key conflict, do we slowly clean up the garbage, or invalidate the entire index?

4. TRUNCATE of a partition with a global index
Do we need to process the heap and the index data separately, in multiple transactions?
That would lose the ability to roll back the TRUNCATE operation.
Is it worth it?
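
To illustrate the ATTACH problem in (3) with a made-up example (the table and column names are hypothetical, and whether the POC already accepts a global UNIQUE index is not something I am claiming here):

create table orders (order_id int, created date) partition by range (created);
create table orders_2020 partition of orders for values from ('2020-01-01') to ('2021-01-01');
create unique index orders_order_id_gidx on orders (order_id) global;  -- POC syntax
insert into orders values (1, '2020-06-01');

create table orders_2021 (order_id int, created date);
insert into orders_2021 values (1, '2021-06-01');  -- duplicates order_id = 1

-- attaching now has to notice that order_id = 1 already exists in another partition;
-- the open question is how (and when) the global index gets validated or cleaned up
alter table orders attach partition orders_2021
    for values from ('2021-01-01') to ('2022-01-01');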

Looking forward to your feedback.

Thanks!

Wenjing
