Re: Proposal: Global Index

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: 曾文旌 <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>, "heikki(dot)linnakangas" <heikki(dot)linnakangas(at)iki(dot)fi>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>
Subject: Re: Proposal: Global Index
Date: 2021-01-11 18:37:02
Message-ID: CA+TgmoYTAi-BxtCrXiktAwhO7m=9xbwKsVXcbLu0apfb_+exWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 11, 2021 at 12:46 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > For 1) The DETACH old child table can be finished immediately, global index can be kept valid after DETACH is completed, and the cleanup of garbage data in global index can be deferred to VACUUM.
> > This is similar to the global index optimization done by Oracle12c.
> > For 2) ATTACH new empty child table can also be completed immediately.
> > If this is the case, many of the advantages of partitioned tables will be retained, while the advantages of global indexes will be gained.
>
> Yes, we can keep the index rows for the deleted partition and clean them
> up later, but what is the advantage of partitioning then? Just heap
> deletion quickly? Is that enough of a value?

I actually think the idea of lazily deleting the index entries is
pretty good, but it won't work if the way the global index is
implemented is by adding a tableoid column. Because then, I might
detach a partition and later reattach it and the old index entries are
still there but the table contents might have changed. Worse yet, the
table might be dropped and the table OID reused for a completely
unrelated table with completely unrelated contents, which could then
be attached as a new partition.

One of the big selling points of global indexes is that they allow you
to enforce uniqueness on a column unrelated to the partitioning
column. Another is that you can look up a value by doing a single
index scan on the global index rather than an index scan per
partition. Those things are no less valuable for performing index
deletion lazily.

However, there is a VACUUM amplification effect to worry about here
which Wenjing seems not to be considering. Suppose I have a table
which is not partitioned and it is 1TB in size with an index that is
128GB in size. To vacuum the table, I need to do 1TB + 128GB of I/O.
Now, suppose I now partition the table into 1024 partitions each with
its own local index. Each partition is 1GB in size and the index on
each partition is 128MB in size. To vacuum an individual partition
requires 1GB + 128MB of I/O, so to vacuum all the partitions requires
the same amount of total I/O as before. But, now suppose that I have a
single global index instead of a local index per partition. First, how
big will that index be? It will not be 128GB, but somewhat bigger,
because it needs extra space for every indexed tuple. Let's say 140GB.
Furthermore, it will need to be vacuumed whenever any child is
vacuumed, because it contains some index entries from every child. So
the total I/O to vacuum all partitions is now 1GB * 1024 + 140GB *
1024 = 141TB, which is a heck of a lot worse than the 1.125TB I
required with the unpartitioned table or the locally partitioned
table.

That's not necessarily a death sentence for every use case, but it's
going to be pretty bad for tables that are big and heavily updated.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-01-11 19:01:20 Re: Proposal: Global Index
Previous Message Tom Lane 2021-01-11 18:34:32 Re: Proposal: Global Index