Re: Proposal: Global Index

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "heikki(dot)linnakangas" <heikki(dot)linnakangas(at)iki(dot)fi>
Subject: Re: Proposal: Global Index
Date: 2019-10-30 17:27:02
Message-ID: 20191030172702.dnkpm7tagsuso43u@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-10-30 13:05:57 -0400, Tom Lane wrote:
> Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> > On Wed, Oct 30, 2019 at 9:23 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Well, the *effects* of the feature seem desirable, but that doesn't
> >> mean that we want an implementation that actually has a shared index.
> >> As soon as you do that, you've thrown away most of the benefits of
> >> having a partitioned data structure in the first place.
>
> > Right, but that's only the case for the global index. Global indexes
> > are useful when used judiciously.
>
> But ... why bother with partitioning then? To me, the main reasons
> why you might want a partitioned table are

Quite commonly there are a lot of *other* indexes, often on much wider
data than the primary key, that don't need to be global. And whereas in
a lot of cases the primary key in a partitioned table has pretty good
locality (and thus will be mostly buffered IO), other indexes will often
not have that property (i.e. not have much correlation with table
position).

> * ability to cheaply add and remove partitions, primarily so that
> you can cheaply do things like "delete the oldest month's data".

You can still do that to some degree with a global index. Imagine
e.g. keeping a 'partition id' as a sort-of column in the global
index. That allows you to drop the partition, without having to
immediately rebuild the index, by checking the partition id against the
live partitions during lookup. So sure, you're wasting space for a bit
in the global index, but it'll also be space that is likely to be fairly
efficiently reclaimed the next time vacuum touches the index. And if
not, the global index can be rebuilt concurrently without blocking
writes.
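
To make the idea concrete, here is a minimal sketch in Python. It is only
a toy model of the scheme described above, not a design for the actual
index code: the class GlobalIndexSketch, its methods, and the in-memory
dict layout are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalIndexSketch:
    """Toy model: each index entry carries the partition id it points into."""

    # key -> list of (partition_id, row_id); hypothetical in-memory layout
    entries: dict = field(default_factory=dict)
    # ids of partitions that are still attached
    live_partitions: set = field(default_factory=set)

    def add_partition(self, partition_id):
        self.live_partitions.add(partition_id)

    def insert(self, key, partition_id, row_id):
        if partition_id not in self.live_partitions:
            raise ValueError("unknown partition")
        self.entries.setdefault(key, []).append((partition_id, row_id))

    def drop_partition(self, partition_id):
        # Cheap: only the set of live partitions changes, the index
        # entries for the dropped partition are left in place for now.
        self.live_partitions.discard(partition_id)

    def lookup(self, key):
        # Entries pointing into dropped partitions are filtered out here.
        return [(p, r) for p, r in self.entries.get(key, [])
                if p in self.live_partitions]

    def vacuum(self):
        # Later cleanup pass: physically remove the dead entries and
        # return how many were reclaimed.
        removed = 0
        for key in list(self.entries):
            old = self.entries[key]
            kept = [(p, r) for p, r in old if p in self.live_partitions]
            removed += len(old) - len(kept)
            if kept:
                self.entries[key] = kept
            else:
                del self.entries[key]
        return removed


idx = GlobalIndexSketch()
idx.add_partition(1)
idx.add_partition(2)
idx.insert("a", 1, 10)
idx.insert("a", 2, 20)
idx.insert("b", 1, 11)
idx.drop_partition(1)        # no index rewrite happens here
visible = idx.lookup("a")    # only the entry in partition 2 is returned
reclaimed = idx.vacuum()     # the two dead entries are removed now
```

The point of the sketch is just that drop_partition does no work
proportional to the index size; the cost is deferred to lookups (one
extra membership check) and to the later cleanup pass.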

> * ability to scale past our limits on the physical size of one table
> --- both the hard BlockNumber-based limit, and the performance
> constraints of e.g. vacuuming a very large table.

For that to be a problem for a global index, the global index (which will
often be something like two int4 or int8 columns) itself would need to
be above the block number based limit - which doesn't seem that close.
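
As a rough back-of-the-envelope check, the following Python sketch
estimates how many entries such an index could hold before hitting the
block number limit. The inputs are assumptions, not measurements: the
default 8kB block size, a two-int8 key, a per-entry overhead of an 8 byte
index tuple header plus a 4 byte line pointer, and leaf pages about 90%
full. Page headers, internal pages and deduplication are ignored.

```python
BLOCK_SIZE = 8192            # assumed default block size in bytes
MAX_BLOCKS = 2**32 - 1       # 32-bit block numbers, one value reserved

# Largest size a single relation (here: the index) can reach.
max_bytes = BLOCK_SIZE * MAX_BLOCKS

# Assumed cost of one index entry on a two-int8 key:
# tuple header (8) + key data (2 * 8) + line pointer (4).
ENTRY_BYTES = 8 + 2 * 8 + 4

# Assumed fraction of each leaf page that actually holds entries.
USABLE_FRACTION = 0.9

entries_per_page = int(BLOCK_SIZE * USABLE_FRACTION) // ENTRY_BYTES
max_entries = entries_per_page * MAX_BLOCKS

print(f"max index size: {max_bytes / 2**40:.1f} TiB")
print(f"entries per page: {entries_per_page}")
print(f"approx. max entries: {max_entries:.2e}")
```

Under those assumptions the limit only comes into play somewhere beyond
a trillion index entries.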

WRT vacuuming - based on my observations the table itself isn't a
performance problem for vacuuming all that commonly anymore, it's the
associated index scans. So yea, that's a problem.

Greetings,

Andres Freund
