Re: Indirect indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Indirect indexes
Date: 2016-10-19 12:52:38
Message-ID: CA+TgmoajGNsjTg7B3KjdpuDEuaOiMXWoh6ewc9jkhuCuv2deFQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 18, 2016 at 2:28 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> I propose we introduce the concept of "indirect indexes". I have a toy
> implementation and before I go further with it, I'd like this assembly's
> input on the general direction.
>
> Indirect indexes are similar to regular indexes, except that instead of
> carrying a heap TID as payload, they carry the value of the table's
> primary key. Because this is laid out on top of existing index support
> code, values indexed by the PK can only be six bytes long (the length of
> ItemPointerData); in other words, 281,474,976,710,656 rows are
> supported, which should be sufficient for most use cases.[1]

So, I think that this is a really promising direction, but also that
you should try very hard to get out from under this 6-byte PK
limitation. That seems really ugly, and in practice it probably means
your PK is going to be limited to int4, which is kind of sad
since it leaves people using int8 or text PKs out in the cold. I
believe Claudio Freire is on to something when he suggests storing the
PK in the index tuple; one could try to skip storing the TID, or
always store it as all-zeroes. Simon objected that putting the PK
into the index tuple would disable HOT, but I don't think that's a
valid objection. The whole point of an indirect index is that it
doesn't disable HOT, and where within the index page you stick the PK
value doesn't have any impact on whether that's safe.
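
To put numbers on the 6-byte limitation, here is a minimal Python sketch. It
assumes the payload is exactly the 6 bytes of ItemPointerData mentioned in
the proposal, and uses the standard sizes of int4 (4 bytes) and int8
(8 bytes). The variable names are illustrative only, not PostgreSQL APIs.

```python
# Size of the payload slot the proposal reuses (ItemPointerData), in bytes.
ITEM_POINTER_BYTES = 6

# Number of distinct values a 6-byte payload can represent: 2**48.
max_rows = 2 ** (8 * ITEM_POINTER_BYTES)

# Fixed-width integer PK types and their sizes in bytes.
pk_type_sizes = {"int4": 4, "int8": 8}

# A type fits only if its full range can be stored in the payload.
fits = {name: size <= ITEM_POINTER_BYTES for name, size in pk_type_sizes.items()}

print(max_rows)  # 281474976710656
print(fits)      # {'int4': True, 'int8': False}
```

A text PK is variable-length, so it cannot be guaranteed to fit either,
which is why storing the PK in the index tuple itself looks attractive.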

The VACUUM problems seem fairly serious. It's true that these indexes
will be less subject to bloat, because they only need updating when
the PK or their own indexed columns change, not when columns covered
only by other indexes change. On the other hand, there's nothing to prevent a PK from being
recycled for an unrelated tuple. We can guarantee that a TID won't be
recycled until all index references to the TID are gone, but there's
no such guarantee for a PK. AFAICT, that would mean that an indirect
index would have to be viewed as unreliable: after looking up the PK,
you'd *always* have to recheck that it actually matched the index
qual.
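
The recheck requirement can be illustrated with a toy model. This is only a
sketch under simplifying assumptions: the heap is a dict keyed by PK, the
indirect index is a list of (indexed value, PK) pairs, and the column name
"city" and the function lookup() are hypothetical. It is not how PostgreSQL
stores or scans anything.

```python
# Toy heap keyed by primary key, and an indirect index on the "city" column.
heap = {1: {"pk": 1, "city": "Oslo"}}
indirect_index = [("Oslo", 1)]  # (indexed value, pk)

# The row with pk=1 is deleted, but its index entry has not been cleaned up.
# The same PK value is then reused by an unrelated row.
del heap[1]
heap[1] = {"pk": 1, "city": "Lima"}


def lookup(index, heap, value, recheck):
    """Return heap rows reached through index entries matching value."""
    results = []
    for indexed_value, pk in index:
        if indexed_value != value:
            continue
        row = heap.get(pk)
        if row is None:
            continue  # PK no longer exists at all
        if recheck and row["city"] != value:
            continue  # stale entry: PK was recycled for a different row
        results.append(row)
    return results


without_recheck = lookup(indirect_index, heap, "Oslo", recheck=False)
with_recheck = lookup(indirect_index, heap, "Oslo", recheck=True)

print(without_recheck)  # [{'pk': 1, 'city': 'Lima'}]  (wrong answer)
print(with_recheck)     # []
```

Without the recheck, the stale index entry returns a row that does not
satisfy the qual; with it, the lookup correctly returns nothing.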

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
