Re: Transactions and indexes

From: Chris Cleveland <ccleveland(at)dieselpoint(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Transactions and indexes
Date: 2021-07-20 02:20:33
Message-ID: CABSN6VfHAxCkmdOju-ZKo8JT-tpOK6DgOxWAKMW3kknBD5h5MQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you. Does this mean I can implement the index AM and return TIDs
without having to worry about transactions at all?

Also, as far as I can tell, the only way that TIDs are removed from the
index is in ambulkdelete(). Is this accurate? Does that mean that my index
will be returning TIDs for deleted items and I don't have to worry about
that either?

Don't TIDs get reused? What happens when my index returns an old TID which
is now pointing to a new record?

This is going to make it really hard to implement Top X queries of the type
you get from a search engine. A search engine will normally maintain an
internal buffer (usually a priority queue) of a fixed size, X, and add
tuples to it along with their relevance score. The buffer only remembers
the Top X tuples with the highest score. In this way the search engine can
iterate over millions of entries and retain only the best ones without
having an unbounded buffer. For this to work, though, you need to know how
many tuples to keep in the buffer in advance. If my index can't know, in
advance, which TIDs are invisible or deleted, then it can't keep them out
of the buffer, and this whole scheme fails.

This is not going to work unless the system gives the index a clear picture
of transactions, visibility, and deletes as they happen. Is this
information available?

On Mon, Jul 19, 2021 at 6:58 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Mon, Jul 19, 2021 at 4:31 PM Chris Cleveland
> <ccleveland(at)dieselpoint(dot)com> wrote:
> > I'm confused on how to handle transactions and visibility.
>
> In Postgres, indexes are not considered to be part of the logical
> database. They're just data structures that point to TIDs in the
> table. To an index, each TID is just another object -- it doesn't
> possess any built-in idea about MVCC.
>
> In practice the indexes may be able to surmise certain things about
> MVCC and versioning, as an optimization -- but that is all speculative
> and relies on cooperation from the table AM side. Also, the
> implementation of unique indexes knows more than zero about versions,
> since that's just necessary. These two cases may or may not be
> considered exceptions to the general rule. I suppose that it's a
> matter of perspective.
>
> > So... how do I handle this? Is there some way for me to implement my own
> storage manager that manages visibility?
>
> This is the responsibility of a table AM, not any one index AM. In
> general we assume that each table AM implements something very much
> like heapam's VACUUM implementation. Index AMs may also have
> opportunistic cleanup of their own, as an optimization (actually this
> is what I was referring to).
>
> Theoretically index AMs and table AMs are orthogonal things. How true
> that will be in a world with more than one mature table AM remains to
> be seen.
>
> --
> Peter Geoghegan
>

--
Chris Cleveland
312-339-2677 mobile

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-07-20 02:23:07 Re: row filtering for logical replication
Previous Message Japin Li 2021-07-20 01:59:53 Re: Why ALTER SUBSCRIPTION ... SET (slot_name='none') requires subscription disabled?