Re: MaxOffsetNumber for Table AMs

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MaxOffsetNumber for Table AMs
Date: 2021-05-04 20:51:22
Message-ID: CAH2-Wzm4-gRDOtjE51GnONssOe_85f4h2RY3wvyf_jwRcp_NEw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 4, 2021 at 11:52 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Mon, 2021-05-03 at 15:07 -0700, Peter Geoghegan wrote:
> > It seems senseless to *require* table AMs to support something like a
> > bitmap scan.
>
> I thought about this some more, and this framing is backwards.
> ItemPointers are fundamental to the table AM API: they are passed in to
> required methods, and expected to be returned[1].

I prefer my framing, but okay, let's go with yours. What difference
does it make?

Starting with the table AM API doesn't change the fundamental fact
that quite a few implementation details local to code like the GIN AM
and tidbitmap.c were (rightly or wrongly) simply built with heapam in
mind. That is hardly surprising, and hardly argues against the idea of
having a table AM to begin with. There is no getting around the need
to talk about first principles here, and about the specific
implications for your particular table AM (perhaps others too).

Abstractions are only useful when they serve concrete implementations.
Of course they should be as general and abstract as possible -- but no
more.

> Bitmap scans are optional, but that should be determined by whether the
> author wants to implement the bitmap scan methods of their table AM.
> The fine details of ItemPointer representation should not be making the
> decision for them.

A distinction without a difference. If bitmap scans are optional and
some index AMs are 100% built from the ground up to work only with
bitmap scans, then those index AMs are effectively optional (or
optional to the extent that bitmap scans themselves are optional). I
have absolutely no idea how it would be possible to make GIN work
without having bitmap scans. It would be so different that it wouldn't
be GIN anymore.

I think maybe it is possible for GIN to work with your column store
table AM in particular. Why aren't we talking about that concrete
issue, or something like that? We're talking about this abstraction as
if it must already be perfect, and therefore the standard by which
every other thing needs to be measured. But why?

> We still need to answer the core question that started this thread:
> what the heck is an ItemPointer, anyway?
>
> After looking at itemptr.h, off.h, ginpostinglist.c and tidbitmap.c, it
> seems that an ItemPointer is a block number from [0, 0xFFFFFFFE]; and
> an offset number from [1, MaxHeapTuplesPerPage] which is by default [1,
> 291].
>
> Attached is a patch that clarifies what I've found so far and gives
> clear guidance to table AM authors. Before I commit this I'll make sure
> that following the guidance actually works for the columnar AM.

I don't get what the point of this patch is. Obviously all of the
particulars here are just accidents of history that we ought to change
sooner or later anyway. I don't have any objection to writing them all
down someplace official. But what difference does it make if there is
no underlying *general* set of principles behind any of it? This
definition of a TID can break at any time because it just isn't useful
or general. This is self-evident -- your definition includes
MaxHeapTuplesPerPage! How could that possibly be anything other than
an accident whose details are completely arbitrary and therefore
subject to change at any time?
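
To make the arbitrariness concrete, here is a rough sketch of where the
291 comes from. It is not the actual PostgreSQL code: every SKETCH_ and
sketch_ name is hypothetical, the sizes are what I believe a stock
64-bit build with 8 kB pages uses, and the TID struct is simplified (the
real ItemPointerData splits the block number into two 16-bit halves).

```c
#include <stdint.h>

/* Assumed sizes for a default 8 kB-page, 8-byte-MAXALIGN build. */
#define SKETCH_PAGE_HEADER_SIZE   24u  /* stands in for SizeOfPageHeaderData */
#define SKETCH_HEAP_TUPLE_HEADER  23u  /* stands in for SizeofHeapTupleHeader */
#define SKETCH_ITEM_ID_SIZE        4u  /* stands in for sizeof(ItemIdData) */
#define SKETCH_MAXALIGN(x)        (((x) + 7u) & ~7u)
#define SKETCH_MAX_BLOCK_NUMBER   0xFFFFFFFEu

/* Simplified stand-in for a TID: block number plus 1-based offset. */
typedef struct SketchTid
{
    uint32_t block;
    uint16_t offset;
} SketchTid;

/*
 * Upper bound on heap tuples per page: usable page space divided by the
 * smallest possible footprint of one tuple (aligned header + line pointer).
 */
unsigned
sketch_max_heap_tuples_per_page(unsigned blcksz)
{
    return (blcksz - SKETCH_PAGE_HEADER_SIZE) /
           (SKETCH_MAXALIGN(SKETCH_HEAP_TUPLE_HEADER) + SKETCH_ITEM_ID_SIZE);
}

/* Does this TID fall inside the range described in the quoted text? */
int
sketch_tid_in_documented_range(SketchTid tid, unsigned blcksz)
{
    return tid.block <= SKETCH_MAX_BLOCK_NUMBER &&
           tid.offset >= 1 &&
           tid.offset <= sketch_max_heap_tuples_per_page(blcksz);
}
```

With a block size of 8192 the first function yields 291. Change the
block size, or the heap tuple header, and the "valid" offset range
changes with it. Nothing about that bound is specific to any table AM
other than heapam.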

This is not necessarily a big deal! We can fix it by reconciling
things in a pragmatic, bottom-up way. That's what I expected would
happen all along. The table AM is not the Ark of the Covenant (just
like tidbitmap.c, or anything else).

--
Peter Geoghegan
