Re: MaxOffsetNumber for Table AMs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MaxOffsetNumber for Table AMs
Date: 2021-05-03 14:41:07
Message-ID: CA+TgmoZPiH3b3HtSiOvDn8S4tSknmyp8F6oWqbSzFJSv2zXGqA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 30, 2021 at 5:22 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I strongly suspect that index-organized tables (or indirect indexes,
> or anything else that assumes that TID-like identifiers map directly
> to logical rows as opposed to physical versions) are going to break
> too many assumptions to ever be tractable. Assuming I have that right,
> it would advance the discussion if we could all agree on that being a
> non-goal for the tableam interface in general.

I *emphatically* disagree with the idea of ruling such things out
categorically. This is just as naive as the TODO's statement that we
do not want "All backends running as threads in a single process".
Does anyone really believe that we don't want that any more? I
believed it 10 years ago, but not any more. It's costing us very
substantially not only in that it makes parallel query more
complicated and fragile, but more importantly in that we can't scale
up to connection counts that other databases can handle because we use
up too many operating system resources. Supporting threading in
PostgreSQL isn't a project that someone will pull off over a long
weekend and it's not something that has to be done tomorrow, but it's
pretty clearly the future.

So here. The complexity of getting a table AM that does anything
non-trivial working is formidable, and I don't expect it to happen
right away. Picking one that is essentially block-based and can use
48-bit TIDs is very likely the right initial target because that's the
closest we have now, and there's no sense attacking the hardest
variant of the problem first. However, as with the
threads-vs-processes example, I strongly suspect that having only one
table AM is leaving vast amounts of performance on the table. To say
that we're never going to pursue the parts of that space that require
a different kind of tuple identifier is to permanently write off tons
of ideas that have produced promising results in other systems. Let's
not do that.

--
Robert Haas
EDB: http://www.enterprisedb.com
