Re: MaxOffsetNumber for Table AMs

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MaxOffsetNumber for Table AMs
Date: 2021-05-04 05:01:42
Message-ID: 20210504050142.bhpoff7rsdpacnrq@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-04-30 11:51:07 -0700, Peter Geoghegan wrote:
> I think that it's reasonable to impose some cost on index AMs here,
> but that needs to be bounded sensibly and unambiguously. For example,
> it would probably be okay if you had either 6 byte or 8 byte TIDs, but
> no other variations. You could require index AMs (the subset of index
> AMs that are ever able to store 8 byte TIDs) to directly encode which
> width they're dealing with at the level of each IndexTuple. That would
> create some problems for nbtree deduplication, especially in boundary
> cases, but ISTM that you can manage the complexity by sensibly
> restricting how the TIDs work across the board.

> For example, the TIDs should always work like unsigned integers -- the
> table AM must be willing to work with that restriction.

Isn't that more a question of the encoding than the concrete representation?
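
To illustrate one possible reading of that question (an assumption, not
something spelled out in this thread): the "works like an unsigned integer"
property can come from the byte encoding rather than from the in-memory type.
In the sketch below, TidKey, tid_key_encode and tid_key_compare are
hypothetical names, not PostgreSQL APIs.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical 8-byte TID key. The bytes are stored most significant
 * first, so a plain memcmp() orders keys exactly like the unsigned
 * 64-bit values they were built from.
 */
typedef struct TidKey
{
    uint8_t     bytes[8];
} TidKey;

/* Encode a 64-bit TID value as big-endian bytes. */
TidKey
tid_key_encode(uint64_t tid)
{
    TidKey      key;

    for (int i = 0; i < 8; i++)
        key.bytes[i] = (uint8_t) (tid >> (56 - 8 * i));
    return key;
}

/* Returns <0, 0 or >0, matching unsigned integer comparison. */
int
tid_key_compare(const TidKey *a, const TidKey *b)
{
    return memcmp(a->bytes, b->bytes, sizeof(a->bytes));
}
```

With such an encoding, whatever a table AM packs into the 64 bits is
irrelevant to the index AM, as long as the AM is happy with the resulting
ordering.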

> You'd then have posting lists tuples in nbtree whose TIDs were all
> either 6 bytes or 8 bytes wide, with a mix of each possible (though
> not particularly likely) on the same leaf page. Say when you have a
> table that exceeds the current MaxBlockNumber restrictions. It would
> be relatively straightforward for nbtree deduplication to simply
> refuse to mix 6 byte and 8 byte datums together to avoid complexity in
> boundary cases. The deduplication pass logic has the flexibility that
> this requires already.

Which nbtree cases do you think would have an easier time switching
between 6 and 8 byte tids than supporting fully variable-width tids?
Given that IndexTupleData is already variable-width, it's not clear to
me why supporting a fully variable size would be harder than supporting
two distinct sizes. I assume it's things like BTDedupState->htids?

> > What's wrong with varlena headers? It would end up being a 1-byte
> > header in practically every case, and no variable-width representation
> > can do without a length word of some sort. I'm not saying varlena is
> > as efficient as some new design could hypothetically be, but it
> > doesn't seem like it'd be a big enough problem to stress about. If you
> > used a variable-width representation for integers, you might actually
> > save bytes in a lot of cases. An awful lot of the TIDs people store in
> > practice probably contain several zero bytes, and if we make them
> > wider, that's going to be even more true.
>
> Maybe all of this is true, and maybe it works out to be the best path
> forward in the long term, all things considered. But whether or not
> that's true is crucially dependent on what real practical table AMs
> (of which there will only ever be a tiny number) actually need to do.
> Why should we assume that the table AM cannot accept some
> restrictions? What good does it do to legalistically define the
> problem as a problem for index AMs to solve?

I don't think anybody is arguing that AMs cannot accept any restrictions? I do
think it's pretty clear that it's not at all obvious what the concrete set of
proper restrictions would be, such that we won't end up needing to re-evaluate
the limits in a few years.

If you add to that the fact that variable-width tids will often end up
considerably smaller than our current tids, it's not obvious why we should use
bitspace somewhere to indicate an 8 byte tid instead of a variable-width
tid?
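
As a rough sketch of the size argument, here is a LEB128-style
variable-width encoding of a 64-bit tid. This is only an illustration: it is
not the varlena format, the function names tid_varint_encode and
tid_varint_decode are hypothetical, and it assumes tids are handed to the
index as plain unsigned 64-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a PostgreSQL API. Stores 7 payload bits per
 * byte, least significant group first; the high bit of each byte is set
 * when more bytes follow. Writes 1..10 bytes into out and returns the
 * number written.
 */
size_t
tid_varint_encode(uint64_t tid, uint8_t *out)
{
    size_t      n = 0;

    assert(out != NULL);
    while (tid >= 0x80)
    {
        out[n++] = (uint8_t) (tid | 0x80);
        tid >>= 7;
    }
    out[n++] = (uint8_t) tid;
    return n;
}

/*
 * Inverse of tid_varint_encode. Reads at most 10 bytes from in, stores
 * the value in *tid and returns the number of bytes consumed.
 */
size_t
tid_varint_decode(const uint8_t *in, uint64_t *tid)
{
    uint64_t    result = 0;
    size_t      n = 0;
    int         shift;

    assert(in != NULL && tid != NULL);
    for (shift = 0; shift < 64; shift += 7)
    {
        uint8_t     b = in[n++];

        result |= (uint64_t) (b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            break;
    }
    *tid = result;
    return n;
}
```

Under this particular scheme any value below 2^35 fits in at most 5 bytes,
compared with the 6 bytes of the current ItemPointerData, and the length is
carried by the encoding itself rather than by a separate flag. Note that this
byte layout does not sort correctly under memcmp(), so an index AM would
compare decoded values.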

Greetings,

Andres Freund
