Re: MaxOffsetNumber for Table AMs

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MaxOffsetNumber for Table AMs
Date: 2021-05-05 00:40:36
Message-ID: 20210505004036.7gi2emqtm3gogi5d@alap3.anarazel.de

Hi,

On 2021-05-04 14:13:36 -0700, Peter Geoghegan wrote:
> On Mon, May 3, 2021 at 10:01 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > For example, the TIDs should always work like unsigned integers -- the
> > > table AM must be willing to work with that restriction.
> >
> > Isn't that more a question of the encoding than the concrete representation?
>
> I don't think so, no. How does B-Tree deduplication work without
> something like that? The fact of the matter is that things are very
> tightly coupled in all kinds of ways.

What does the deduplication actually require from tids? Isn't it just
that you need to be able to compare tids?

> > > You'd then have posting lists tuples in nbtree whose TIDs were all
> > > either 6 bytes or 8 bytes wide, with a mix of each possible (though
> > > not particularly likely) on the same leaf page. Say when you have a
> > > table that exceeds the current MaxBlockNumber restrictions. It would
> > > be relatively straightforward for nbtree deduplication to simply
> > > refuse to mix 6 byte and 8 byte datums together to avoid complexity in
> > > boundary cases. The deduplication pass logic has the flexibility that
> > > this requires already.
> >
> > Which nbtree cases do you think would have an easier time supporting
> > switching between 6 or 8 byte tids than supporting fully variable width
> > tids? Given that IndexTupleData already is variable-width, it's not
> > clear to me why supporting two distinct sizes would be harder than a
> > fully variable size? I assume it's things like BTDedupState->htids?
>
> Stuff like that, yeah. The space utilization stuff inside
> nbtsplitloc.c and nbtdedup.c pretty much rests on the assumption that
> TIDs are fixed width.

Hm. It doesn't look like that'd be all that hard to adjust / that
it'd be meaningfully easier to support only one other type of tid width.
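
As a rough illustration of the kind of adjustment in question, the sketch below does posting-list space accounting with the tid width passed in as a parameter instead of being a compile-time constant. Everything here is an assumption for the sake of the example: sketch_align8 and sketch_posting_size are hypothetical names, the 8-byte alignment is assumed, and none of this is the actual nbtdedup.c or nbtsplitloc.c logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: round up to an 8-byte boundary (assumed alignment). */
static size_t
sketch_align8(size_t len)
{
    return (len + 7) & ~(size_t) 7;
}

/*
 * Hypothetical: bytes needed for a posting list tuple holding ntids tids,
 * each tid_width bytes wide, after an aligned key portion of key_size bytes.
 */
static size_t
sketch_posting_size(size_t key_size, size_t ntids, size_t tid_width)
{
    assert(tid_width > 0);
    return sketch_align8(key_size) + sketch_align8(ntids * tid_width);
}
```

In this form the width is just another input to the arithmetic, whether the caller passes 6, 8, or some other value.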

> Obviously there are some ways in which that could be revised if there
> was a really good reason to do so -- like an actual concrete reason
> with some clear basis in reality.

The example of indirect indexes has been brought up repeatedly - you
just didn't respond to it?

> You have no obligation to make me happy, but FYI I find arguments like
> "but why wouldn't you just allow arbitrary-width TIDs?" to be deeply
> unconvincing. Do you really expect me to do a huge amount of work and
> risk a lot of new bugs, just to facilitate something that may or may
> not ever happen? Would you do that if you were in my position?

So far nobody has expressed any expectation of you doing specific work
in this thread as far as I can see? I certainly didn't intend to. I
think it's perfectly normal to discuss tradeoffs and disagree about
them?

Greetings,

Andres Freund
