Re: MaxOffsetNumber for Table AMs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MaxOffsetNumber for Table AMs
Date: 2021-05-05 16:42:40
Message-ID: CA+TgmoZLNeGMp_y7ri1CFt_WBQ0rQMm=uxHUGXXnqKF8S029KA@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 5, 2021 at 11:50 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I'm being very vocal here because I'm concerned that we're going about
> generalizing TIDs in the wrong way. To me it feels like there is a
> loss of perspective about what really matters.

Well, which things matter is a question of opinion, not fact.

> No other database system has something like indirect indexes. They
> have clustered indexes, but that's rather different.

I don't think this is true at all. If you have a clustered index -
i.e. the table is physically arranged according to the index ordering
- then your secondary indexes all pretty much have to be what we're
calling indirect indexes. They can hardly point to a physical
identifier if rows are being moved around. I believe InnoDB works this
way, and I think Oracle's index-organized tables do too. I suspect
there are other examples.

> > There might be some slight disagreement about whether it's useful to
> > generalize TIDs from a 48-bit address space to a 64-bit address space
> > without making it fully general. Like Andres, I am unconvinced that's
> > meaningfully easier, and I am convinced that it's meaningfully less
> > good, but other people can disagree and that's fine. I'm perfectly
> > willing to change my opinion if somebody shows up with a patch that
> > demonstrates the value of this approach.
>
> It's going to be hard if not impossible to provide empirical evidence
> for the proposition that 64-bit wide TIDs (alongside 48-bit TIDs) are
> the way to go. Same with any other scheme. We're talking way too much
> about TIDs themselves and way too little about table AM use cases, the
> way the data structures might work in new table AMs, and so on.

I didn't mean that it has to be a test result showing that 64-bit TIDs
outperform 56-bit TIDs or something. I just meant there has to be a
reason to believe it's good, which could be based on a discussion of
use cases or whatever. If we *don't* have a reason to believe it's
good, we shouldn't do it.

My point is that so far I am not seeing a whole lot of value in this
proposed approach. For a 64-bit TID to be valuable to you, one of two
things has to be true: you either don't care about having indexes that
store TIDs on your new table type, or the index types you want to use
can store those 64-bit TIDs. Now, I have not yet heard of anyone
working on a table AM who does not want to be able to support adding
btree indexes. There may be someone that I don't know about, and if
so, fine. But otherwise, we need a way to store them. And that
requires changing the page format for btree indexes. But surely we do
not want to make all TIDs everywhere wider in future btree versions,
so at least two TID widths - 6 bytes and 8 bytes - would have to be
supported. And if we're going to do that at all, I think it's
certainly worth asking whether supporting varlena TIDs would really be
all that much harder. You seem to think it is, and you might be right,
but I'm not ready to give up, because I do not see how we are ever
going to get global indexes or indirect indexes without doing it, and
those would be good features to have.

If we can't ever get them, so be it, but you seem to kind of be saying
that things like global indexes and indirect indexes are hard, and
therefore they don't count as reasons why we might want variable-width
TIDs. But one very large reason why those things are hard is that they
require variable-width TIDs, so AFAICS this boils down to saying that
we don't want the feature because it's hard to implement. But we
should not conflate feasibility with desirability. I am quite sure
that lots of people want global indexes. The number of people who want
indirect indexes is in my estimation much smaller, but it's probably
not zero, or else Alvaro wouldn't have tried his hand at writing a
patch. Whether we can *get* those things is in doubt; whether it will
happen in the near future is very much in doubt. But I at least am not
in doubt about whether people want it, because I hear complaints about
the lack of global indexes on an almost-daily basis. If those
complaints are all from people hoping to fake me out into spending
time on something that is worthless to them, my colleagues are very
good actors.

--
Robert Haas
EDB: http://www.enterprisedb.com
