Re: tableam vs. TOAST

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: tableam vs. TOAST
Date: 2019-11-11 14:31:22
Message-ID: CA+TgmoYWOQMo086PNZ0H50xya-YPZ+3BLq=2-39rHpvzWpNZDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 11, 2019 at 8:51 AM Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> Compared to the previous patch (v7) where the API just had a "use this
> AM for TOAST" field and the other extreme of pushing TOAST entirely
> inside the heap AM, this seems like the worst of both worlds, with the
> maximum additional complexity.

There might be a misunderstanding here. These patches would still have
a "use this AM for TOAST" callback, just as the previous set did, but
I didn't include that here, because this is talking about a different
part of the problem. The purpose of that callback is to determine
which AM will be used to create the toast table. The purpose of these
patches is to be able to detoast a value given nothing but the TOAST
pointer extracted from the heap tuple, while removing the present
assumption that the TOAST table is a heap table.

(The current coding is actually seriously inconsistent, because right
now, the code that creates TOAST tables always uses the same AM as the
main heap; but the detoasting code only works with heap tables, which
means that no non-heap AM can use the TOAST system at all. If
necessary, we could commit the patch to allow the TOAST table AM to be
changed first, and then handle allowing the detoasting logic to cope
with a non-heap AM as a separate matter.)

> I don't think we need to nail down this API for eternity, so I'd be
> happy to err on the side of practicality here. However, it seems it's
> not quite clear what for example the requirements and wishes from zheap
> would be. What's the simplest way to move this forward?

The only thing zheap needs - in the current design, anyway - is the
ability to change the chunk size. However, I think that's mostly
because we haven't spent a lot of time thinking about how to do TOAST
better than the heap does TOAST today. I think it is desirable to
allow for more options than that. That's why I like this approach more
than the previous one. The previous approach allowed the chunk size to
be variable, but permitted no other AM-specific variation; this one
allows the AM to detoast in any way that it likes. The downside of
that is that if you really do only want to vary the chunk size, you'll
have to repeat somewhat more code. That's sad, but we're never likely
to have enough AMs for that to be a really serious problem, and if we
do, the AM-specific callbacks for those AMs that just want a different
chunk size could call a common helper function.

> The refactorings you proposed seem reasonable on their own, and I have
> some additional comments on that if we decide to go forward in this
> direction. One thing that's confusing is that the TOAST tables have
> fields chunk_id and chunk_seq, but when an error message talks about
> "chunk %d" or "chunk number %d", they usually mean the "seq" and not the
> "id".

Well, we've got errors like this right now:

unexpected chunk number %d (expected %d) for toast value %u in %s

So at least in this case, and I think in many cases, we're referring
to the chunk_id as "toast value %u" and the chunk_seq as "chunk number
%d". I think that's pretty good terminology. It's unfortunate that the
TOAST table columns are called chunk_id and chunk_seq rather than,
say, value_id and chunk_number, and I guess we could possibly change
that without breaking too many things, but I'm not sure that changing
the error messages would help anybody. We could try to rephrase the
error message to mention the two value in the opposite order, which to
me would be more clear, but I'm not exactly sure how to do that
without writing rather awkward English.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nino Floris 2019-11-11 14:44:54 Re: [PATCH] ltree, lquery, and ltxtquery binary protocol support
Previous Message Nikolay Shaplov 2019-11-11 14:22:32 Re: [PATCH] use separate PartitionedRelOptions structure to store partitioned table options