Re: varlena beyond 1GB and matrix

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: varlena beyond 1GB and matrix
Date:	2016-12-07 23:36:47
Message-ID:	2185.1481153807@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Dec 7, 2016 at 8:50 AM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:
>> I like to propose a new optional type handler 'typserialize' to
>> serialize an in-memory varlena structure (that can have indirect
>> references) to on-disk format.

> I think it's probably a mistake to conflate objects with substructure
> with objects > 1GB. Those are two somewhat orthogonal needs.

Maybe. I think where KaiGai-san is trying to go with this is being
able to turn an ExpandedObject (which could contain very large amounts
of data) directly into a toast pointer or vice versa. There's nothing
really preventing a TOAST OID from having more than 1GB of data
attached, and if you had a side channel like this you could transfer
the data without ever having to form a larger-than-1GB tuple.
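
As a rough illustration of the side-channel idea in the paragraph above, here is a toy model in Python. It is not PostgreSQL code: ChunkStore, store_stream, fetch_stream, and the tiny CHUNK_SIZE and MAX_DATUM limits are hypothetical stand-ins chosen for readability. The point it tries to show is only that a value written chunk by chunk under one value ID can end up larger in total than the single-datum limit, while no in-memory piece ever reaches that limit.

```python
from typing import Dict, Iterable, Iterator, Tuple

CHUNK_SIZE = 4    # hypothetical; real TOAST chunks are far larger
MAX_DATUM = 16    # hypothetical stand-in for the 1GB varlena limit


class ChunkStore:
    """Toy stand-in for a TOAST table: rows keyed by (value_id, chunk_seq)."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[int, int], bytes] = {}
        self.next_id = 1

    def store_stream(self, pieces: Iterable[bytes]) -> Tuple[int, int]:
        """Write a stream of byte pieces as fixed-size chunks.

        Returns (value_id, total_size), a toy "toast pointer". At most one
        incoming piece plus one partial chunk is buffered at any time.
        """
        value_id = self.next_id
        self.next_id += 1
        seq = 0
        total = 0
        buf = b""
        for piece in pieces:
            # Each individual piece must fit in an ordinary datum.
            assert len(piece) <= MAX_DATUM
            buf += piece
            while len(buf) >= CHUNK_SIZE:
                self.rows[(value_id, seq)] = buf[:CHUNK_SIZE]
                buf = buf[CHUNK_SIZE:]
                seq += 1
                total += CHUNK_SIZE
        if buf:
            # Final short chunk, if the total is not a multiple of CHUNK_SIZE.
            self.rows[(value_id, seq)] = buf
            total += len(buf)
        return value_id, total

    def fetch_stream(self, value_id: int) -> Iterator[bytes]:
        """Yield the stored chunks in order, one at a time."""
        seq = 0
        while (value_id, seq) in self.rows:
            yield self.rows[(value_id, seq)]
            seq += 1
```

Note that this sketch only covers the path where the data really does land in a table-like store. It says nothing about the in-memory tuple case discussed next, which is where the approach runs into trouble.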

The hole in that approach, to my mind, is that there are too many places
that assume that they can serialize an ExpandedObject into part of an
in-memory tuple, which might well never be written to disk, or at least
not written to disk in a table. (It might be intended to go into a sort
or hash join, for instance.) This design can't really work for that case,
and unfortunately I think it will be somewhere between hard and impossible
to remove all the places where that assumption is made.

At a higher level, I don't understand exactly where such giant
ExpandedObjects would come from. (As you point out, there's certainly
no easy way for a client to ship over the data for one.) So this feels
like a very small part of a useful solution, if indeed it's part of a
useful solution at all, which is not obvious.

FWIW, ExpandedObjects themselves are far from a fully fleshed out
concept, one of the main problems being that they don't have very long
lifespans except in the case that they're the value of a plpgsql
variable. I think we would need to move things along quite a bit in
that area before it would get to be useful to think in terms of
ExpandedObjects containing multiple GB of data. Otherwise, the
inevitable flattenings and re-expansions are just going to kill you.
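
To make the cost of repeated flattening and re-expansion concrete, here is a toy model in Python. It does not use or imitate the actual ExpandedObject API; flatten, expand, append_via_flat, and append_expanded are hypothetical names, and a list of integers stands in for the expanded form. It simply counts how many bytes get written when a value is flattened after every update, compared with staying expanded and flattening once.

```python
import struct
from typing import List, Tuple


def flatten(values: List[int]) -> bytes:
    """Serialize the working form into a flat, contiguous byte string."""
    return struct.pack(f"<{len(values)}q", *values)


def expand(flat: bytes) -> List[int]:
    """Rebuild the working form from the flat byte string."""
    return list(struct.unpack(f"<{len(flat) // 8}q", flat))


def append_via_flat(n: int) -> Tuple[bytes, int]:
    """Append n items, flattening after every step.

    Returns (flat_value, bytes_written_by_flattening).
    """
    flat = flatten([])
    copied = 0
    for i in range(n):
        values = expand(flat)     # re-expansion reads the whole value
        values.append(i)
        flat = flatten(values)    # flattening writes the whole value again
        copied += len(flat)
    return flat, copied


def append_expanded(n: int) -> Tuple[bytes, int]:
    """Append n items to the working form, flattening once at the end."""
    values: List[int] = []
    for i in range(n):
        values.append(i)
    flat = flatten(values)
    return flat, len(flat)
```

In this model the flatten-every-step path writes a number of bytes that grows quadratically with the number of updates, while the stay-expanded path grows linearly. With multi-gigabyte values, even a handful of such round trips would be expensive.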

Likewise, the need for clients to be able to transfer data in chunks
gets pressing well before you get to 1GB. So there's a lot here that
really should be worked on before we try to surmount that barrier.

regards, tom lane

In response to

Re: varlena beyond 1GB and matrix at 2016-12-07 23:04:09 from Robert Haas

Responses

Re: varlena beyond 1GB and matrix at 2016-12-08 00:16:21 from Tom Lane
Re: varlena beyond 1GB and matrix at 2016-12-08 04:01:25 from Kohei KaiGai
Re: varlena beyond 1GB and matrix at 2016-12-08 06:53:36 from Craig Ringer
