Better support for whole-row operations and composite types

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Better support for whole-row operations and composite types
Date: 2004-03-29 19:18:32
Message-ID: 3072.1080587912@sss.pgh.pa.us
Lists: pgsql-hackers

We have a number of issues revolving around the fact that composite types
(row types) aren't first-class objects. I think it's past time to fix
that. Here are some notes about doing it. I am not sure all these ideas
are fully-baked ... comments appreciated.

When represented as a Datum, the format of a row-type object needs to be
something like this:

* overall length: int4 (this makes the Datum a valid varlena item)
* row type id: Oid (either a composite type id or RECORDOID)
* row type typmod: int4 (see below for usage)
-- pad if needed to MAXALIGN boundary
* heap tuple representation, beginning with a HeapTupleHeaderData struct

If we do it exactly as above then we will be wasting some space, because
the xmin/xmax/cmax and ctid fields of HeapTupleHeaderData are of no use
in a row that isn't actually a table member row. It is very tempting to
overlay the length and rowtype fields with the HeapTupleHeaderData struct.
This would save some code as well as space --- see discussion below.
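
Very roughly, the overlay might look like this (the struct and field names
below are purely illustrative, not existing code, and assume the current
xmin/cmax/ctid fields get gathered into a sibling struct so the two can
share storage in a union):

    /* Hypothetical sketch only: the identification fields of a row datum,
     * laid out to overlay the transaction-status fields at the front of
     * HeapTupleHeaderData. */
    typedef struct DatumTupleFields
    {
        int32   datum_len_;     /* varlena length word (overall length) */
        Oid     datum_typeid;   /* composite type OID, or RECORDOID */
        int32   datum_typmod;   /* -1, or index into record-type cache */
    } DatumTupleFields;

    /* ... and inside HeapTupleHeaderData, something along the lines of: */
    union
    {
        HeapTupleFields  t_heap;    /* xmin/cmax/ctid etc., as today */
        DatumTupleFields t_datum;   /* row-datum identification fields */
    } t_choice;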

Only named composite types, not RECORD, will be allowed to be used as
table column types. This ensures that any row object stored on disk will
have a valid composite type ID embedded in it, so that the row structure
can be retrieved when the row is read. However, we want to be able to
support row objects in memory that are of transient record types (for
example, the output of a function returning RECORD will have a record type
determined by the query itself). I propose that we handle this case by
setting the type id to RECORDOID and using the typmod to identify the
particular record type --- the typmod will essentially be an index into
a backend-local cache of record types. More detail below.

We'll add "tdtypeid" and "tdtypmod" fields to TupleDesc structs. This
will make it easy to set the embedded type information correctly when
manufacturing a row datum using a TupleDesc. For TupleDescs associated
with relations, tdtypeid is just the relation's row type OID, and tdtypmod
is -1. For TupleDescs representing transient row types, we initially set
tdtypeid to RECORDOID and tdtypmod to -1 (indicating a completely
anonymous row type). If the row type actually needs to be identifiable
then we establish a cache entry for it and set the typmod to an index for
the cache entry. I think this will only need to happen when the query
contains a function-returning-RECORD or a whole-row variable referencing
what would otherwise be an anonymous row type, such as a JOIN result.
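
As a sketch (using the tdtypeid/tdtypmod fields proposed above), the two
cases would be filled in roughly like this:

    /* relation TupleDesc: identified by the relation's own row type */
    tupdesc->tdtypeid = RelationGetForm(relation)->reltype;
    tupdesc->tdtypmod = -1;

    /* transient row type: anonymous unless someone needs to identify it */
    tupdesc->tdtypeid = RECORDOID;
    tupdesc->tdtypmod = -1;     /* assigned from the type cache if needed */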

Composite types, as well as the RECORD type, will be marked in pg_type as
pass-by-ref, varlena (typlen -1), typalign 'd'. (We will use the maximum
alignment always to avoid any dependency on types of the contained
columns.)

The present function call and return conventions involving TupleTableSlots
will be replaced by simply passing and returning these row objects as
pass-by-reference Datums. In the case of functions returning rowtypes,
we'll continue to support the present ReturnSetInfo convention for
returning a separate TupleDesc describing the result type --- but this
will just be a crosscheck.

We will be able to make generic I/O routines for composite types,
comparable to those used now for arrays. Not sure what a convenient
external format would look like. (Possibly use the same conventions as
for a 1-D array?) We will need to make the convention that the type OID
of a composite type is passed to the input routine, in the same way that
an array input routine gets the typelem OID; else the input routine won't
know what to do.
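
For illustration, the input side might look about like this (the name
record_in and the exact argument list are hypothetical; the point is just
that the composite type's OID arrives as an extra argument, the same way
array_in receives the element type OID):

    Datum
    record_in(PG_FUNCTION_ARGS)         /* hypothetical generic input fn */
    {
        char   *string  = PG_GETARG_CSTRING(0);
        Oid     tupType = PG_GETARG_OID(1);     /* composite type OID */

        /* look up the TupleDesc for tupType (for RECORD a typmod would
         * presumably be needed too), parse "string" according to whatever
         * external format we settle on, build a tuple, and return its
         * t_data as a row Datum.  Details omitted from this sketch. */
        PG_RETURN_NULL();
    }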

We could also think about allowing functions that are declared as
accepting RECORD (ie, polymorphic-across-row-types functions). They would
use the same methods already used by polymorphic functions to find out the
true types of their inputs. (Might be best to invent a separate
pseudotype, say ANYRECORD, rather than overloading RECORD for this purpose.)
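
A sketch of how such a function might discover its input type, reusing the
existing polymorphic-function machinery (the example function and its
ANYRECORD declaration are hypothetical):

    Datum
    row_describe(PG_FUNCTION_ARGS)      /* hypothetical, declared ANYRECORD */
    {
        /* same trick anyarray/anyelement functions use today */
        Oid     argtype = get_fn_expr_argtype(fcinfo->flinfo, 0);

        if (!OidIsValid(argtype))
            elog(ERROR, "could not determine actual argument type");

        /* argtype is now the concrete composite type (or RECORDOID, in
         * which case the row datum itself carries the typmod); look up
         * its TupleDesc and work on fcinfo->arg[0] from there */
        PG_RETURN_NULL();
    }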

The recently developed SRF API is a bit unfortunate since it exposes the
assumption that a TupleTableSlot must be involved in returning a tuple.
If we don't overlay the Datum header with HeapTupleHeader then I think we
have to make TupleGetDatum copy the passed tuple and insert the row type
info from the slot's tupledesc, which'd be pretty inefficient because it
means making an extra copy of the row data. But if we do overlay the
header fields, then I think we can set up backwards-compatibility
definitions in which the slot is simply ignored. Specifically:

TupleDescGetSlot: no-op, returns NULL
TupleGetDatum: ignore slot, return tuple t_data pointer as datum

This will work because heap_formtuple and BuildTupleFromCStrings can
return a HeapTuple whose t_data part is already a valid row Datum, simply
by setting the appropriate length and type fields in it. (If the tuple is
ever stored to disk as a regular table row, these fields will be
overwritten with xmin/cmin info at that time.)
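
Concretely, the backwards-compatibility definitions might reduce to
something like this (the datum field names follow the hypothetical overlay
sketched earlier):

    #define TupleDescGetSlot(tupdesc)   ((TupleTableSlot *) NULL)

    #define TupleGetDatum(slot, tuple) \
        PointerGetDatum((tuple)->t_data)        /* slot simply ignored */

    /* ... with heap_formtuple / BuildTupleFromCStrings stamping the
     * embedded fields while constructing the tuple "tuple" from its
     * TupleDesc argument "tupdesc", roughly: */
    tuple->t_data->t_choice.t_datum.datum_len_   = tuple->t_len;
    tuple->t_data->t_choice.t_datum.datum_typeid = tupdesc->tdtypeid;
    tuple->t_data->t_choice.t_datum.datum_typmod = tupdesc->tdtypmod;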

To convert a row Datum into something that can be passed to heap_getattr,
one could use a local variable of type HeapTupleData and set its t_data
field to the datum's pointer value. t_len is copied from the datum
contents, while the other fields of HeapTupleData can just be set to
zeroes.
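
In code, that would look about like this (rowDatum, attnum, and tupdesc are
assumed to come from the caller; the overlaid-header layout is assumed, so
the datum points directly at a HeapTupleHeader):

    HeapTupleHeader td = (HeapTupleHeader) DatumGetPointer(rowDatum);
    HeapTupleData   tmptup;
    Datum           value;
    bool            isnull;

    MemSet(&tmptup, 0, sizeof(tmptup)); /* other fields just zeroed */
    tmptup.t_len = VARSIZE(td);         /* length copied from the datum */
    tmptup.t_data = td;

    value = heap_getattr(&tmptup, attnum, tupdesc, &isnull);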

ExecEvalVar for a whole-row reference will need to copy the scan tuple so
that it can insert the correct length and tuple type fields. (We cannot
scribble on the tuple as it sits in the disk buffer, of course.)
Fortunately this shouldn't be a major memory leak anymore since the copy
can be made in the current short-lived memory context.
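
A sketch of that path, where "tuple" is the current scan tuple and
"tupdesc" its descriptor (datum field names again follow the hypothetical
overlay):

    HeapTupleHeader dtup;

    /* copy into the per-tuple context; the buffer copy must not be
     * modified in place */
    dtup = (HeapTupleHeader) palloc(tuple->t_len);
    memcpy(dtup, tuple->t_data, tuple->t_len);

    /* stamp the fields that would hold xmin & friends in a disk tuple */
    dtup->t_choice.t_datum.datum_len_   = tuple->t_len;
    dtup->t_choice.t_datum.datum_typeid = tupdesc->tdtypeid;
    dtup->t_choice.t_datum.datum_typmod = tupdesc->tdtypmod;

    return PointerGetDatum(dtup);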

Handling anonymous RECORD types
-------------------------------

I envision expanding typcache.c to be able to store TupleDesc structures
for composite and record types. In the case of regular composite types
this is not especially difficult. For record types, we are essentially
trying to make a backend-local mapping from typmod values to TupleDescs.
There are a couple of interesting points:

* We have to be able to re-use an already-existing cache entry if it
matches a requested TupleDesc. This avoids indefinite growth of the type
cache over many queries. There could still be issues with memory leakage
if a single backend session uses a huge number of distinct record types
over its lifetime, but that doesn't seem likely to be an issue in
practice. (We could avoid this problem by recycling no-longer-needed
cache entries, but what with plan caching I'm not sure there's any
pleasant way to do that. For the moment I intend that cache entries for
record types will live for the life of the backend.)

* Since record typmod values are backend-local, they aren't meaningful in
query structures stored on disk. When a stored rule is read in, we'll
need to be able to replace any embedded typmod values with correct
assignments for the current backend.
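
A rough sketch of the record-type side of the cache, in particular the
entry-reuse behavior described in the first point above (names such as
assign_record_type_typmod are hypothetical; memory-context and error
handling are pared down):

    #include "postgres.h"
    #include "access/tupdesc.h"
    #include "utils/memutils.h"

    static TupleDesc *RecordCacheArray = NULL;   /* indexed by typmod */
    static int32      RecordCacheSize  = 0;      /* allocated slots */
    static int32      NextRecordTypmod = 0;      /* slots in use */

    void
    assign_record_type_typmod(TupleDesc tupDesc)
    {
        MemoryContext oldcxt;
        int32         i;

        /* reuse an existing entry whenever the descriptor matches,
         * so repeated queries don't bloat the cache */
        for (i = 0; i < NextRecordTypmod; i++)
        {
            if (equalTupleDescs(RecordCacheArray[i], tupDesc))
            {
                tupDesc->tdtypmod = i;
                return;
            }
        }

        /* otherwise append a copy; cache memory lives for the backend */
        oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
        if (NextRecordTypmod >= RecordCacheSize)
        {
            RecordCacheSize = Max(64, RecordCacheSize * 2);
            RecordCacheArray = RecordCacheArray
                ? repalloc(RecordCacheArray,
                           RecordCacheSize * sizeof(TupleDesc))
                : palloc(RecordCacheSize * sizeof(TupleDesc));
        }
        RecordCacheArray[NextRecordTypmod] = CreateTupleDescCopy(tupDesc);
        tupDesc->tdtypmod = NextRecordTypmod++;
        MemoryContextSwitchTo(oldcxt);
    }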

Safely storing composite types on disk
--------------------------------------

If a composite row value contains any out-of-line TOAST references, we'd
have to expand those references before we could safely store the value on
disk. This can be handled by the same tuptoaster.c routines that are
already concerned with replacing unsafe references.

ALTER TABLE issues
------------------

If an ALTER TABLE command does something that requires examining or
changing every row of a table, it would presumably have to do the same to
all entries in any composite-type column of the table's rowtype. To avoid
surprises and interesting debates about who has permissions to do this,
it might be wise to restrict on-disk composite columns to be only of
standalone composite types (ie, those made with CREATE TYPE AS). This
restriction would also avoid debates about whether table constraints apply
to composite-type columns.

Notes
-----

While doing this, we should once and for all rip out the last vestiges of
the "attisset" feature.

Add an Assert to ExecEvalVar checking that whole-row vars (and, I guess,
any system column as well) are only ever fetched from a scan tuple, never
from the inner or outer tuple of a join. By the time a join is executed,
such references must already have been converted into ordinary field
references; if they haven't been, it's too late to do so.

The current API for TypeGetTupleDesc is somewhat bogus --- I don't think
the "column alias" option is really appropriate, and it is lacking a
typmod argument so it can't be used with record types. We shall have to
deprecate it in favor of a new routine.

regards, tom lane
