Re: [PATCH] [v8.5] Security checks on largeobjects

From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bernd Helmle <mailings(at)oopsware(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] [v8.5] Security checks on largeobjects
Date: 2009-07-31 07:19:46
Message-ID: 4A729B12.8010904@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Nobody may remember, I also proposed a patch to support access
controls on largeobject.

It was suggested that largeobject stores its contents on the TOAST
relations and provides interfaces to read/write them partially.

For example:
> == lo_xxx() interfaces ==
>
> A new version of loread() and lowrite() are necessary to access
> a part of toasted data within user defined tables. It can be defined
> as follows:
>
> loread(Blob data, int32 offset, int32 length)
> lowrite(Blob data, int32 offset, Bytea data)

But, I found an another problem when we tried to update TOAST chunks
partially.

Because the toast pointer contains the total size of a series of
chunks, not only chunk_id and toastrelid, we cannot expand the size
of TOAST'ed data.

See below, toast pointer is defined as follows.

struct varatt_external
{
int32 va_rawsize; /* Original data size (includes header) */
int32 va_extsize; /* External saved size (doesn't) */
Oid va_valueid; /* Unique ID of value within TOAST table */
Oid va_toastrelid; /* RelID of TOAST table containing it */
};

These fields are inlined within the main tuple, and the contents
of TOAST datum is stored in the toast relation in separation.

If we can update the toast data partially with expanding its total size
(e.g, write 10,000 bytes from the offset 6,000 of the 12,000 bytes verlena),
it is also necessary to update the main tuple, because its va_rawsize and
va_extsize should be also updates, although va_valueid and va_toastrelid
can keep as is.

I wonder why the va_extsize and va_rawsize should be included within the
toast pointer. It is fundamentally computable using a pair of va_valueid
and va_toastrelid.
(I guess it is due to the performance reason. If we need to check whether
it is compressed or not, a flag varible can provide enough hint.)

So, we need to consider more how to implement partial writable verlena
on the user tables.

However, as far as existing largeobject interfaces, I believe it is possible
to implement them using partial writable verlena.
Because largeobject identifier is obvious when we user lo_*() interfaces,
so we can update the main tuple within pg_largeobject also, if the total
length of the verlena was changed on partial writes.

Thus, I would like to implement the feature with focusing on the existing
largeobject interfaces at the first step.
Partial writable TOAST in the user relation will come on the next.
(It also enables to keep the patch size smaller than all-in-a-patch.)

In summary, I try to implement it on the next commit fest.
- GRANT / REVOKE on largeobjects
- DAC access controls on largeobejcts
- It puts the contents of largeobject on TOAST relation.
(including hole support)
- It provides largeobejct interfaces to write it partially.

Thanks,

KaiGai Kohei wrote:
> I summarized the design proposal and issues currently we have.
>
> I would like to see any comments corresponding to the proposition.
> Especially, selection of the snapshot is a headach issue for me.
>
> ----------------
> This project tries to solve two items listed at:
> http://wiki.postgresql.org/wiki/Todo#Binary_Data
>
> * Add security checks for large objects
> * Allow read/write into TOAST values like large objects
>
> = Introduction =
>
> We need to associate a metadata for a certain largeobject to
> implement security checks for largeobjects. However, the data
> structure of largeobjects are not suitable to manage its
> metadata (such as owner identifier, database acls ...) on
> a certain largeobject, because a largeobject is stored as
> separated page frames in the pg_largeobject system catalog.
> Thus, we need to revise the data structure to manage a certain
> largeobject.
>
> An interesting fact is a similarity of data structure between
> TOAST table and pg_lageobject.
> A TOAST relation is declared as follows:
> pg_toast_%u (
> chunk_id oid,
> chunk_seq int4,
> chunk_data bytea,
> unique(chunk_id, chunk_seq)
> )
>
> Definition of the pg_largeobject is as follows:
> pg_largeobject(
> loid oid,
> pageno int4,
> data bytea,
> unique(loid, pageno)
> )
>
> They have an identical data structure, so it is quite natural
> to utilize TOAST mechanism to store pagef rames of largeobject.
>
> = Design =
>
> In my plan, role of the pg_largeobject will be changed to
> manage metadata of largeobjects, and it will be redefined
> as follows:
>
> CATALOG(pg_largeobject,2613)
> {
> Oid loowner; /* OID of the owner */
> Oid lonsp; /* OID of the namespace */
> aclitem loacl[1]; /* ACL of the largeobject */
> Blob lodata; /* Contents of the largeobject */
> } FormData_pg_largeobejct;
>
> For access controls purpose, its ownership, namespace and ACLs
> are necessary. In addition, the Blob is a new type which support
> to read/write a its partial TOAST value.
>
> The current lo_xxx() interfaces will perform as a wrapper function
> to access a certain pg_largeobject.lodata identified by a largeobject
> handler. The loread(), lowrite() or similar interfaces will support
> partial accesses on the Blob type. It enables user defined relation
> to contain large data using TOAST mechanism, with reasonable resource
> comsumption. (Note that TOAST replaces whole of the chunks with same
> identifier, even if it changes just a single byte.)
>
> = Interfaces =
>
> == New type ==
>
> We need a new variable length type that has the following feature,
> to allow users partial accesses.
>
> * It always use external TOAST table, independent from its size.
> If toasted data is stored as inline, we cannot update it independent
> from the main table.
> It does not prevent partial read, but meaningless because inlined
> data is enough small.
>
> * It always store the data without any compression.
> We cannot easily compute required data offset on the compressed
> data. All the toasted data need to be uncompressed, for both of
> reader and writer access.
>
> == lo_xxx() interfaces ==
>
> A new version of loread() and lowrite() are necessary to access
> a part of toasted data within user defined tables. It can be defined
> as follows:
>
> loread(Blob data, int32 offset, int32 length)
> lowrite(Blob data, int32 offset, Bytea data)
>
> == GRANT/REVOKE ==
>
> When we access traditional largeobjects, reader permission (SELECT)
> or writer permission (UPDATE) should be checked on accesses.
>
> The GRANT/REVOKE statements are enhanced as follows:
>
> GRANT SELECT ON LARGE OBJECT 1234 TO kaigai;
>
> It allows "kaigai" to read the largeobject: 1234.
>
> = Issues =
>
> == New pg_type.typstorage ==
>
> The variable length data is always necessary to be stored in external
> storage and uncompressed. The existing typstorage does not satisfies
> the requirement, so we need to add a new pg_type.typstorage strategy.
>
> The new typstorage strategy forces:
> - It always stores the given varlena data on external toast relation.
> - It always stores the given varlena data without any compression.
>
> It will give us performance loss, so existing Text or Bytea will be
> more suitable to store variable length data being not very large.
>
> == Snapshot ==
>
> The largeobject interface uses SnapshotNow for writable accesses, and
> GetActiveSnapshot() for read-only accesses, but toast_fetch_datum()
> uses SnapshotToast to scan the toast relation.
>
> It seems to me SnapshotToast depends on an assumption that tuples
> within TOAST relation does not have any multiple versions.
> When we update a toast value, TOAST mechanism inserts whole of
> variable length datum with a new chunk_id, and older chunks are
> removed at toast_delete_datum().
> The TOAST pointer is updated to the new chunk_id, and its visibility
> is under MVCC controls.
>
> The source code comments at HeapTupleSatisfiesToast() says as follows:
>
> /*
> * HeapTupleSatisfiesToast
> * True iff heap tuple is valid as a TOAST row.
> *
> * This is a simplified version that only checks for VACUUM moving conditions.
> * It's appropriate for TOAST usage because TOAST really doesn't want to do
> * its own time qual checks; if you can see the main table row that contains
> * a TOAST reference, you should be able to see the TOASTed value. However,
> * vacuuming a TOAST table is independent of the main table, and in case such
> * a vacuum fails partway through, we'd better do this much checking.
> *
> * Among other things, this means you can't do UPDATEs of rows in a TOAST
> * table.
> */
>
> If largeobjects-like interface is available to update a part of TOAST
> values, we cannot keep the assumption.
>
> At the beginning, I have a plan to apply the result od GetActiveSnapshot()
> to fetch toasted value. Is there any matter which can be expected?
>

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Boszormenyi Zoltan 2009-07-31 09:42:33 Re: ECPG support for struct in INTO list
Previous Message Pavel Stehule 2009-07-31 06:57:39 Re: PATCH: make plpgsql IN args mutable (v1)