Re: Status of server side Large Object support?

From: Thomas Hallgren <thhal(at)mailblocks(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, bryan(at)bulten(dot)ca
Subject: Re: Status of server side Large Object support?
Date: 2004-11-29 17:50:17
Message-ID: thhal-03fmGAs0xcC4caeCxbEL4Y/wlzeFACD@mailblocks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom,
Here's an attempt to do some major rethinking and solve all open issues
(and of course creating some new ones).

The idea is based on use of normal tables with a bytea column that
stores one LO-page per row (henceforth referred to as LO-page tables).
Several such tables can be used in order to overcome the table size
limit and to enable distribution of LO's over different tablespaces.
Also, the use of normal tables will resolve the issues concerning
protection, user-accessible locking, and the current lack of MVCC.

I feel that a more simplistic approach using already present mechanisms
would make the design easier to maintain. There's no reason why LO's
should require special attention from a vacuum or dump/restore
perspective, nor why it should be excluded from the MVCC.

This is what I think is needed:

A new composite datatype must be used in place of todays oid to identify
a large object. The type will look something like this:

CREATE TYPE lo_locator AS (
lo_page_table int,
lo_id int,
lo_xact_id int
);

The lo_page_table will hold the Oid of the associated LO-page table. The
lo_id is the Oid used by the data pages within that table. The
lo_xact_id is set to the current transaction id each time a data page is
changed. Its purpose is to resolve the concurrency issue that arise when
several transactions simultaniously change the same LO but on different
data pages.

I suggest that the system have a way to set a default LO-page table on a
per schema basis. This table could be used unless the user (schema
owner) explicitly declares another table. If no table has been declared
for a schema the default should be the table declared for 'public'. If
no table is declared there either, some global default can be used.

Among other things, a default LO-page table will make it possible to
retain backward compatibility.

Protection can be obtained using normal grant/revoke permissions on the
LO-page tables. I.e. they will serve as permission groups. LO's
requiering specific permissions must be stored in a separate LO-page table.

The LargeObjectDesc is changed as follows:
- It must have an additional Oid that appoints the table it makes use of.
- The uint32 used for the offset can be changed to an int64 at the same
time.
- The current SubTransactionId will become obsolete since all changes
made to the LO-page tables are under sub-transaction control anyway.
- Something to quickly find our way back to the row containing the
lo_locator must be added so that it's easy to update the lo_xact_id that
resides there. I'm not sure how to do that in the most efficient manner
so its represented by a comment here. Please fill in :-)

Thus we'd get:

typedef struct LargeObjectDesc
{
Oid pageTableId; /* Page-table in use for this LO */
Oid id; /* LO's identifier within LO-page table */
int64 offset; /* current seek pointer */
int flags; /* locking info, etc */
/* + something that enables us to find our way
* back so that the lo_xact_id can be updated
* effiently */
} LargeObjectDesc;

Tables hosting LO pages must be created using the following declaration:

CREATE TABLE <table name>
(
lo_id oid NOT NULL,
lo_pageno int NOT NULL,
lo_data bytea,
PRIMARY KEY (lo_id, lo_pageno)
);

Two restricions concerning a LO-pages table:
1. Each row (page) must be considered fixed in size.
2. Normal (I mean through SQL) access to the LO-page tables must be
discuraged somehow.

The lo_<xxx> protocoll needs to change so that the lo_seek and lo_tell
uses 64 bit quantities. The lo_creat, lo_open, and lo_drop will all act
on the default LO-page table. A new set of functions that allow the
LO-page table to be explicitly stated for the create, open, and drop
operations will be needed. Finally, three new functions, lo_size(int
lod), lo_truncate(int lod, int64 new_size), and lo_get_page_table_id(int
lod) should be added.

Comments, suggestions?

Regards,
Thomas Hallgren

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2004-11-29 18:01:11 Re: unnest
Previous Message Tom Lane 2004-11-29 17:44:42 Re: Stopgap solution for table-size-estimate updatingproblem