Re: TOAST, large objects and ACIDity

From: Tomasz Ostrowski <tometzky(at)batory(dot)org(dot)pl>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: TOAST, large objects and ACIDity
Date: 2007-07-10 14:44:54
Message-ID: 20070710144453.GA2118@batory.org.pl
Lists: pgsql-general

On Tue, 10 Jul 2007, Alexander Staubo wrote:

> My take: Stick with TOAST unless you need fast random access. TOAST
> is faster, more consistently supported (eg., in Slony) and easier
> to work with.

Toasted bytea columns have some other disadvantages as well:

1.

It is impossible to build up its value in chunks - you'll need one
very big "insert" carrying the whole file contents. If your client
library does not support binary arguments to prepared statements,
you'll have to escape this data, which makes it several times bigger
(each escaped byte becomes a 4-character octal sequence like \000, so
I think it could be about 4 times bigger). For parsing and saving
this the server needs several copies of the data - I think at least
10 times more memory than the binary data size.

If I'm not mistaken, this means that for saving a 20MB data file the
server would need at least 200MB of memory - a huge amount. The
client would also need over 100MB.
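
A quick illustration of the blow-up (my own example, not from the
original thread) - encode() shows what escape-format text looks like
for bytes that must be escaped:

select octet_length(decode('00ff80ab', 'hex'))             as raw_bytes,
       length(encode(decode('00ff80ab', 'hex'), 'escape')) as escaped_chars;
-- raw_bytes = 4, escaped_chars = 16: each of these bytes turns
-- into a 4-character octal escape such as \000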

I've worked around this with a temporary table:

create temporary table chunks (
    chunk_nr   int primary key,
    chunk_data bytea not null
);

and an aggregate:

create aggregate bytea_accum (
    sfunc    = byteacat,
    basetype = bytea,
    stype    = bytea,
    initcond = ''
);
I put 1MB chunks into this "chunks" table and then do:

insert into attachments
    (filename, filedata)
select
    'filename.txt', bytea_accum(chunk_data)
from (
    select chunk_data
    from chunks
    order by chunk_nr
) ordered_chunks;
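
The chunk-loading step itself can then be one small prepared insert
per piece (a sketch of mine, with toy hex data standing in for the
real 1MB chunks), so no single statement ever carries the whole file:

prepare put_chunk (int, bytea) as
    insert into chunks (chunk_nr, chunk_data) values ($1, $2);

execute put_chunk(1, decode('deadbeef', 'hex'));  -- first piece
execute put_chunk(2, decode('cafebabe', 'hex'));  -- next piece, etc.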

I've proposed that it would be easier if there were a memory-efficient
function:
bytea_from_lo(oid lo_id) returns bytea
but there was no feedback.
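
Just to show the intended interface, here is a naive sketch built on
the server-side large-object calls (my code, not a real
implementation - it still buffers the whole value in memory, so it
does not achieve the memory efficiency the proposal was about):

create function bytea_from_lo(lo_id oid) returns bytea as $$
declare
    fd     integer;
    chunk  bytea;
    result bytea := '';
begin
    fd := lo_open(lo_id, 262144);      -- 262144 = INV_READ
    loop
        chunk := loread(fd, 1048576);  -- read 1MB at a time
        exit when octet_length(chunk) = 0;
        result := result || chunk;
    end loop;
    perform lo_close(fd);
    return result;
end;
$$ language plpgsql;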

2.

There's the same problem when reading this bytea data back. I also do
this in chunks - I've set external storage on this table:

alter table attachments alter column filedata set storage external;

With this set, the function

substring(filedata from [offset] for [chunk_size])

is efficient, because the server can fetch just the TOAST chunks it
needs instead of having to decompress the whole value first.
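
For example, reading back in 1MB pieces (substring offsets are
1-based):

select substring(filedata from 1 for 1048576)
    from attachments where filename = 'filename.txt';
select substring(filedata from 1048577 for 1048576)
    from attachments where filename = 'filename.txt';
-- ...and so on, until the returned chunk is shorter than 1MB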

================

With large objects, on the other hand, you won't be able to enforce
constraints in the database, for example:
- you will be able to delete a large object which is still referenced
  elsewhere;
- you won't be able to limit large object size;
- I think you won't be able to limit access to a large object;
- you will be able to delete a reference to a large object without
  deleting the object itself (you can use contrib/vacuumlo for
  garbage collection, or a trigger as sketched below).
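
For that last point contrib/lo offers an alternative to periodic
vacuumlo runs: keep the reference in a column of its "lo" type and
let its lo_manage trigger unlink the object when the row is updated
or deleted. A sketch, assuming a hypothetical table attachments_lo
with a filedata_oid column of type lo:

create trigger t_filedata
    before update or delete on attachments_lo
    for each row
    execute procedure lo_manage(filedata_oid);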

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
Winnie the Pooh
