Re: storing binary files / memory limit

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: andrew(at)catalyst(dot)net(dot)nz
Cc: pgsql-php(at)postgresql(dot)org
Subject: Re: storing binary files / memory limit
Date: 2008-10-11 21:41:00
Message-ID: 48F11D6C.1000209@fuzzy.cz
Lists: pgsql-php

> On Sat, 2008-10-11 at 18:26 +0200, Tomas Vondra wrote:
>> Is there any other way to solve storing of large files in PostgreSQL?
>
> No, not until there are functions that let you fopen() on the bytea
> column.

:-(

> Also, your "... || more_column" solution will generate large numbers of
> dead rows and require frequent vacuuming.

Yes, I'm aware of that, and it's one more deficiency of the solution. But
inserting data is not very frequent (a few inserts a day), especially
compared to reading (several hundred SELECTs).
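
Just to illustrate what I mean by the append strategy, here is a rough
sketch of the PHP side - the 'documents' table, the 'id' / 'data' columns
and the file path are made up, the point is only that PHP never has to
hold the whole file in memory, while each UPDATE rewrites the whole row
(hence the dead rows):

<?php
// rough sketch of the append strategy - table / column names and the
// file path are made up
$conn = pg_connect('dbname=test');
$id   = 1;

// start with an empty value, so that (NULL || chunk) does not give NULL
pg_query($conn, "UPDATE documents SET data = '' WHERE id = $id");

$fp = fopen('/tmp/file.bin', 'rb');
while (!feof($fp)) {
    $chunk = pg_escape_bytea(fread($fp, 262144));   // 256 kB per iteration
    // each UPDATE rewrites the whole (large) row, producing a dead row
    pg_query($conn, "UPDATE documents SET data = data || '$chunk'
                     WHERE id = $id");
}
fclose($fp);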

>> - Optimization is a serious criterion, as is reliability.
>
> If you're using tables with very large columns, make sure you index on
> every other column you're going to access it by. If PostgreSQL has to
> resort to full-table scans on this table, and especially with a low
> memory constraint, you could easily end up with it doing an on-disk sort
> on a copy of the data.

Well, I believe the schema is quite well optimized, so this shouldn't be
a problem. All the necessary indexes are in place etc., so the queries
execute quite fast (we have comprehensive performance logging of queries,
so I'm fairly sure about this).

But I'm not sure what you mean by 'low memory constraint' - the memory
limit I've been talking about is purely a PHP feature, so it's related to
inserting / reading the data and escaping / unescaping it.
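
To make it clear what that limit means in practice, here is a trivial
sketch (the file path is made up): with a plain bytea column both the raw
data and its escaped copy have to fit into PHP's memory at the same time,
and the escaped form of binary data is typically several times larger than
the original.

<?php
// sketch - the raw file and its escaped copy both sit in PHP memory at once
$raw     = file_get_contents('/tmp/file.bin');   // whole file in memory
$escaped = pg_escape_bytea($raw);                // second, larger copy
printf("raw: %d B, escaped: %d B, memory_limit: %s\n",
       strlen($raw), strlen($escaped), ini_get('memory_limit'));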

Anyway, 99% of the queries returning multiple rows do not return BYTEA
columns - these columns are only referenced in queries returning a single
row, so there is no problem with sorting or with the memory occupied by
the postmaster process.

> If you *have* to store it in a table column (and it really isn't the
> most efficient way of doing it) then create a separate table for it
> which is just SERIAL + data.

I *want* to store it in a table column, because I'm not able to come up
with a better solution. As I understand it, creating a separate table to
store the binary data is almost the same as using plain bytea columns.
We don't have performance problems (thanks to proper indexes and the
TOAST architecture), and it does not solve the problem I described in my
original post (hitting the PHP memory limit).

OK, it would save a little bit of space when using the 'append' strategy
described in my previous post (data = data || new_data), but such a table
usually consists of a small amount of metadata plus a large amount of
binary data. So the space wasted by storing the metadata in dead rows is
negligible compared to the space wasted by the bytea data in those same
dead rows.

For example, one of our tables is used to store documents, and it has
roughly this structure:

Documents (
    id          SERIAL,
    created     DATE,
    title       VARCHAR(128),
    author      VARCHAR(128),
    description TEXT,    -- short (usually less than 2000 characters)
    data        BYTEA
)

The 'data' column may hold several megabytes, so the metadata occupies
less than 1% of the row.

If you know a better way to store binary data, please describe it here.
The only other way I'm aware of is a LOB (large object) - it solves the
problem of inserting the data (by streaming), but has other disadvantages
(no referential integrity, etc.).
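
For completeness, this is roughly how the LOB way looks in PHP (a sketch -
the 'data_oid' column and the file path are made up): the data is streamed
in small pieces, so the PHP memory limit is not an issue, but the row only
keeps an OID, with no referential integrity to the large object itself.

<?php
// sketch of the LOB approach - 'data_oid' column and file path are made up
$conn = pg_connect('dbname=test');

pg_query($conn, 'BEGIN');        // lo_* functions must run in a transaction
$oid = pg_lo_create($conn);
$lo  = pg_lo_open($conn, $oid, 'w');

$fp = fopen('/tmp/file.bin', 'rb');
while (!feof($fp)) {
    pg_lo_write($lo, fread($fp, 262144));      // streamed, 256 kB at a time
}
fclose($fp);
pg_lo_close($lo);

// the table only stores the OID pointing into pg_largeobject
pg_query($conn, "UPDATE documents SET data_oid = $oid WHERE id = 1");
pg_query($conn, 'COMMIT');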

But creating a separate table for the binary data looks interesting under
one condition - the file has to be stored split into chunks. This
basically mimics the LOB storage (the pg_largeobject table). It still does
not support streaming the data, but it solves the problem with the PHP
memory limit and does not create a large amount of dead rows. A sketch of
what I mean follows below.
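
Something like this is what I have in mind - again just a sketch, the
chunk table and all names are made up:

<?php
// sketch of the "split" storage - the chunk table is made up:
//
//   CREATE TABLE document_chunks (
//       document_id INTEGER REFERENCES documents(id),
//       chunk_no    INTEGER,
//       data        BYTEA,
//       PRIMARY KEY (document_id, chunk_no)
//   );
//
$conn = pg_connect('dbname=test');

// write: one small chunk at a time, so the PHP memory limit is never hit
// and no multi-megabyte dead rows are produced
$fp = fopen('/tmp/file.bin', 'rb');
$n  = 0;
while (!feof($fp)) {
    $chunk = pg_escape_bytea(fread($fp, 262144));
    pg_query($conn, "INSERT INTO document_chunks (document_id, chunk_no, data)
                     VALUES (1, $n, '$chunk')");
    $n++;
}
fclose($fp);

// read: fetch the chunks one by one and send them straight to the client
$res = pg_query($conn, 'SELECT data FROM document_chunks
                        WHERE document_id = 1 ORDER BY chunk_no');
while ($row = pg_fetch_row($res)) {
    echo pg_unescape_bytea($row[0]);
}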

regards
Tomas
