Re: Storing many big files in database- should I do it?

From: Adrian von Bidder <avbidder(at)fortytwo(dot)ch>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-27 10:43:28
Message-ID: 201004271243.35493@fortytwo.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tuesday 27 April 2010 11.17:42 Cédric Villemain wrote:
> > Anyone had this kind of design problem and how did you solve it?
>
> store your files in a filesystem, and keep the path to the file (plus
> metadata, acl, etc...) in database.

... and be careful that db and file storage do not go out of sync.

But if files are ever only added and possibly removed (but never changed),
this is not too hard:

* be sure to commit db transaction only after file has been written to disk
(use fsync or similar to be sure!) (For file deletions: first delete db
metadata, then delete the file.)
* be sure to detect failed writes and abort the db transaction or otherwise
properly handle errors while storing the file.
* occasionally run a clean-up to remove files that were written to
filesystem where the db metadata was not stored. Should be a rare case but
it probably will happen.

PostgreSQL support 2PC (PREPARE and then COMMIT as separate steps); you may
want to use this (PREPARE database transaction, then do filesystem
operations. If filessystem operation fails, you cann ROLLBACK the db
connection, otherwise COMMIT.) That way, you don't lose transactional
semantics.

Backup requires some more thought. I guess you could use some kind of
volume management to get filesysstem snapshots, but you have to be sure the
fs snapshot reflects the point in time when the database backup was made.
Depending on load / availability requirements you may get away with stopping
data modification at the application level for a few seconds until the db
backup has started and the filesystem snapshot has been created.

cheers
-- vbi

--
featured product: PostgreSQL - http://postgresql.org

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Raymond O'Donnell 2010-04-27 10:49:54 Re: pg_hba.conf
Previous Message jkunkel 2010-04-27 10:42:59 pg_hba.conf