Re: Best practice for file storage?

From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Best practice for file storage?
Date: 2010-01-31 17:56:13
Message-ID: 53EE700C-FB3E-478F-A14A-EF093FC084B5@blighty.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Jan 31, 2010, at 2:46 AM, Joe Kramer wrote:

> Hi,
>
> I need to store a lot of large files (thousands of 10-100 MB files)
> uploaded through my web application and I find that storing them in
> database as bytea field is not practical for backup purposes.
> My database has full backup performed every 12 hours and backup is
> encrypted and copied to server on another continent. Having all the
> heavy binary data in database will make backups impossible.
>
> So I am thinking of having a table just for metadata and file id
> sequence and storing the file on file system:
>
> CREATE TABLE business_logo
> (
> file_id bigserial NOT NULL,
> file_name varchar,
> file_date timestamp,
> );
>
> Storing file in path composed from serial id, e.g. file with id 2345
> will be stored in
> /webapp/files/2/3/4/5/2345
>
> So I can backup files separately and database backup is still quick
> and painless.
>
> This is very simplistic and straightforward method.
> I suppose there are better ways of doing it, using some virtual file system?
> Anyone had a similar issue with avoiding of storing large files in
> database, how did you solve it?

Works fine, though you probably want more than 10 entries in each directory,
depending on which filesystem you use.

If you're going to delete the files ever, you can do that transactionally by
having a trigger on the table that queues the filenames in a table for
deletion by an external process.

If you can store some information about the file contents - size at least,
maybe a cheap hash (md5) - then you'll find it easier to sanity check
filesystem vs database when something goes wrong. It also lets you
find duplicate files more easily.

Cheers,
Steve

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2010-01-31 17:59:52 Re: Versions RSS page is missing version(s)
Previous Message Sebastien Boisvert 2010-01-31 17:44:49 Problems backing up