Re: Database versus filesystem for storing images

From: Andrew Chernow <pg-job(at)esilo(dot)com>
To: Jorge Godoy <jgodoy(at)gmail(dot)com>
Cc: John McCawley <nospam(at)hardgeus(dot)com>, Clodoaldo <clodoaldo(dot)pinto(dot)neto(at)gmail(dot)com>, imageguy <imageguy1206(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Database versus filesystem for storing images
Date: 2007-01-05 20:49:28
Message-ID: 459EB9D8.1000401@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>> Don't store your images in the database. Store them on the filesystem and
>> store their path in the database
I 100% agree. Use the database as a lookup into the filesystem. Don't load the
database up with terabytes of non-searchable binary data? not sure how that
would help you?

Here is one idea:

Have a 64-bit sequence that you use to generate an image_id (becomes file name).
Hash that id out over a 3-level deep directory structure that allows 4096
entries per directory. Should give you 64-bit worth of files.

CREATE OR REPLACE FUNCTION get_image_path(image_id BIGINT)
RETURNS TEXT AS $$
DECLARE
-- 40963, avoids "integer out of range"
dir1_val BIGINT := 68719476736;
dir1 BIGINT;
dir2 BIGINT;
dir3 BIGINT;
path TEXT;

BEGIN
dir1 := (image_id / dir1_val) % 4096;
dir2 := (image_id / (4096 * 4096)) % 4096;
dir3 := (image_id / 4096) % 4096;
RETURN '/BASE_PATH/' || dir1 || '/' || dir2 ||
'/' || dir3 || '/' || image_id;
END;
$$ LANGUAGE PLPGSQL;

test=# select get_image_path(200399322222);
get_image_path
-------------------------------------
/BASE_PATH/2/3752/2991/200399322222
(1 row)

>I mean, how do you handle integrity with data
> outside the database?
You don't, the file system handles integrity of the stored data. Although, one
must careful to avoid db and fs orphans. Meaning, a record with no
corresponding file or a file with no corresponging record. Always
write()/insert an image file to the system within a transaction, including
writing the image out to the fs. Make sure to unlink any paritally written
image files.

>>How do you plan your backup routine
In regards to backup, backup the files one-by-one. Grab the lastest image file
refs from the database and start backing up those images. Each successfully
backed up image should be followed by inserting that file's database record into
a remote db server. If anything fails, cleanup the partial image file (to avoid
orphaned data) and rollout the transaction.

just one idea. i'm sure there are other ways of doing it. point is, this is
completely possible to do reliably.

andrew

Jorge Godoy wrote:
> John McCawley <nospam(at)hardgeus(dot)com> writes:
>
>> Don't store your images in the database. Store them on the filesystem and
>> store their path in the database. Anyone that tells you otherwise is a stark
>> raving madman :)
>>
>> My system is very heavily used, and our pg_dump is only a few gigs. Meanwhile
>> our images/documents storage is well over a hundred gigs. I'd hate to think
>> that I'd have to dump and restore 100 gigs every time I wanted to dump the
>> newest data to the development database.
>
> How do you plan your backup routine and how do you guarantee that on a failure
> all needed data is restored? I mean, how do you handle integrity with data
> outside the database?
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John McCawley 2007-01-05 20:59:11 Re: Database versus filesystem for storing images
Previous Message Jorge Godoy 2007-01-05 20:26:51 Re: Database versus filesystem for storing images