Skip site navigation (1) Skip section navigation (2)

Re: What is the best way to storage music files in Postgresql

From: "Peter Koczan" <pjkoczan(at)gmail(dot)com>
To: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>
Cc: Rich <rhdyes(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What is the best way to storage music files in Postgresql
Date: 2008-03-17 18:01:06
Message-ID: 4544e0330803171101g680ffa5bj8ac32908a9f3dfb5@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
>  > I am going to embarkon building a music library using apache,
>  > postgresql and php.  What is the best way to store the music files?
>
>  Your options are either to use a BLOB within the database or to store
>  paths to normal files in the file system in the database. I suspect
>  using normal files will make backup and management a great deal easier
>  than using in-database BLOBs, so personally I'd do it that way.

I discussed something like this with some co-workers recently, and
here's what I had to say. Not all of these apply to the original
message, but they are things to consider when marrying a database to a
file storage system.

Storing the files in the database as BLOBs:
Pros:
- The files can always be seen by the database system as long as it's
up (there's no dependence on an external file system).
- There is one set of locking mechanisms, meaning that the file
operations can be atomic with the database operations.
- There is one set of permissions to deal with.
Cons:
- There is almost no way to access files outside of the database. If
the database goes down, you are screwed.
- If you don't make good use of tablespaces and put blobs on a
separate disk system, the disk could thrash going between data and
blobs, affecting performance.
- There are stricter limits for PostgreSQL blobs (1 GB size limits, I've read).

Storing files externally, storing pathnames in the database:
Pros:
- You can access and manage files from outside the database and
possibly using different interfaces.
- There's a lot less to store directly in the database.
- You can use existing file-system permissions, mechanisms, and limits.
Cons:
- You are dealing with two storage systems and two different locking
systems which are unlikely to play nice with each other. Transactions
are not guaranteed to be atomic (e.g. a database rollback will not
rollback a file system operation, a commit will not guarantee that
data in a file will stay).
- The file system has to be seen by the database system and any remote
clients that wish to use your application, meaning that a networked FS
is likely to be used (depending on how many clients you have and how
you like to separate services), with all the fun that comes from
administering one of those. Note that this one in particular really
only applies to enterprise-level installations, not smaller
installations like the original poster's.
- If you don't put files on a separate disk-system or networked FS,
you can get poor performance from the disk thrashing between the
database and the files.

There are a couple main points:
1. The favorite answer in computing, "it depends", applies here. What
you decide depends on your storage system, your service and
installation policies, and how important fully atomic transactions are
to you.
2. If you want optimal performance out of either of these basic
models, you should make proper use of separate disk systems. I have no
idea which one is faster (it depends, I'm sure) nor do I have much of
an idea of how to benchmark this properly.

Peter

In response to

Responses

pgsql-performance by date

Next:From: Dave CramerDate: 2008-03-17 18:12:03
Subject: Re: Benchmark: Dell/Perc 6, 8 disk RAID 10
Previous:From: Franck RoutierDate: 2008-03-17 16:37:42
Subject: Re: performance tools

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group