On Mon, Mar 17, 2008 at 2:01 PM, Peter Koczan <pjkoczan(at)gmail(dot)com> wrote:
> > > I am going to embarkon building a music library using apache,
> > > postgresql and php. What is the best way to store the music files?
> >
> > Your options are either to use a BLOB within the database or to store
> > paths to normal files in the file system in the database. I suspect
> > using normal files will make backup and management a great deal easier
> > than using in-database BLOBs, so personally I'd do it that way.
>
> I discussed something like this with some co-workers recently, and
> here's what I had to say. Not all of these apply to the original
> message, but they are things to consider when marrying a database to a
> file storage system.
>
> Storing the files in the database as BLOBs:
> Pros:
> - The files can always be seen by the database system as long as it's
> up (there's no dependence on an external file system).
> - There is one set of locking mechanisms, meaning that the file
> operations can be atomic with the database operations.
> - There is one set of permissions to deal with.
> Cons:
> - There is almost no way to access files outside of the database. If
> the database goes down, you are screwed.
> - If you don't make good use of tablespaces and put blobs on a
> separate disk system, the disk could thrash going between data and
> blobs, affecting performance.
> - There are stricter limits for PostgreSQL blobs (1 GB size limits, I've read).
>
> Storing files externally, storing pathnames in the database:
> Pros:
> - You can access and manage files from outside the database and
> possibly using different interfaces.
> - There's a lot less to store directly in the database.
> - You can use existing file-system permissions, mechanisms, and limits.
> Cons:
> - You are dealing with two storage systems and two different locking
> systems which are unlikely to play nice with each other. Transactions
> are not guaranteed to be atomic (e.g. a database rollback will not
> rollback a file system operation, a commit will not guarantee that
> data in a file will stay).
> - The file system has to be seen by the database system and any remote
> clients that wish to use your application, meaning that a networked FS
> is likely to be used (depending on how many clients you have and how
> you like to separate services), with all the fun that comes from
> administering one of those. Note that this one in particular really
> only applies to enterprise-level installations, not smaller
> installations like the original poster's.
> - If you don't put files on a separate disk-system or networked FS,
> you can get poor performance from the disk thrashing between the
> database and the files.
>
> There are a couple main points:
> 1. The favorite answer in computing, "it depends", applies here. What
> you decide depends on your storage system, your service and
> installation policies, and how important fully atomic transactions are
> to you.
> 2. If you want optimal performance out of either of these basic
> models, you should make proper use of separate disk systems. I have no
> idea which one is faster (it depends, I'm sure) nor do I have much of
> an idea of how to benchmark this properly.
>
> Peter
> It seems to me as such a database gets larger, it will become much harder to manage with the 2 systems. I am talking mostly about music. So each song should not get too large. I have read alot on this list and on other resources and there seems to be leanings toward 1+0 raids for storage. It seems to the most flexible when it comes to speed, redundancy and recovery time. I do want my database to be fully atomic. I think that is important as this database grows. Are my assumptions wrong?