Re: What is the best way to storage music files in Postgresql

From: Rich <rhdyes(at)gmail(dot)com>
To: "Peter Koczan" <pjkoczan(at)gmail(dot)com>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What is the best way to storage music files in Postgresql
Date: 2008-03-17 18:39:32
Message-ID: 8fb205ea0803171139h63b55fdrf62776851583a74d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Mar 17, 2008 at 2:01 PM, Peter Koczan <pjkoczan(at)gmail(dot)com> wrote:
> > > I am going to embarkon building a music library using apache,
> > > postgresql and php. What is the best way to store the music files?
> >
> > Your options are either to use a BLOB within the database or to store
> > paths to normal files in the file system in the database. I suspect
> > using normal files will make backup and management a great deal easier
> > than using in-database BLOBs, so personally I'd do it that way.
>
> I discussed something like this with some co-workers recently, and
> here's what I had to say. Not all of these apply to the original
> message, but they are things to consider when marrying a database to a
> file storage system.
>
> Storing the files in the database as BLOBs:
> Pros:
> - The files can always be seen by the database system as long as it's
> up (there's no dependence on an external file system).
> - There is one set of locking mechanisms, meaning that the file
> operations can be atomic with the database operations.
> - There is one set of permissions to deal with.
> Cons:
> - There is almost no way to access files outside of the database. If
> the database goes down, you are screwed.
> - If you don't make good use of tablespaces and put blobs on a
> separate disk system, the disk could thrash going between data and
> blobs, affecting performance.
> - There are stricter limits for PostgreSQL blobs (1 GB size limits, I've read).
>
> Storing files externally, storing pathnames in the database:
> Pros:
> - You can access and manage files from outside the database and
> possibly using different interfaces.
> - There's a lot less to store directly in the database.
> - You can use existing file-system permissions, mechanisms, and limits.
> Cons:
> - You are dealing with two storage systems and two different locking
> systems which are unlikely to play nice with each other. Transactions
> are not guaranteed to be atomic (e.g. a database rollback will not
> rollback a file system operation, a commit will not guarantee that
> data in a file will stay).
> - The file system has to be seen by the database system and any remote
> clients that wish to use your application, meaning that a networked FS
> is likely to be used (depending on how many clients you have and how
> you like to separate services), with all the fun that comes from
> administering one of those. Note that this one in particular really
> only applies to enterprise-level installations, not smaller
> installations like the original poster's.
> - If you don't put files on a separate disk-system or networked FS,
> you can get poor performance from the disk thrashing between the
> database and the files.
>
> There are a couple main points:
> 1. The favorite answer in computing, "it depends", applies here. What
> you decide depends on your storage system, your service and
> installation policies, and how important fully atomic transactions are
> to you.
> 2. If you want optimal performance out of either of these basic
> models, you should make proper use of separate disk systems. I have no
> idea which one is faster (it depends, I'm sure) nor do I have much of
> an idea of how to benchmark this properly.
>
> Peter
> It seems to me as such a database gets larger, it will become much harder to manage with the 2 systems. I am talking mostly about music. So each song should not get too large. I have read alot on this list and on other resources and there seems to be leanings toward 1+0 raids for storage. It seems to the most flexible when it comes to speed, redundancy and recovery time. I do want my database to be fully atomic. I think that is important as this database grows. Are my assumptions wrong?

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rich 2008-03-17 18:40:05 Re: What is the best way to storage music files in Postgresql
Previous Message Craig James 2008-03-17 18:33:28 Re: Benchmark: Dell/Perc 6, 8 disk RAID 10