Re: Raw device I/O for large objects

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Georgi Chulkov <godji(at)metapenguin(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Raw device I/O for large objects
Date: 2007-09-18 14:46:34
Message-ID: 4153.1190126794@sss.pgh.pa.us
Lists: pgsql-hackers

Georgi Chulkov <godji(at)metapenguin(dot)org> writes:
> Here's the reason why I'm looking at raw device storage for large objects only
> (as opposed to all tables): with raw device I/O I can control, to an extent,
> spatial locality. So, if I have an application that wants to store N large
> objects (totaling several gigabytes) and read them back in some order that is
> well-known in advance, I could store my large objects in that order on the
> raw device.* Sequentially reading them back would then be very efficient.
> With a file system underneath, I don't have that freedom. (Such a scenario
> occurs with raster databases, for example.)

Not sure I buy that argument. If you have loaded these large objects in
the desired order, then the data will be consecutively located in
pg_largeobject, and if the underlying filesystem is at all sane about
where it extends a growing file, the data will be pretty much
consecutive on disk too. You could probably get marginal improvements
by cutting out the middleman but I'm not sure there's reason to think
there'd be spectacular improvements.

> Please allow me to ask then:
> 1. In your opinion, would the above scenario indeed benefit from a raw-device
> interface for large objects?

I don't say it wouldn't benefit. What I'm questioning is the size of
the benefit compared to the amount of work required to get it.
"Supporting raw I/O" is not some trivial bit of work --- you essentially
have to reimplement your own filesystem, because like it or not you
*do* have to think about space management. If we went in this direction
we'd be buying into a lot of work, not to mention a lot of ongoing
portability headaches. So far no one's been able to make a case that
it's worth that level of effort.
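
To make the space-management point concrete, here is a minimal sketch of the
bookkeeping that even a toy raw-device store would need. Everything in it is
hypothetical (the 64-block device, the names RawDevMap, rawdev_alloc and
rawdev_free); it is not PostgreSQL code, and a real implementation would also
have to persist this map and make it crash-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEV_BLOCKS 64            /* hypothetical device size, in blocks */

/* One flag per block: true means the block is in use. */
typedef struct {
    bool used[DEV_BLOCKS];
} RawDevMap;

/* Mark every block free. */
static void rawdev_init(RawDevMap *map)
{
    for (size_t i = 0; i < DEV_BLOCKS; i++)
        map->used[i] = false;
}

/*
 * First-fit search for nblocks contiguous free blocks.
 * Returns the starting block number, or -1 if no free run is large enough.
 */
static long rawdev_alloc(RawDevMap *map, size_t nblocks)
{
    assert(nblocks > 0);
    size_t run = 0;              /* length of the current free run */
    for (size_t i = 0; i < DEV_BLOCKS; i++) {
        run = map->used[i] ? 0 : run + 1;
        if (run == nblocks) {
            size_t start = i + 1 - nblocks;
            for (size_t j = start; j <= i; j++)
                map->used[j] = true;
            return (long) start;
        }
    }
    return -1;
}

/* Release an extent previously returned by rawdev_alloc. */
static void rawdev_free(RawDevMap *map, long start, size_t nblocks)
{
    assert(start >= 0 && (size_t) start + nblocks <= DEV_BLOCKS);
    for (size_t j = (size_t) start; j < (size_t) start + nblocks; j++)
        map->used[j] = false;
}
```

Allocating three 16-block objects and then freeing the middle one leaves 32
free blocks in two separate 16-block runs, so a later 24-block request fails
even though enough total space is free. Handling that kind of fragmentation
is part of what a filesystem otherwise does for you.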

> 2. How feasible is it to decouple general table storage from large object
> storage?

You might try digging into the original POSTGRES sources --- at one time
there were several different large-object APIs. I'm not sure if they
exposed them just as different sets of access functions or if there was
something more elegant. My own feeling though is that you probably
don't want to go that way, because with outside-the-database storage you
lose transactional behavior (unless you're up for reinventing that
wheel too). I'd try replacing md.c, or maybe resurrecting smgr.c as
something that can really switch between more than one underlying
storage manager.
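
As an illustration of what "switching between more than one underlying
storage manager" could look like, here is a minimal sketch built around a
table of function pointers. All names (StorageManager, smgr_table, sm_write,
sm_read) are hypothetical and are not the actual smgr.c interface; both
"managers" are in-memory stand-ins so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8   /* tiny block size, for illustration only */
#define NUM_BLOCKS 4

/* Hypothetical storage-manager interface: one function pointer per operation. */
typedef struct {
    const char *name;
    void (*write_block)(size_t blkno, const char *buf);
    void (*read_block)(size_t blkno, char *buf);
} StorageManager;

/* In-memory stand-ins for a file-backed and a raw-device-backed store. */
static char file_store[NUM_BLOCKS][BLOCK_SIZE];
static char raw_store[NUM_BLOCKS][BLOCK_SIZE];

static void file_write(size_t blkno, const char *buf)
{
    assert(blkno < NUM_BLOCKS);
    memcpy(file_store[blkno], buf, BLOCK_SIZE);
}

static void file_read(size_t blkno, char *buf)
{
    assert(blkno < NUM_BLOCKS);
    memcpy(buf, file_store[blkno], BLOCK_SIZE);
}

static void raw_write(size_t blkno, const char *buf)
{
    assert(blkno < NUM_BLOCKS);
    memcpy(raw_store[blkno], buf, BLOCK_SIZE);
}

static void raw_read(size_t blkno, char *buf)
{
    assert(blkno < NUM_BLOCKS);
    memcpy(buf, raw_store[blkno], BLOCK_SIZE);
}

/* The switch table: callers pick a manager by index. */
enum { SMGR_FILE = 0, SMGR_RAW = 1 };

static const StorageManager smgr_table[] = {
    { "file", file_write, file_read },
    { "raw",  raw_write,  raw_read  },
};

/* Dispatch helpers: the caller never touches a backend directly. */
static void sm_write(int which, size_t blkno, const char *buf)
{
    smgr_table[which].write_block(blkno, buf);
}

static void sm_read(int which, size_t blkno, char *buf)
{
    smgr_table[which].read_block(blkno, buf);
}
```

With this shape, code above the switch calls sm_write(SMGR_RAW, ...) or
sm_write(SMGR_FILE, ...) and stays unaware of the backend, which is the
property that would let large-object storage use a different manager
from ordinary tables.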

regards, tom lane
