TODO item: teach pg_dump about sparsely-stored large objects

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: TODO item: teach pg_dump about sparsely-stored large objects
Date: 2012-10-09 02:40:51
Message-ID: 18789.1349750451@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The backend code for large objects goes to some lengths to be
intelligent about sparsely-written blobs: if you seek out to the middle
of nowhere and write a few bytes, you don't end up allocating space in
pg_largeobject for all the byte positions you skipped over. However,
pg_dump knows nothing of this. If you pg_dump a sparsely-stored large
object, it will tediously transfer all those nonexistent zeros from
server to client, and write them into the resulting archive. And then
when you restore, the blob isn't sparse anymore ... those zeroes become
non-virtual on the database side too.

Admittedly, this is no different than what happens when you try to back
up a sparsely-stored Unix file, at least with simpler backup tools.
But it seems to me we should try a bit harder.

There are a couple of stumbling blocks to making that happen:

* How should pg_dump find out where there are holes? It would be easy
if it were to look into pg_largeobject, but that would destroy the
ability to use pg_dump as non-superuser. I think we'd really have to
provide some API to read from a blob in a sparse-storage-aware manner.
The first idea that comes to mind is some way to tell lo_read to stop
reading when it hits a gap (instead of manufacturing zeroes) and then
a new "whence" option for lo_lseek that tells it to seek to the next
non-dummy data in the blob.

* How do we get pg_dump to make use of the knowledge once it's got it?
The current code in that area is a masterpiece of ugly unreadability;
it's near impossible to tell what connects to what else, and there are
assorted magic switches that completely change the behavior of major
interface functions. I'd kind of want to rewrite the whole mess before
trying to change its behavior. I'm also pretty certain that we'd need
an archive format change, though we've certainly done those before so
that isn't a fatal objection.

I'm not planning to do anything about this myself, but if someone is
looking for a project, here's one.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2012-10-09 04:00:49 Re: Visual Studio 2012 RC
Previous Message Joachim Wieland 2012-10-09 01:38:17 Re: Add FET to Default and Europe.txt