Re: Btrfs clone WIP patch

From: Jonathan Rogers <jrogers(at)socialserve(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Btrfs clone WIP patch
Date: 2013-02-13 22:45:46
Message-ID: 511C179A.4010901@socialserve.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Jonathan Rogers <jrogers(at)socialserve(dot)com> writes:
>> This patch against PostgreSQL 9.1.8 takes advantage of efficient file
>> cloning on Linux Btrfs file systems to make CREATE DATABASE operations
>> extremely fast regardless of the size of the database used as a
>> template.
>
> It would be easier to review this patch if the bulk of it weren't simple
> reindentation of existing code. (Or at least it ought to be that ---
> I object to your having moved the buffer palloc inside the loop. A
> patch that is trying to optimize a minority case can expect to be
> rejected if it makes things worse for everyone else.)

The buffer allocation is actually not inside the loop, but inside the if
branch for ordinary copying behavior since the buffer is unnecessary in
the case of a successful clone.

>
> Consider whether you can't phrase the patch to avoid that, perhaps by
> use of "continue" instead of an else-block. Alternatively, enclose the
> existing code in braces but don't reindent it, ie,
>
> + if (whatever)
> + ... new code ...
> + else
> + {
> ... existing code ...
> + }
>

Indeed, I was bothered by the need to reindent so much as well. I'll see
if I can do better.

> The next pgindent run will fix the funny indentation, or the committer
> can do it if he wishes after reviewing.
>
>> The efficient cloning is accomplished by a Btrfs-specific ioctl() call.
>
> The big-picture question of course is whether we want to carry and
> maintain a filesystem-specific hack. I don't have a sense that btrfs
> is so widely used as to justify this.

Yes, this is a problem I considered. I think the basic problem is the
lack of any kind of generic interface to copy or clone a file. A system
call for Linux to copy or clone has been proposed more than once but so
far, nothing has been accepted. I believe there are a few file systems
that support some kind of efficient cloning, but I haven't investigated
it deeply.

>
>> +#ifdef __linux__
>> +# define BTRFS_IOCTL_MAGIC 0x94
>> +# define BTRFS_IOC_CLONE _IOW (BTRFS_IOCTL_MAGIC, 9, int)
>> + return ioctl (dest_fd, BTRFS_IOC_CLONE, src_fd);
>> +#else
>
> This seems to me to be unacceptable on its face. If we can't get these
> constants out of a system header file, it's unlikely that the feature is
> stable enough to depend on, if indeed it's meant for general-purpose use
> at all. We could easily end up invoking unexpected behaviors.

Of course you're right that defining values right there is no good. It
looks like the values are in the Linux headers since 2.6.32 when Btrfs
was merged into mainline. I guess I'll need to brush up on CPP to figure
out how to use the Linux header values if they exist.

Would it be better to move clone_file() into its own module where
implementations for other file system types might eventually be added?

My first implementation called cp with the "--reflink=auto" option since
that seems to be the closest thing to a file system agnostic interface.
The above snippet comes directly from the GNU cp source and I'm not sure
why that code defines the values instead of taking them from Linux headers.

--
Jonathan Ross Rogers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2013-02-13 22:48:23 Re: Btrfs clone WIP patch
Previous Message Tom Lane 2013-02-13 22:13:24 Re: Btrfs clone WIP patch