Re: pg_basebackup vs. Windows and tablespaces

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup vs. Windows and tablespaces
Date: 2014-12-16 07:28:14
Message-ID: CAA4eK1LykWQCiWvdpGg2iZJ4dZEeB3sbKFFvdLN3-bAJjL4bxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 14, 2014 at 11:54 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> On Sat, Dec 13, 2014 at 10:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > > On 11/20/2014 02:27 AM, Amit Kapila wrote:
> > >> Now the part where I would like to receive feedback before revising
the
> > >> patch is on the coding style. It seems to me from Tom's comments
that
> > >> he is not happy with the code, now I am not sure which part of the
patch
> > >> he thinks needs change. Tom if possible, could you be slightly more
> > >> specific about your concern w.r.t code?
> >
> > > In view of the request above for comments from Tom, I have moved this
> > > back to "Needs Review".

I have updated the patch and handled the review comments as below:

1. Change the name of file containing tablespace path information to
tablespace_map. I have changed the reference to file name in whole patch.

2. I have not added tablespace name in tablespace_map file as I am not
sure how important it is for user readability aspect and what format should
we use and another point is not many people have asked for it. However
if you feel it is important to have the same for this patch, then I will
propose some new format.

3. Made the code generic (for all platforms) such that a tablespace_map
file will be created to restore tablespaces for base backup.

> > Sorry, I was not paying very close attention to this thread and missed
> > the request for comments. A few such:
> >
> > 1. The patch is completely naive about what might be in the symlink
> > path string; eg embedded spaces in the path would break it. On at
> > least some platforms, newlines could be in the path as well. I'm not
> > sure about how to guard against this while maintaining human readability
> > of the file.
> >
>
> I will look into this and see what best can be done.
>

I have chosen #3 (Make pg_basebackup check for and fail on symlinks
containing characters (currently newline only) it can't handle) from the
different options suggested by Tom. This keeps the format same as
previous and human readable.

> > 2. There seems to be more going on here than what is advertised, eg
> > why do we need to add an option to the BASE_BACKUP command
>
> This is to ensure that symlink file is generated only for tar format;
> server is not aware of whether the backup is generated for plain format
> or tar format. We don't want to do it for plain format as for that
> client (pg_basebackup) can update the symlinks via -T option and backing
> up symlink file during that operation can lead to spurious symlinks after
> archive recovery. I have given the reason why we want to accomplish it
> only for tar format in my initial mail.
>
> > (and if
> > we do need it, doesn't it need to be documented in protocol.sgml)?
>
> I shall take care of it in next version of patch.
>

Added the description in protocol.sgml

> > And why is the RelationCacheInitFileRemove call relocated?
> >
>
> Because it assumes that tablespace directory pg_tblspc is in
> place and it tries to remove the files by reading pg_tblspc
> directory as well. Now as we setup the symlinks in pg_tblspc
> after reading symlink file, so we should remove relcache init
> file once the symlinks are setup in pg_tblspc directory.
>
> > 3. Not terribly happy with the changes made to the API of
> > do_pg_start_backup, eg having to be able to parse "DIR *" in its
> > arguments seems like a lot of #include creep. xlog.h is pretty
> > central so I'm not happy about plastering more #includes in it.
> >
>
> The reason of adding new include in xlog.c is for use of tablespaceinfo
> structure which I have now kept in basebackup.h.
>
> The reason why I have done this way is because do_pg_start_backup has
> some functionality common to both non-exclusive and exclusive backups and
> for this feature we have to do some work common for both non-exclusive
> and exclusive backup which is to generate the symlink label file for
> non-exclusive backups and write the symlink label file for exclusive
> backups using that information. Doing this way seems right to me
> as we are already doing something like that for backup label file.
>
> Another possible way could be to write a new function in xlogutils.c
> to do the symlink label stuff and then use the same in xlog.c, I think
> that way we could avoid any new include in xlog.c. However for this we
> need to have include in xlogutils.c to make it aware of tablespaceinfo
> structure.
>

Are you okay with the alternative I have suggested to avoid the
new include in xlog.c or do you feel the alternative will make the
code worse than the current patch?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
extend_basebackup_to_include_symlink_v5.patch application/octet-stream 35.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2014-12-16 07:33:51 Re: Commitfest problems
Previous Message Peter Geoghegan 2014-12-16 07:06:01 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}