Re: Streaming base backups

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming base backups
Date: 2011-01-05 22:04:04
Message-ID: AANLkTimV2q4w0jEus_Mwyjp+=0w2syirOohnbYTer8zg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Attached is an updated streaming base backup patch, based off the work
>
> Thanks! :)
>
>> * Compression: Do we want to be able to compress the backups server-side? Or
>>   defer that to whenever we get compression in libpq? (you can still tunnel it
>>   through for example SSH to get compression if you want to) My thinking is
>>   defer it.
>
> Compression in libpq would be a nice way to solve it, later.

Yeah, I'm pretty much set on postponing that one.

>> * Compression: We could still implement compression of the tar files in
>>   pg_streamrecv (probably easier, possibly more useful?)
>
> What about pg_streamrecv | gzip > …, which has the big advantage of
> being friendly to *any* compression command line tool, whatever patents
> and licenses?

That's part of what I meant by "easier and more useful".

Right now though, pg_streamrecv will output one tar file for each
tablespace, so you can't get it on stdout. But that can be changed of
course. The easiest step 1 is to just use gzopen() from zlib on the
files and use the same code as now :-)

>> * Stefan mentioned it might be useful to put some
>>   posix_fadvise(POSIX_FADV_DONTNEED) in the process that streams all the
>>   files out. Seems useful, as long as that doesn't kick them out of the
>>   cache *completely*, for other backends as well. Do we know if that is
>>   the case?
>
> Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
> not already in SHM?

I think that's way more complex than we want to go here.
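
For reference, the simple variant would look roughly like the sketch
below: read the file, then issue one advisory call for the whole file.
stream_file_dontneed() is a hypothetical name and the read loop only
stands in for sending data to the client. The sketch shows the shape of
the call; it says nothing about how the hint affects pages other
backends are using.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L     /* for posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole file in chunks, then hint to the
 * kernel that its cached pages are no longer needed by this process.
 * Returns the number of bytes read, or -1 on error.
 */
static long
stream_file_dontneed(const char *path)
{
    char    buf[8192];
    ssize_t n;
    long    total = 0;
    int     fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* a real sender would write each chunk to the client here */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;

    if (n < 0)
    {
        close(fd);
        return -1;
    }

#ifdef POSIX_FADV_DONTNEED
    /* offset 0 with len 0 covers the whole file; the call is only a
     * hint, so its return value is deliberately ignored */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif

    close(fd);
    return total;
}
```

The #ifdef guard is there because not every platform defines
POSIX_FADV_DONTNEED; without it the function just reads the file.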

>> * include all the necessary WAL files in the backup. This way we could generate
>>   a tar file that would work on its own - right now, you still need to set up
>>   log archiving (or use streaming repl) to get the remaining logfiles from the
>>   master. This is fine for replication setups, but not for backups.
>>   This would also require us to block recycling of WAL files during the backup,
>>   of course.
>
> Well, I would guess that if you're streaming the WAL files in parallel
> while the base backup is taken, then you're able to have it all without
> archiving setup, and the server could still recycle them.

Yes, this was mostly for the use-case of "getting a single tarfile
that you can actually use to restore from without needing the log
archive at all".

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
