Re: Streaming base backups

From: Garick Hamlin <ghamlin(at)isc(dot)upenn(dot)edu>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming base backups
Date: 2011-01-07 15:26:29
Lists: pgsql-hackers

On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
> 2011/1/5 Magnus Hagander <magnus(at)hagander(dot)net>:
> > On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr> wrote:
> >> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> >>> * Stefan mentioned it might be useful to put some
> >>> posix_fadvise(POSIX_FADV_DONTNEED)
> >>>   in the process that streams all the files out. Seems useful, as long as that
> >>>   doesn't kick them out of the cache *completely*, for other backends as well.
> >>>   Do we know if that is the case?
> >>
> >> Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
> >> not already in SHM?
> >
> > I think that's way more complex than we want to go here.
> >
> DONTNEED will remove the block from the OS buffer every time.
> It should not be that hard to implement a snapshot (it needs mincore())
> and to restore the previous state. I don't know how basebackup is
> performed, so perhaps I am wrong.
> posix_fadvise support is already in postgresql core...we can start by
> just doing a snapshot of the files before starting, or at some point
> in the basebackup; it will need only 256kB per GB of data...
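
As a rough check of the 256kB-per-GB figure quoted above, here is a small sketch. It assumes 4 kB pages and a snapshot that keeps the raw mincore() result vector, which holds one byte per page; the function names are hypothetical and nothing here comes from the thread itself. A packed bitmap (one bit per page) is shown for comparison.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; the thread does not state one


def pages_for(data_bytes, page_size=PAGE_SIZE):
    """Number of pages needed to cover data_bytes (rounded up)."""
    return -(-data_bytes // page_size)


def mincore_vector_bytes(data_bytes, page_size=PAGE_SIZE):
    """Size of a raw mincore() result vector: one byte per page."""
    return pages_for(data_bytes, page_size)


def packed_bitmap_bytes(data_bytes, page_size=PAGE_SIZE):
    """Size of the same residency map packed to one bit per page."""
    return -(-pages_for(data_bytes, page_size) // 8)


one_gb = 1 << 30
print(mincore_vector_bytes(one_gb) // 1024, "kB per GB as a raw byte vector")
print(packed_bitmap_bytes(one_gb) // 1024, "kB per GB as a packed bitmap")
```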

It is actually possible to be more scalable than the simple solution you
outline here (although that solution works pretty well).

I've written a program that synchronizes the OS cache state between two
computers using mmap()/mincore(). I haven't actually tested its impact on
performance yet, but I was surprised by how fast it runs and how compact
the cache maps can be.

If one encodes the data as the number of zeros between successive 1s,
storage scales with the amount of memory on each side rather than the
dataset size. I actually played with doing that, then Huffman encoding
the result. I get around 1.2-1.3 bits / page of _physical memory_
in my tests.
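
A minimal sketch of the idea in the paragraph above, not the program mentioned in this message. It assumes the cache map is available as a sequence of 0/1 residency flags (1 = page in cache); the function names are hypothetical. Each resident page yields exactly one gap symbol, which is why the symbol count follows the amount of cached memory rather than the dataset size. The second function only computes Huffman code lengths, which is enough to estimate bits per symbol.

```python
import heapq
from collections import Counter


def encode_gaps(bitmap):
    """Return (gaps, trailing): the zero-run length before each 1,
    plus the number of zeros after the last 1."""
    gaps, run = [], 0
    for bit in bitmap:
        if bit:
            gaps.append(run)
            run = 0
        else:
            run += 1
    return gaps, run


def decode_gaps(gaps, trailing):
    """Rebuild the 0/1 residency list from encode_gaps() output."""
    bitmap = []
    for gap in gaps:
        bitmap.extend([0] * gap)
        bitmap.append(1)
    bitmap.extend([0] * trailing)
    return bitmap


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code
    built from the symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, tiebreak, symbols in this subtree).
    heap = [(n, i, [s]) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to these codes
        heapq.heappush(heap, (n1 + n2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths


bitmap = [0, 0, 1, 1, 0, 0, 0, 1, 0]
gaps, trailing = encode_gaps(bitmap)
lengths = huffman_code_lengths(gaps)
total_bits = sum(lengths[g] for g in gaps)
print(gaps, trailing, total_bits)
```

Dividing `total_bits` by the number of resident pages gives a bits-per-page figure comparable in kind to the one quoted above, although a toy input like this one says nothing about real cache maps.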

I don't have my notes handy, but here are some numbers from memory...

The obvious worst cases are 1 bit per page of _dataset_ or 19 bits per page
of physical memory in the machine. The latter limit gets better, however,
since there are < 1024 symbols possible for the encoder (since in this
case symbols are spans of zeros that need to fit in a file that is 1 GB in
size). So the real worst case is actually much closer to 1 bit per page of
the dataset or ~10 bits per page of physical memory. The real performance
I see with Huffman is more like 1.3 bits per page of physical memory. All the
encoding and decoding is very fast. zlib would compress even better than
Huffman, but a Huffman encoder/decoder is pretty good and very
straightforward code.
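
The worst-case figures above can be reproduced with a little arithmetic. This sketch assumes 4 kB pages (not stated in the message) and takes the figure of fewer than 1024 encoder symbols as given rather than deriving it; the variable names are hypothetical.

```python
import math

PAGE_SIZE = 4096          # assumed page size in bytes
SEGMENT_SIZE = 1 << 30    # a file that is 1 GB in size

# A span of zeros inside one file can be at most this many pages long.
pages_per_segment = SEGMENT_SIZE // PAGE_SIZE

# Fixed-width field able to hold any span length from 0 to pages_per_segment.
fixed_width_bits = pages_per_segment.bit_length()

# With fewer than 1024 distinct symbols, a fixed-width code needs at most
# this many bits per symbol (the 1024 limit is taken from the text as given).
symbol_limit = 1024
capped_bits = math.ceil(math.log2(symbol_limit))

print(pages_per_segment, fixed_width_bits, capped_bits)
```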

I would like to integrate something like this into PG or perhaps even into
something like rsync, but it was written as a proof of concept and I haven't
had time to work on it recently.

