Re: block-level incremental backup

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-07-17 05:21:51
Message-ID: CAM2+6=VbhEw3ZKCP15gXe=t1jrb1q_vxBJU9sgBa_Xs0qBi2vQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

> Hi Anastasia,
>
> On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova <
> a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>
>> 23.04.2019 14:08, Anastasia Lubennikova wrote:
>> > I'm volunteering to write a draft patch or, more likely, set of
>> > patches, which
>> > will allow us to discuss the subject in more detail.
>> > And to do that I wish we agree on the API and data format (at least
>> > broadly).
>> > Looking forward to hearing your thoughts.
>>
>> Though the previous discussion has stalled, I still hope that we can agree
>> on basic points such as the map file format and the protocol extension,
>> which are necessary to start implementing the feature.
>>
>
> It's great that you have also come up with a PoC patch. I haven't looked at
> your changes in much detail, but we at EnterpriseDB are working on this
> feature too and have started implementing it.
>
> Attached is the series of patches I have so far (they still need further
> optimization and adjustment).
>
> Here is the overall design (as proposed by Robert) we are trying to
> implement:
>
> 1. Extend the BASE_BACKUP command, which can be used over replication
> connections, with a new [ LSN 'lsn' ] option.
>
> 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send
> the option added to the server in #1 (an illustrative invocation follows
> below).
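>
> For illustration, an incremental backup under this proposal might be
> requested roughly as follows; the label, path, and LSN values here are
> hypothetical:
>
>     -- over a replication connection
>     BASE_BACKUP LABEL 'incr1' LSN '0/16B3D80';
>
>     $ pg_basebackup -D /backups/incr1 --lsn='0/16B3D80'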
>
> Here are the implementation details for the case where we have a valid LSN:
>
> sendFile() in basebackup.c is the function that does most of the work for
> us. If the filename looks like a relation file, then we'll need to consider
> sending only a partial file. The way to do that is probably:
>
> A. Read the whole file into memory.
>
> B. Check the LSN of each block. Build a bitmap indicating which blocks
> have an LSN greater than or equal to the threshold LSN.
>
> C. If more than 90% of the bits in the bitmap are set, send the whole file
> just as if this were a full backup. This 90% is a constant for now; we might
> make it a GUC later.
>
> D. Otherwise, send a file with .partial added to the name. The .partial
> file contains, at the beginning, an indication of which blocks were changed,
> followed by the data blocks. It also includes a checksum/CRC. Currently,
> the .partial file format looks like this (a sketch of the layout follows
> below):
> - start with a 4-byte magic number
> - then a 4-byte CRC covering the header
> - then a 4-byte count of the number of blocks included in the file
> - then the block numbers, each as a 4-byte quantity
> - then the data blocks
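>
> For illustration, here is a minimal C sketch of that header layout and of
> the per-block test from step B; the struct and function names are mine, not
> taken from the attached patches:
>
>     #include "postgres.h"
>     #include "storage/bufpage.h"
>
>     /* Hypothetical on-disk header of a .partial file. */
>     typedef struct PartialFileHeader
>     {
>         uint32      magic;      /* 4-byte magic number */
>         uint32      crc;        /* 4-byte CRC covering the header */
>         uint32      nblocks;    /* 4-byte count of blocks in the file */
>         /* followed by nblocks 4-byte block numbers, then the data blocks */
>     } PartialFileHeader;
>
>     /* Threshold from step C: a constant for now, maybe a GUC later. */
>     #define SEND_WHOLE_FILE_FRACTION    0.9
>
>     /* Step B: include a block if its LSN is >= the threshold LSN. */
>     static bool
>     block_is_changed(Page page, XLogRecPtr threshold_lsn)
>     {
>         return PageGetLSN(page) >= threshold_lsn;
>     }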
>
>
> We are also working on combining these incremental backups with the full
> backup; for that, we are planning to add a new utility called
> pg_combinebackup. I will post the details on that later, once we are on the
> same page about taking the backup.
>

To combine a full backup with one or more incremental backups, we are adding
a new utility called pg_combinebackup in src/bin.
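
For concreteness, a hypothetical invocation might look like the following;
the actual command-line syntax is still to be decided, so the arguments below
are purely illustrative:

    $ pg_combinebackup /backups/full /backups/incr1 /backups/incr2 /restore/data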

Here is the overall design as proposed by Robert.

pg_combinebackup starts from the LAST backup specified and works backward. It
must NOT start with the full backup and work forward. This is important both
for reasons of efficiency and of correctness. For example, if you start by
copying over the full backup and then apply the incremental backups on top of
it, you'll copy data and later end up overwriting or removing it. Any files
left over at the end that aren't in the final incremental backup, even as
.partial files, need to be removed, or the result is wrong. We should aim for
a system where every block in the output directory is written exactly once
and nothing ever has to be created and then removed.

To make that work, we should start by examining the final incremental backup
and proceed one file at a time. For each file:

1. If the complete file is present in the incremental backup, then just copy
it to the output directory and move on to the next file.

2. Otherwise, we have a .partial file. Work backward through the backup chain
until we find a complete version of the file. That might happen when we get
back to the full backup at the start of the chain, but it might also happen
sooner, at which point we do not need to (and should not) look at earlier
backups for that file. During this phase, we should read only the HEADER of
each .partial file, building a map of which blocks we're ultimately going to
need to read from each backup. At this stage, we can also compute the offset
within each file where each of those blocks is stored, again using the header
information.

3. Now, we can write the output file, reading each block in turn from the
correct backup and writing it to the output file, using the map we
constructed in the previous step. We should keep all of the input files open
over steps 2 and 3 and close them only at the end, because repeatedly closing
and opening them would be expensive. When that's done, go on to the next file
and start over at step 1. (A rough sketch of this per-file loop follows
below.)
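
A rough C sketch of that per-file loop; every type and helper name below
(Backup, BlockSource, file_is_complete(), and so on) is hypothetical, since
pg_combinebackup is still being designed:

    /* Where each block of the output file will ultimately come from. */
    typedef struct BlockSource
    {
        int     backup_index;   /* which backup in the chain holds the block */
        off_t   offset;         /* byte offset of the block in that file */
    } BlockSource;

    static void
    combine_one_file(const char *relpath, Backup *chain, int nbackups)
    {
        /* Step 1: file is complete in the final incremental? Just copy it. */
        if (file_is_complete(&chain[nbackups - 1], relpath))
        {
            copy_file_to_output(&chain[nbackups - 1], relpath);
            return;
        }

        /*
         * Step 2: walk backward through the chain, reading only the .partial
         * headers, until we find a complete version of the file.  A block is
         * mapped only the first time we see it, so the newest copy wins.
         */
        BlockSource *map = make_empty_block_map(relpath);

        for (int i = nbackups - 1; i >= 0; i--)
        {
            if (file_is_complete(&chain[i], relpath))
            {
                map_remaining_blocks_to(map, i);    /* complete file found */
                break;
            }
            map_unmapped_blocks(map, i, read_partial_header(&chain[i], relpath));
        }

        /* Step 3: write every output block exactly once, using the map. */
        write_output_file(relpath, map, chain);
    }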

We have already started working on this design.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
