Re: block-level incremental backup

From: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-07-26 05:51:43
Message-ID: CAOgcT0NBP3ifH52H8R-2TMQL=wVbHGZv7SBVVKLQDcxi5T++6g@mail.gmail.com
Lists: pgsql-hackers

Hi Vignesh,

Please find my comments inline below:

> 1) If relation file has changed due to truncate or vacuum.
> During incremental backup the new files will be copied.
> There are chances that both the old file and new file
> will be present. I'm not sure if cleaning up of the
> old file is handled.
>

When an incremental backup is taken, it either copies a file in its entirety
if more than 90% of the file has changed, or writes a .partial file with a
bitmap of the changed blocks and the actual data. For files that are unchanged
it still creates a .partial file, but writes 0 bytes to it. This means there is
a .partial file for every file that is to be looked up in the full backup.

While composing a synthetic backup from an incremental backup, the
pg_combinebackup tool will only look in the full (parent) backup for those
relation files that have .partial files in the incremental backup. So, if a
vacuum/truncate happened between the full and the incremental backup and the
old file is gone, the incremental backup image will not have even a 0-length
.partial file for that relation, and so the synthetic backup that is restored
using pg_combinebackup will not have that file either.
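
To make the above concrete, here is a rough Python sketch of the two rules
as I described them. It is only an illustration, not the patch code: the
function names, the flat directory layout and the returned labels are all
made up for this example, and only the 90% threshold and the .partial suffix
come from the description above.

```python
import os

PARTIAL_SUFFIX = ".partial"  # suffix used by the incremental backup


def backup_action(changed_blocks, total_blocks, threshold=0.9):
    """Hypothetical helper: what the incremental backup writes for one file."""
    if changed_blocks == 0:
        return "empty_partial"   # unchanged: 0-byte .partial
    if changed_blocks / total_blocks > threshold:
        return "full_copy"       # changed more than 90%: copy whole file
    return "partial"             # bitmap of changed blocks + block data


def files_in_synthetic_backup(full_dir, incr_dir):
    """Hypothetical helper: map each file of the combined backup to its source.

    Assumes both backups are flat directories. A file present only in the
    full backup (no .partial for it in the incremental one) is never looked
    up, so it does not appear in the result.
    """
    result = {}
    for name in sorted(os.listdir(incr_dir)):
        if name.endswith(PARTIAL_SUFFIX):
            base = name[: -len(PARTIAL_SUFFIX)]
            parent = os.path.join(full_dir, base)
            if not os.path.exists(parent):
                raise FileNotFoundError(parent)
            result[base] = "full+partial"
        else:
            # File copied in its entirety by the incremental backup.
            result[name] = "incremental"
    return result
```

With a full backup holding relation files A, B and C, and an incremental
backup holding A.partial and a whole copy of B, the result has A and B only.
C is left out simply because nothing in the incremental backup points to it.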

> 2) Just a small thought on building the bitmap,
> can the bitmap be built and maintained as
> and when the changes are happening in the system.
> If we are building the bitmap while doing the incremental backup,
> Scanning through each file might take more time.
> This can be a configurable parameter, the system can run
> without capturing this information by default, but if there are some
> of them who will be taking incremental backup frequently this
> configuration can be enabled which should track the modified blocks.

IIUC, this will need changes in the backend. Honestly, I think backup is a
maintenance task, and hampering the backend for it does not look like a good
idea. Having said that, even if we have to provide this as a switch for some
users, it will need a different infrastructure from what we are building here
for constructing the bitmap, where we scan all the files one by one. Maybe for
the initial version we can go with the current proposal that Robert has
suggested, and add this switch at a later point as an enhancement.
- My thoughts.
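
For reference, the "scan all the files one by one" approach amounts to
something like the Python sketch below. It is only an illustration under
stated assumptions: I assume a block counts as changed when the LSN in its
page header is newer than some threshold LSN (this mail does not spell out
the test), that the LSN sits in the first 8 bytes of the page as two
little-endian 32-bit halves, and that blocks are 8192 bytes. The function
names are made up for the example.

```python
import struct

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size


def page_lsn(page):
    """Read the page LSN from the first 8 bytes of a page.

    Assumption: stored as (high 32 bits, low 32 bits), little-endian.
    """
    hi, lo = struct.unpack_from("<II", page, 0)
    return (hi << 32) | lo


def changed_block_bitmap(data, threshold_lsn):
    """Return a bitmap with bit N set when block N has LSN > threshold_lsn.

    Every block of the file is read, which is the cost the quoted
    suggestion wants to avoid by tracking changes as they happen.
    """
    nblocks = len(data) // BLOCK_SIZE
    bitmap = bytearray((nblocks + 7) // 8)
    for blkno in range(nblocks):
        page = data[blkno * BLOCK_SIZE:(blkno + 1) * BLOCK_SIZE]
        if page_lsn(page) > threshold_lsn:
            bitmap[blkno // 8] |= 1 << (blkno % 8)
    return bytes(bitmap)
```

The cost here is proportional to the size of the data directory, not to the
amount of change, which is the trade-off being discussed.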

Regards,
Jeevan Ladhe
