Re: Tracking of page changes for backup purposes. PTRACK [POC]

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Tracking of page changes for backup purposes. PTRACK [POC]
Date: 2017-12-27 11:07:35
Message-ID: CABUevExGBqCkKQj2QpGhZEq_=uPombqBiLfsxWcggcbQ4xu5Ng@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 21, 2017 at 1:51 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Thu, Dec 21, 2017 at 7:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > On Wed, Dec 20, 2017 at 3:45 PM, Tomas Vondra
> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >>> Isn't more effective hold this info in Postgres than in backup sw?
> >>> Then any backup sw can use this implementation.
> >>
> >> I don't think it means it can't be implemented in Postgres, but does it
> >> need to be done in backend?
> >>
> >> For example, it might be a command-line tool similar to pg_waldump,
> >> which processes WAL segments and outputs list of modified blocks,
> >> possibly with the matching LSN. Or perhaps something like pg_receivewal,
> >> doing that in streaming mode.
> >>
> >> This part of the solution can still be part of PostgreSQL codebase, and
> >> the rest has to be part of backup solution anyway.
> >
> > I agree with all of that.
>
> +1. This summarizes a bunch of concerns about all kinds of backend
> implementations proposed. Scanning for a list of blocks modified via
> streaming gives more availability, but knowing that you will need to
> switch to a new segment anyway when finishing a backup, does it really
> matter? Doing it once a segment has finished would be cheap enough,
> and you can even do it in parallel with a range of segments.
>

There's definitely a lot of value to that, in particular being able to
do it entirely outside the backend, which makes it possible to extract the
data from an existing log archive.
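
To illustrate the outside-the-backend variant, here is a minimal sketch. It
assumes that block references in pg_waldump's text output look like
"blkref #0: rel 1663/16384/16385 blk 0" (the exact format differs between
versions, so treat the regular expressions as assumptions), and the function
names are made up for illustration. It folds a stream of such lines into a
map of modified blocks and the latest LSN that touched each one:

```python
import re

# Assumed shape of a block reference in pg_waldump text output, e.g.
#   "blkref #0: rel 1663/16384/16385 blk 7"
# An optional "fork <name>" before "blk" is tolerated. Both patterns are
# assumptions about the output format, not a documented interface.
BLKREF = re.compile(r"rel (\d+)/(\d+)/(\d+)(?: fork (\w+))? blk (\d+)")
LSN = re.compile(r"lsn: ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)")


def parse_lsn(hi, lo):
    """Convert the two hex halves of an LSN ("0/01000028") to an integer."""
    return (int(hi, 16) << 32) | int(lo, 16)


def modified_blocks(lines):
    """Return {(tablespace, database, relfilenode, fork, block): latest LSN}.

    Lines without an LSN or without block references are ignored.
    """
    latest = {}
    for line in lines:
        lsn_match = LSN.search(line)
        if lsn_match is None:
            continue
        lsn = parse_lsn(*lsn_match.groups())
        for spc, db, rel, fork, blk in BLKREF.findall(line):
            key = (int(spc), int(db), int(rel), fork or "main", int(blk))
            if lsn > latest.get(key, -1):
                latest[key] = lsn
    return latest
```

Run over the output for a range of archived segments, this would give a
backup tool the list of blocks to copy. Segments could be processed
independently and the resulting maps merged by keeping the higher LSN per
key.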

Just to throw another option out there, it could also be implemented at
least partially as a walsender command. That way you can get it out through
a replication connection and piggyback on things like replication slots to
make sure you have the data you need, without having to send the full
volume of data. And it would make it possible to do
incremental/differential backups *without* having a WAL archive in the
first place.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
