Re: Hooks to track changed pages for backup purposes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hooks to track changed pages for backup purposes
Date: 2017-09-07 20:53:17
Message-ID: 38baac3f-facb-5ead-c5f9-12fedc8e2b7d@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi,

On 09/01/2017 08:13 AM, Andrey Borodin wrote:
> Thank you for your reply, Michael! Your comments are valuable,
especially in the world of backups.
>
>> 31 авг. 2017 г., в 19:44, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
написал(а):
>> Such things are not Postgres-C like.
> Will be fixed.
>

A few more comments:

* The patch defines wal_switch_hook, but it's never called.

* I see there are conditions like this:

if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)

Why is it enough to restrict the block-tracking code to main fork?
Aren't we interested in all relation forks?

>> I don't understand what xlog_begin_insert_hook() is good for.
> memset control structures and array of blocknos and relfilenodes of
the size XLR_MAX_BLOCK_ID .
>

Why can't that happen in the other hooks? I guess you'll have to explain
what the implementation of the hooks is supposed to do, and why these
locations for hook calls are the right ones. It's damn impossible to
validate the patch without that information.

Assuming you still plan to use the hook approach ...

>> There
>> are no arguments fed to this hook, so modules would not be able to
>> analyze things in this context, except shared memory and process
>> state?
>
>>
>> Those hooks are put in hot code paths, and could impact performance of
>> WAL insertion itself.
> I do not think sending few bytes to cached array is comparable to disk
write of XLog record. Checking the func ptr is even cheaper with correct
branch prediction.
>

That seems somewhat suspicious, for two reasons. Firstly, I believe we
only insert the XLOG records into WAL buffer here, so why should there
be any disk write related? Or do you mean the final commit?

But more importantly, doesn't this kind of information require some
durability guarantees? I mean, if it gets lost during server crashes or
restarts, doesn't that mean the incremental backups might miss some
buffers? I'd guess the hooks will have to do some sort of I/O, to
achieve that, no?

In any case, claims like this probably deserve some benchmarking.

>> So you basically move the cost of scanning WAL
>> segments for those blocks from any backup solution to the WAL
>> insertion itself. Really, wouldn't it be more simple to let for
>> example the archiver process to create this meta-data if you just want
>> to take faster backups with a set of segments? Even better, you could
>> do a scan after archiving N segments, and then use M jobs to do this
>> work more quickly. (A set of background workers could do this job as
>> well).
> I like the idea of doing this during archiving. It is different
trade-off between performance of OLTP and performance of backuping.
Essentially, it is parsing WAL some time before doing backup. The best
thing about it is usage of CPUs that are usually spinning in idle loop
on backup machines.
>
>> In the backup/restore world, backups can be allowed to be taken at a
>> slow pace, what matters is to be able to restore them quickly.
> Backups are taken much more often than restored.
>

I agree. The ability to take backups quickly (and often) is just as
important as fast recovery.

>> In short, anything moving performance from an external backup code path
>> to a critical backend code path looks like a bad design to begin with.
>> So I am dubious that what you are proposing here is a good idea.
> I will think about it more. This proposal takes vanishingly small part
of backend performance, but, indeed, nonzero part.
>

But I think this is not necessarily just about code paths - moving it to
the archiver or a bunch of background workers may easily have just as
negative effect (or event worse), as it's using the same CPUs, has to
actually re-read the WAL segments from disk (which may be a lot of I/O,
particularly when done from multiple processes).

From this POV, the idea to collect this information on the backup system
(WAL archive) by pre-processing the arriving WAL segments seems like the
most promising. It moves the work to another system, the backup system
can make it as durable as the WAL segments, etc.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-09-07 21:08:14 Re: Remove 1MB size limit in tsvector
Previous Message Dmitry Dolgov 2017-09-07 20:49:54 Re: [PATCH] Generic type subscripting