Re: Hooks to track changed pages for backup purposes

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hooks to track changed pages for backup purposes
Date: 2017-09-01 06:13:49
Message-ID: DD60016B-D2AA-4ACB-8A0B-7AFDBF7C2F69@yandex-team.ru
Lists: pgsql-hackers

Thank you for your reply, Michael! Your comments are valuable, especially in the world of backups.

> On 31 Aug 2017, at 19:44, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> Such things are not Postgres-C like.
Will be fixed.

> I don't understand what xlog_begin_insert_hook() is good for.
It memsets the control structures and the arrays of blocknos and relfilenodes, which have size XLR_MAX_BLOCK_ID.
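
A minimal sketch of that reset, under stated assumptions: the struct layout, the names ChangedBlockTracker, tracker_begin_insert and tracker_register_block, and the simplified relfilenode type are all hypothetical and not taken from the patch. XLR_MAX_BLOCK_ID is defined locally as a stand-in so the fragment compiles outside the PostgreSQL tree.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for PostgreSQL's XLR_MAX_BLOCK_ID (assumed value). */
#define XLR_MAX_BLOCK_ID 32

/* Hypothetical, simplified stand-in for a relfilenode. */
typedef struct TrackedRelFileNode
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} TrackedRelFileNode;

/* Hypothetical per-record state kept by a tracking module. */
typedef struct ChangedBlockTracker
{
    int                nblocks;
    uint32_t           blocknos[XLR_MAX_BLOCK_ID];
    TrackedRelFileNode relfilenodes[XLR_MAX_BLOCK_ID];
} ChangedBlockTracker;

static ChangedBlockTracker tracker;

/* What a begin-insert hook could do: forget the previous record's blocks. */
static void
tracker_begin_insert(void)
{
    memset(&tracker, 0, sizeof(tracker));
}

/* What a later hook could do: remember one block touched by the record. */
static bool
tracker_register_block(TrackedRelFileNode rnode, uint32_t blkno)
{
    if (tracker.nblocks >= XLR_MAX_BLOCK_ID)
        return false;           /* arrays are full, refuse to overflow */

    tracker.relfilenodes[tracker.nblocks] = rnode;
    tracker.blocknos[tracker.nblocks] = blkno;
    tracker.nblocks++;
    return true;
}
```

Because the arrays are fixed-size and small, the reset is one memset over memory that is likely already in cache, so the hook needs no arguments to do its job.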

> There
> are no arguments fed to this hook, so modules would not be able to
> analyze things in this context, except shared memory and process
> state?

>
> Those hooks are put in hot code paths, and could impact performance of
> WAL insertion itself.
I do not think that writing a few bytes to a cached array is comparable to the disk write of an XLog record. Checking the function pointer is even cheaper, given correct branch prediction.
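
To illustrate the cost being discussed, here is a sketch of the usual null-checked hook pattern. The hook name comes from the review above; its type, the counter and begin_insert_sketch() are hypothetical stand-ins, not the patch's code.

```c
#include <stddef.h>

/* Hypothetical hook type: no arguments, as noted in the review. */
typedef void (*xlog_begin_insert_hook_type) (void);

/* NULL unless a module installs a callback. */
static xlog_begin_insert_hook_type xlog_begin_insert_hook = NULL;

/* Demo-only counter so the effect of the hook is observable. */
static int begin_insert_calls = 0;

static void
counting_hook(void)
{
    begin_insert_calls++;
}

/* Stand-in for the hot path that starts a WAL record. */
static void
begin_insert_sketch(void)
{
    /* With no module loaded, this is one predictable branch. */
    if (xlog_begin_insert_hook)
        xlog_begin_insert_hook();

    /* ... the real work of starting the record would follow here ... */
}
```

With the pointer left NULL the only overhead is the comparison; a module that sets xlog_begin_insert_hook = counting_hook pays for one indirect call per record on top of that.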

> So you basically move the cost of scanning WAL
> segments for those blocks from any backup solution to the WAL
> insertion itself. Really, wouldn't it be more simple to let for
> example the archiver process to create this meta-data if you just want
> to take faster backups with a set of segments? Even better, you could
> do a scan after archiving N segments, and then use M jobs to do this
> work more quickly. (A set of background workers could do this job as
> well).
I like the idea of doing this during archiving. It is a different trade-off between OLTP performance and backup performance. Essentially, it means parsing WAL some time before taking the backup. The best thing about it is that it uses CPUs that usually sit idle on backup machines.
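
A rough sketch of the archive-time variant, assuming the block numbers have already been extracted from an archived segment by some WAL parser (that part is not shown). The ChangedBlockMap type, its fixed MAP_BLOCKS size and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: number of blocks covered by one map. */
#define MAP_BLOCKS 1024

/* One bit per block: set if any parsed WAL record touched it. */
typedef struct ChangedBlockMap
{
    uint8_t bits[MAP_BLOCKS / 8];
} ChangedBlockMap;

static void
map_init(ChangedBlockMap *map)
{
    memset(map, 0, sizeof(*map));
}

/* Mark one block as changed; out-of-range blocks are rejected. */
static bool
map_mark(ChangedBlockMap *map, uint32_t blkno)
{
    if (blkno >= MAP_BLOCKS)
        return false;
    map->bits[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
    return true;
}

static bool
map_is_changed(const ChangedBlockMap *map, uint32_t blkno)
{
    if (blkno >= MAP_BLOCKS)
        return false;
    return ((map->bits[blkno / 8] >> (blkno % 8)) & 1u) != 0;
}

/*
 * Fold the block numbers found in one archived segment into the map.
 * Returns how many references were accepted (duplicates count each time).
 */
static int
map_add_segment(ChangedBlockMap *map, const uint32_t *blknos, int n)
{
    int accepted = 0;

    for (int i = 0; i < n; i++)
        if (map_mark(map, blknos[i]))
            accepted++;
    return accepted;
}
```

Since marking a bit is idempotent, segments can be processed in any order or by several workers with separate maps that are OR-ed together afterwards, which fits the "M jobs" suggestion quoted above.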

> In the backup/restore world, backups can be allowed to be taken at a
> slow pace, what matters is to be able to restore them quickly.
Backups are taken much more often than restored.

> In short, anything moving performance from an external backup code path
> to a critical backend code path looks like a bad design to begin with.
> So I am dubious that what you are proposing here is a good idea.
I will think about it more. This proposal takes a vanishingly small, but indeed nonzero, part of backend performance.

Again, thank you for your time and comments.

Best regards, Andrey Borodin.
