RE: Enhance traceability of wal_level changes for backup management

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Stephen Frost' <sfrost(at)snowman(dot)net>
Cc: 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Enhance traceability of wal_level changes for backup management
Date: 2021-01-07 08:23:07
Message-ID: OSBPR01MB488856ED9726BFE94F93597FEDAF0@OSBPR01MB4888.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Stephen

Thank you so much for replying !
On Thursday, January 7, 2021 2:40 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * osumi(dot)takamichi(at)fujitsu(dot)com (osumi(dot)takamichi(at)fujitsu(dot)com) wrote:
> > You said
> > > The use case I imagined is that the user temporarily changes
> > > wal_level to 'none' from 'replica' or 'logical' to speed up loading
> > > and changes back to the normal. In this case, the backups taken
> > > before the wal_level change cannot be used to restore the database
> > > to the point after the wal_level change. So I think backup
> > > management tools would want to recognize the time or LSN when/where
> > > wal_level is changed to ‘none’ in order to do some actions such as
> > > invalidating older backups, re-calculating backup redundancy etc.
> > > Actually the same is true when the user changes to ‘minimal’. The
> > > tools would need to recognize the time or LSN in this case too.
> > > Since temporarily changing wal_level has been an uncommon use case
> > > some tools perhaps are not aware of that yet. But since introducing
> > > wal_level = ’none’ could make the change common, I think we need to
> provide a way for the tools.
>
> I continue to be against the idea of introducing another wal_level. If there's
> additional things we can do to reduce WAL traffic while we continue to use it to
> keep the system in a consistent state then we should implement those for the
> 'minimal' case and be done with it.
The new wal_level=none patch achieves something that cannot be done
or cannot be implemented together with wal_level='minimal' clearly.
Did you have a look at the peformance evaluation result that I conducted in [1] ?
It proved that data loading of 'none' is much faster than that of 'minimal'.

> Adding another wal_level is just going to be confusing and increase complexity,
> and the chances that someone will end up in a bad state.
Even if when we committed another idea,
that is "ALTER TABLE tbl SET UNLOGGED/LOGGED without copying relation data",
the complexity like taking a full backup before bulk data loading didn't change
and when any accidents happened during no wal logging for specific table with the improvement,
user would need to start from the backup again. This looks same to me.
Additionally, the patch itself in that thread is big and more complicated.
The complexity you meant is the wal_level's impact to backup management tools or anything else ?

> > I wondered, couldn't backup management tools utilize the information
> > in the backup that is needed to be made when wal_level is changed to "none"
> for example ?
>
> What information is that, exactly? If there's a way to detect that the wal_level
> has been flipped to 'minimal' and then back to 'replica', other than scanning the
> WAL, we'd certainly like to hear of it, so we can implement logic in pgbackrest
> to detect that happening. I'll admit that I've not gone hunting for such of late,
> but I don't recall seeing anything that would change that either.
Excuse me for making you confused.
I was thinking about control file in the backup as information.

I'm not familiar with the internals of those backup management tools
but do they monitor the control file and its values of the runnning server at short intervals ?
And, if they don't do so and if we want accurate time or LSN that indicates wal_level changes,
I thought we could pick up exact information from control file of cluster directory or its backup
during server stop (because changing wal_level requires server stop).
That's all. Sorry for the noise.

> The idea proposed above about having the LSN and time recorded when a
> wal_level change to minimal happens, presumably in pg_control, seems like it
> could work as a way to allow external tools to more easily figure out if the
> wal_level's been changed to minimal. Perhaps there's a reason to track
> changes between replica and logical but I can't think of any offhand and backup
> tools would still need to know if the wal level was set to *minimal*
> independently of changes between replica and logical.
Here, probably we use different assumptions.
What you say makes sense if we commit neither the patch of
ALTER TABLE SET UNLOGGED/LOGGED without copy (during wal_level=minimal)
nor the patch of new wal_level 'none' ?
We were talking about a case that fast bulk data loading is needed
and the user changes wal_level to 'none' or 'minimal' temporarily
from higher level in order to speed up data loading and make it back to the higher level again.
After this operation, we need to invalidate the old backups taken before the data loading.
In this case, we have to track the change to 'minimal'. Didn't it make sense ?

> Then again, once we add support for scanning the WAL to pgbackrest, we'll
> almost certainly track it that way since that'll work for older and released
> versions of PG, so I'm not really sure it's worth it to add this to pg_control
> unless there's other reasons to.
>
> > As I said before, existing backup management tools support only
> > wal_level=replica or logical at present. And, if they would wish to
> > alter the status quo and want to cover the changes over wal_levels, I
> > felt it's natural that they support feature like taking a full backup, trigged by
> the wal_level changes (or server stop).
>
> Sure, but there needs to be a way to actually do that..
>
> > This is because taking a backup is a must for wal_level=none, as I
> > described in the patch of wal_level=none.
> > For example, they could prepare an easy way to take an offline
> > physical backup when the server stops for changing the wal_level.
> > (Here, they can support the change to minimal, too.)
>
> pgbackrest does support offline physical backups and it's pretty easy (just pass
> in --no-online). That doesn't really help with the issue of detecting a wal_level
> change though.
Thanks for the tip,

[1] - https://www.postgresql.org/message-id/OSBPR01MB48887ECC140A97494C542264EDF40%40OSBPR01MB4888.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-01-07 08:24:22 Re: [HACKERS] [PATCH] Generic type subscripting
Previous Message Bharath Rupireddy 2021-01-07 08:21:40 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit