RE: Disable WAL logging to speed up data loading

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, 'Stephen Frost' <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Disable WAL logging to speed up data loading
Date: 2020-11-19 03:45:24
Message-ID: TYAPR01MB2990E16CB5BE414C681314D5FEE00@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Osumi, Takamichi/大墨 昂道 <osumi(dot)takamichi(at)fujitsu(dot)com>
> there were some ideas to trace the change of wal_level,
> in other words, *stronger mechanism* to check wal_level.
> I agree with the idea to have a new monitoring item
> and would like to implement those kind of, or one of those ideas for the next
> patch.
> But I'm not sure where is the right place to write the information.
> Was the best place the control file ?

Yes, I think so (I find nowhere else.)

> On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost <sfrost(at)snowman(dot)net>
> wrote:
> > Checking the WAL level certainly seems critical to anything that's reading the
> > WAL. We certainly do this already when running as a
> > replica:
> >
> > ereport(WARNING,
> > (errmsg("WAL was generated with wal_level=minimal, data may be
> > missing"),
> > errhint("This happens if you temporarily set wal_level=minimal
> > without taking a new base backup.")));
> >
> > There's definitely a question about if a WARNING there is really sufficient or
> > not, considering that you could end up with 'logged'
> > tables on the replica that are missing data, but I'm not sure that inventing a
> > new, independent, mechanism for checking WAL level changes makes
> sense.
> >
> > While pgbackrest backups of a primary wouldn't be impacted, it does
> support
> > backing up from a replica (as do other backup systems, of
> > course) and if data that's supposed to be on the replica isn't there because
> > someone restarted PG with wal_level=minimal and then flipped it back to
> > replica and got the replica to move past that part of the WAL by turning off hot
> > standby, replaying, and then turning it back on, then the backup is going to
> > also be missing that data.
> >
> > Perhaps that's where we need to have a stronger mechanism though- that is,
> > if we hit the above WARNING, ever, set a flag somewhere that tools can check
> > and which we could also check and throw further warnings about. In other
> > words, every time this replica is started, we could check for this flag and
> > throw the same warning above about how data may be missing, and we could
> > have pg_basebackup flat out refuse to back up from a replica where this flag
> > has been set (maybe with some kind of override mechanism... maybe not).
> > After all, that's where the real issue here is, isn't it?
> The first idea is warning that means replica could miss some data.
> This is to notify some tools that taking a backup from replica is dangerous.
> This can be applied to a change from replica to minimal(or none) and useful
> because having the wal_level lowered to minimal does not mean
> unclean shutdown or unexpected stoppage but it means a replica might get
> corrupted.

I don't know why WARNING was chosen. I think it should be FATAL, resulting in the standby shutdown, disabling restarting it, and urging the user to rebuild the standby. (I guess that's overreaction because the user may not perform operations that lack WAL while wal_level is minimal.)

> On Tuesday, Nov 3, 2020 12:28 AM Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > I'm not exactly sure what that would look like, but suppose we had a feature
> > where every time wal_level drops below replica, a counter gets incremented
> by
> > 1, and that counter is saved in the control file. Or maybe when wal_level
> drops
> > below minimal to none. Or maybe there are two counters. Anyway, the idea is
> > that if you have a snapshot of the cluster at one time and a snapshot at
> > another time, you can see whether anything scary has happened in the
> middle
> > without needing all of the WAL in between.
> The second idea is incremental counter that indicates drop of wal_level
> from replica to minimal (or like from minimal to none).
> Its purpose is to compare the wal_level changes between snapshots.
> When any monitoring tools detect any difference of the counter,
> we can predict something happened immediately without checking WAL in
> between.
>
> > > This would also be something that should be exposed as monitoring
> > > points (which it could be if it's in pg_control). That is, I can
> > > imagine a *lot* of installations that would definitely want an alert
> > > to fire if the cluster has ever been started up in a wal_level=none or
> > > wal_level=minimal, at least up until the point where somebody has run
> > > a new full backup.
> >
> > I agree with the general idea that it'd be good for anything we do here to be
> > made available for monitoring tools to be able to see.
> The last one is notification of permanent damage from initial cluster parameter,
> although this may be similar to the second one.
> This is an indication that shows some part of data are not recovered,
> if the cluster was initialized and loaded some data during wal_level=none.
>
> Exposing the monitoring item for tools sounds reasonable for me of course.
> Was the control file the best place for the information to implement those kind
> of ideas ?
> What I'm concerned about is that those three items are not necessarily
> meaningful for all users.
> Also, it seems better the size of the control file should be saved
> because the size is limited to less than 512 bytes for atomic writes, which is
> written in a comment.
> Here, I didn't say that I'll take all of those ideas into the patch
> but this perspective doesn't change whenever we change the content of control
> file, I would say.
> Another aspect unique to the last idea is that
> the value will be no use after taking a new full backup.
> Thus, I'm not sure if writing that kind of information in the control file is right or
> not.

Let's depict the situation. I may be misunderstanding, so any correction would be much welcome. Here, I call the new field wal_level_change_counter, which should be changed to a proper name.

1. wal_level = replica. Take a base backup and store it in $BACKUPDIR/20201119/.
wal_level_change_counter = 0

2. Set wal_level = minimal or none, and restart the instance. Perform some operations.
wal_level_change_counter = 1

3. Set wal_level = replica, and restart the instance.
wal_level_change_counter = 1

4. Some monitoring system compares the values of wal_level_change_counter in $BACKUPDIR/20201119/ and $PGDATA/, and notices the difference (0 and 1 respectively.)
It warns the user that he/she should take a full backup because some WAL may be missing to recover the latest data from the last backup in $BACKUPDIR/20201119/.

But I think this is a separate patch, because the issue already applies to wal_level = minimal.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-11-19 03:49:05 Re: Optimising latch signals
Previous Message Michael Paquier 2020-11-19 03:39:18 Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)