Re: Lowering the default wal_blocksize to 4K

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Lowering the default wal_blocksize to 4K
Date: 2023-10-11 20:05:02
Message-ID: CA+TgmoafCpmzyEPyetE3ejy7DV0qWnWh9KvOLgkXk2x=zGD_0w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 10, 2023 at 7:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Hmm. I don't think we should remove those checks, as I can see people
> > that would want to change their XLog block size with e.g.
> > pg_reset_wal.
>
> I don't think that's something we need to address in every physical
> segment. For one, there's no option to do so. But more importantly, if they
> don't change the xlog block size, we'll just accept random WAL as well. If
> somebody goes to the trouble of writing a custom tool, they can live with the
> consequences of that potentially causing breakage. Particularly if the checks
> wouldn't meaningfully prevent that anyway.

I'm extremely confused about what both of you are saying.

Matthias is referring to pg_reset_wal, which I assume means
pg_resetwal. But it has no option to change the WAL block size. It
does have an option to change the WAL segment size, but that's not the
same thing. And even if pg_resetwal did have an option to change the
WAL block size, it removes all WAL from pg_wal when it runs, so you
wouldn't normally end up trying to replay WAL from before the
operation because it would have been removed. You might still have
those files around in an archive or something, but the primary doesn't
replay from the archive. You might have standbys, but I would assume
they would have to be rebuilt after changing the WAL block size on the
master, unless you were trying to follow some probably-too-clever
procedure to avoid a standby rebuild. So I'm really kind of lost as to
what the scenario is that Matthias has in mind.

But Andres's response doesn't make any sense to me either. What in the
world does "if they don't change the xlog block size, we'll just
accept random WAL as well" mean? Neither having nor not having a check
that the block size hasn't changed causes us to "just accept random
WAL". To "accept random WAL," we'd have to remove all of the sanity
checks, which nobody is proposing and nobody would accept.

But if we do want to keep those cross-checks, why not take what Thomas
proposed a little further and move all of xlp_sysid, xlp_seg_size, and
xlp_xlog_blcksz into XLOG_CHECKPOINT_REDO? Then long and short page
headers would become identical. We'd lose the ability to recheck those
values for every new segment, but it seems quite unlikely that any of
these values would change in the middle of replay. If they did, would
xl_prev and xl_crc be sufficient to catch that? I think Andres says in
a later email that they would be, and I think I'm inclined to agree.
False xl_prev matches don't seem especially unlikely, but xl_crc seems
like it should be effective.
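
As a rough illustration of that idea (a sketch, not a patch): the
page header fields below follow the existing XLogPageHeaderData, while
the xl_checkpoint_redo_params struct and the redo_params_match()
helper are hypothetical names invented here to show where the three
values could live and how they might be cross-checked once per redo
point instead of once per segment. The typedefs are simplified
stand-ins for the real ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's own typedefs. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/*
 * The single page header that every WAL page would carry if the "long"
 * variant went away.  Field names follow the existing short header.
 */
typedef struct XLogPageHeaderData
{
    uint16_t    xlp_magic;      /* magic value for correctness checks */
    uint16_t    xlp_info;       /* flag bits */
    TimeLineID  xlp_tli;        /* timeline of first record on page */
    XLogRecPtr  xlp_pageaddr;   /* WAL address of this page */
    uint32_t    xlp_rem_len;    /* remaining bytes of a continued record */
} XLogPageHeaderData;

/*
 * HYPOTHETICAL payload for an XLOG_CHECKPOINT_REDO record: the three
 * fields that today exist only in the long page header.  The struct
 * name is an assumption made for this sketch.
 */
typedef struct xl_checkpoint_redo_params
{
    uint64_t    xlp_sysid;          /* system identifier */
    uint32_t    xlp_seg_size;       /* WAL segment size in bytes */
    uint32_t    xlp_xlog_blcksz;    /* WAL block size in bytes */
} xl_checkpoint_redo_params;

/*
 * HYPOTHETICAL cross-check: compare the values carried by the redo
 * record against what the running system expects.
 */
static bool
redo_params_match(const xl_checkpoint_redo_params *rec,
                  uint64_t sysid, uint32_t seg_size, uint32_t blcksz)
{
    return rec->xlp_sysid == sysid &&
           rec->xlp_seg_size == seg_size &&
           rec->xlp_xlog_blcksz == blcksz;
}
```

With something like this, a mismatch would be detected when the redo
record is read rather than at each segment boundary; anything that
changed in between would have to be caught by xl_prev and xl_crc.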

--
Robert Haas
EDB: http://www.enterprisedb.com
