Re: Lowering the default wal_blocksize to 4K

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Lowering the default wal_blocksize to 4K
Date: 2023-10-12 14:56:37
Message-ID: CA+TgmoZUP6kc9-FwE=BnsZHH17LE4-G3XjvMA2QFOQfYWVOXiQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 12, 2023 at 9:57 AM Ants Aasma <ants(at)cybertec(dot)at> wrote:
> This reminds me that xlp_tli is not being used to its full potential right now either. We only check that it's not going backwards, but there is at least one not very hard to hit way to get postgres to silently replay on the wrong timeline. [1]
>
> [1] https://www.postgresql.org/message-id/CANwKhkMN3QwAcvuDZHb6wsvLRtkweBiYso-KLFykkQVWuQLcOw@mail.gmail.com

Maybe I'm missing something, but that seems mostly unrelated. What
you're discussing there is the server's ability to figure out when it
ought to perform a timeline switch. In other words, the server settles
on the wrong TLI and therefore opens and reads from the wrong
filename. But here, we're talking about the case where the server is
correct about the TLI and LSN and hence opens exactly the right file
on disk, but the contents of the file on disk aren't what they're
supposed to be due to a procedural error.

Said differently, I don't see how anything we could do with xlp_tli
would actually fix the problem discussed in that thread. Checking
xlp_tli can detect a situation where the TLI of the file doesn't match
the TLI of the pages inside the file, but it doesn't help with the case
where the server decided to read the wrong file in the first place.

But this does make me wonder whether storing xlp_tli and xlp_pageaddr
in every page is really worth the bit-space. That takes 12 bytes plus
any padding it forces us to incur, but the actual entropy content of
those 12 bytes must be quite low. In normal cases probably 7 or so of
those bytes are going to consist entirely of zero bits (TLI < 256,
LSN%8k == 0, LSN < 2^40). We could probably find a way of jumbling
the LSN, TLI, and maybe some other stuff into an 8-byte quantity or
even perhaps a 4-byte quantity that would do about as good a job
catching problems as what we have now (e.g.
LSN_HIGH32^LSN_LOW32^BITREVERSE(TLI)). In the event of a mismatch, the
value actually stored in the page header would be harder for humans to
understand, but I'm not sure that really matters here. Users should
mostly be concerned with whether a WAL file matches the cluster where
they're trying to replay it; forensics on misplaced or corrupted WAL
files should be comparatively rare.
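
As a rough illustration of the arithmetic above, here is a minimal Python sketch, not PostgreSQL code. It counts the all-zero bytes in a 4-byte TLI plus an 8-byte page address for a typical case, and computes the 4-byte example jumble LSN_HIGH32^LSN_LOW32^BITREVERSE(TLI). The function names (bit_reverse32, jumble32, zero_bytes), the sample LSN and the little-endian packing are assumptions made for the example.

```python
import struct


def bit_reverse32(x: int) -> int:
    """Reverse the bit order of a 32-bit unsigned integer."""
    x &= 0xFFFFFFFF
    return int(f"{x:032b}"[::-1], 2)


def jumble32(lsn: int, tli: int) -> int:
    """Fold a 64-bit LSN and a 32-bit TLI into one 32-bit check value.

    Implements the example from the text:
    LSN_HIGH32 ^ LSN_LOW32 ^ BITREVERSE(TLI).
    """
    lsn_high = (lsn >> 32) & 0xFFFFFFFF
    lsn_low = lsn & 0xFFFFFFFF
    return lsn_high ^ lsn_low ^ bit_reverse32(tli)


def zero_bytes(lsn: int, tli: int) -> int:
    """Count all-zero bytes among the 4-byte TLI and 8-byte page address."""
    raw = struct.pack("<IQ", tli, lsn)  # 12 bytes, no padding
    return raw.count(0)


# Hypothetical page address: 8k-aligned and below 2^40, on timeline 1.
page_lsn = 0x12_3456_E000
print(zero_bytes(page_lsn, 1))          # zero bytes out of 12
print(hex(jumble32(page_lsn, 1)))       # check value on timeline 1
print(hex(jumble32(page_lsn, 2)))       # same page address, timeline 2
```

Reversing the bits of the TLI moves small timeline numbers into the high bits of the result, while small values of the high LSN word land in the low bits, which an 8k-aligned low word leaves zero. In the common case the three inputs therefore tend not to cancel each other out, though any 4-byte fold can collide for some inputs.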

--
Robert Haas
EDB: http://www.enterprisedb.com