Re: storing an explicit nonce

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: storing an explicit nonce
Date: 2021-05-27 16:15:27
Message-ID: 20210527161527.GH5646@momjian.us
Lists: pgsql-hackers

On Thu, May 27, 2021 at 12:03:00PM -0400, Robert Haas wrote:
> On Thu, May 27, 2021 at 11:19 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > I was asking how decoupling the nonce from the LSN allows for us to
> > avoid full page writes for hint bit changes. I am guessing you are
> > saying that on recovery, if we see a hint-bit-only change in the WAL
> > (with a new nonce), we just throw away the page because it could be torn
> > and use the WAL full page write version.
>
> Well, in the design where the nonce is stored in the page, there is no
> need for every hint-type change to appear in the WAL at all. Once per
> checkpoint cycle, you need to write a full page image, as we do for
> checksums or wal_log_hints. The rest of the time, you can just bump
> the nonce and rewrite the page, same as we do today.
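A rough sketch of the in-page nonce policy described above, as a toy Python model rather than PostgreSQL code. Everything here (`Page`, `ExplicitNonce`, the counters) is hypothetical. The model ignores crash recovery, and it assumes the nonce is combined with the page's identity so that per-page counters cannot collide across pages.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    """Hypothetical page model that stores its own encryption nonce."""
    nonce: int = 0
    fpi_this_cycle: bool = False  # has a full page image been logged since the last checkpoint?

@dataclass
class ExplicitNonce:
    """Toy model: hint-only changes cost one full page image (FPI) per
    checkpoint cycle; every write after that just bumps the in-page nonce."""
    fpi_records: int = 0
    seen: set = field(default_factory=set)  # (page_id, nonce) pairs already used

    def checkpoint(self, pages):
        # A new checkpoint cycle: the next hint-only change needs an FPI again.
        for p in pages:
            p.fpi_this_cycle = False

    def hint_bit_change(self, page):
        # Same rule as checksums / wal_log_hints: only the first hint-only
        # change after a checkpoint writes WAL (a full page image).
        if not page.fpi_this_cycle:
            self.fpi_records += 1
            page.fpi_this_cycle = True

    def write_page(self, page_id, page):
        # Bump the nonce before "encrypting", so no (page, nonce) pair repeats.
        page.nonce += 1
        key = (page_id, page.nonce)
        assert key not in self.seen, "nonce reuse"
        self.seen.add(key)

store = ExplicitNonce()
page = Page()
for _ in range(100):  # one page rewritten 100 times in one checkpoint cycle
    store.hint_bit_change(page)
    store.write_page(0, page)
print(store.fpi_records, page.nonce)  # 1 100
```

In this model the WAL cost stays at one FPI per page per checkpoint cycle no matter how often the page is written; only the in-page counter grows.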

What is it about having the nonce be the LSN that doesn't allow that to
happen? Could we just create a dummy LSN record, assign that to the
page, and use that as a nonce?
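For comparison, a toy model of the dummy-record idea, assuming that every write of a page changed only by hint bits must first obtain a fresh LSN from a dummy WAL record so that the LSN-as-nonce is never reused for different page contents. All names are hypothetical, only pages changed since their last write are written, and whether each such record would also need to be a full page image (to cope with torn pages, as discussed above) is left open.

```python
from dataclasses import dataclass

@dataclass
class Page:
    """Hypothetical page model whose LSN doubles as the encryption nonce."""
    lsn: int = 0
    changed_since_last_lsn: bool = False

class LsnAsNonce:
    """Toy model: a page changed without a new LSN gets a dummy WAL record
    (and thus a fresh LSN) before it is written."""

    def __init__(self) -> None:
        self.next_lsn = 1
        self.wal_records = 0
        self.seen = set()  # (page_id, lsn) pairs already used as nonces

    def _emit_dummy_record(self) -> int:
        # Hypothetical: each WAL record is given the next LSN.
        lsn = self.next_lsn
        self.next_lsn += 1
        self.wal_records += 1
        return lsn

    def hint_bit_change(self, page: Page) -> None:
        # Hint bits change page contents but do not advance the LSN by themselves.
        page.changed_since_last_lsn = True

    def write_page(self, page_id: int, page: Page) -> None:
        if page.changed_since_last_lsn:
            page.lsn = self._emit_dummy_record()
            page.changed_since_last_lsn = False
        key = (page_id, page.lsn)
        assert key not in self.seen, "nonce reuse"
        self.seen.add(key)

store = LsnAsNonce()
page = Page()
for _ in range(100):  # one page rewritten 100 times in one checkpoint cycle
    store.hint_bit_change(page)
    store.write_page(0, page)
print(store.wal_records)  # 100
```

Under this assumption the WAL volume scales with the number of page writes per checkpoint cycle (100 records here) instead of staying at one per cycle.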

> > Yes, it might be 1e100+++ more expensive too, but we don't know, and I
> > am not ready to add a lot of complexity for such an unknown.
>
> No, it can't be 1e100+++ more expensive, because it's not
> realistically possible for a page to be written to disk 1e100+++ times
> per checkpoint cycle. It is however entirely possible for it to be
> written 100 times per checkpoint cycle. That is not something unknown
> about which we need to speculate; it is easy to see that this can
> happen, even on a simple test like pgbench with a data set larger than
> shared buffers.

I guess you didn't get my joke on that one. ;-)

> It is not right to confuse "we have no idea whether this will be
> expensive" with "how expensive this will be is workload-dependent,"
> which is what you seem to be doing here. If we had no idea whether
> something would be expensive, then I agree that it might not be worth
> adding complexity for it, or maybe some testing should be done first
> to find out. But if we know for certain that in some workloads
> something can be very expensive, then we had better at least talk
> about whether it is worth adding complexity in order to resolve the
> problem. And that is the situation here.

Sure, but the downsides of avoiding it seem very high to me, not only in
code complexity but in requiring dump/reload or logical replication to
deploy.

> I am not even convinced that storing the nonce in the block is going
> to be more complex, because it seems to me that the patches I posted
> upthread worked out pretty cleanly. There are some things to discuss
> and think about there, for sure, but it is not like we are talking
> about inventing warp drive.

See above.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.
