Re: LogwrtResult contended spinlock

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Subject: Re: LogwrtResult contended spinlock
Date: 2024-04-05 08:41:01
Message-ID: 202404050841.qajhfn4wkksi@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024-Apr-05, Bharath Rupireddy wrote:

> 1.
> /*
> * Update local copy of shared XLogCtl->log{Write,Flush}Result
> + *
> + * It's critical that Flush always trails Write, so the order of the reads is
> + * important, as is the barrier.
> */
> #define RefreshXLogWriteResult(_target) \
> do { \
> - _target.Write = XLogCtl->logWriteResult; \
> - _target.Flush = XLogCtl->logFlushResult; \
> + _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
> + pg_read_barrier(); \
> + _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
> } while (0)
>
> Is it "Flush always trails Write" or "Flush always leades Write"? I
> guess the latter, no?

Well, Flush cannot lead Write. We cannot Flush what hasn't been
Written! To me, "Flush trails Write" means that flush is behind. That
seems supported by Merriam-Webster[1], which says in defining it as a
transitive verb

3 a : to follow upon the scent or trace of : TRACK
b : to follow in the footsteps of : PURSUE
c : to follow along behind
d : to lag behind (someone, such as a competitor)

Maybe it's not super clear as a term. We could turn it around and say "Write
always leads Flush".

[1] https://www.merriam-webster.com/dictionary/trail

The reason we have the barrier, and the reason we write and read them in
this order, is that we must never read a Flush value that is newer than
the Write value we read. That is: the values are written in pairs
(w1,f1) first, (w2,f2) next, and so on. It's okay if we obtain (w2,f1)
(new write, old flush), but it's not okay if we obtain (w1,f2) (old
write, new flush). If we did, we might end up with a Flush value that's
ahead of Write, violating the invariant.

> 2.
> +
> + pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
> + pg_write_barrier();
> + pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
> }
>
> Maybe add the reason as to why we had to write logWriteResult first
> and then logFlushResult, similar to the comment atop
> RefreshXLogWriteResult?

Hmm, I had one there and removed it. I'll put something back.

> > For 0002, did you consider having pg_atomic_monotonic_advance_u64()
> > return the currval?
>
> +1 for returning currval here.

Oh yeah, I had that when the monotonic stuff was used by 0001 but lost
it when I moved to 0002. I'll push 0001 now and send an updated 0002
with the return value added.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Morris de Oryx 2024-04-05 08:46:40 Re: Looking for an index that supports top-n searches by enforcing a max-n automatically
Previous Message Donghang Lin 2024-04-05 08:20:52 Bad estimation for NOT IN clause with big null fraction