Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
Date: 2019-01-29 20:23:51
Message-ID: 20190129202351.vqe45bjfbq3osgp6@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > runs. Looking at the buildfarm it looks like the failures were,
> > > excluding handfish which failed without recognizable symptoms before and
> > > after, on BSD derived platforms (netbsd, freebsd, OX), which certainly
> > > is interesting.
> >
> > Isn't it now. Something about the BSD scheduler perhaps? But we've
> > got four or five different BSD-ish platforms that reported failures,
> > and it's hard to believe they've all got identical schedulers.
> >
> > That second handfish failure does match the symptoms elsewhere:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> >
> > --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr 2018-10-30 20:11:45.551967381 -0300
> > +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr 2019-01-28 22:38:20.614211568 -0200
> > @@ -0,0 +1,20 @@
> > +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> >
> > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > way higher than elsewhere. Puzzling.
>
> Interesting.
>
> While chatting with Robert about this issue I came across the following
> section of code:
>
> /*
> * If the FSM knows nothing of the rel, try the last page before we
> * give up and extend. This avoids one-tuple-per-page syndrome during
> * bootstrapping or in a recently-started system.
> */
> if (targetBlock == InvalidBlockNumber)
> {
> BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
>
> if (nblocks > 0)
> targetBlock = nblocks - 1;
> }
>
>
> I think that explains the issue (albeit not why it is much more frequent
> on BSDs). Because we're not going through the FSM, it's perfectly
> possible to find a page that is uninitialized, *and* is not yet in the
> FSM. The only reason this wasn't previously actively broken, I think, is
> that while we previously *also* looked that page (before the extending
> backend acquired a lock!), when looking at the page
> PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> space because it just interprets the zeroes in pd_upper - pd_lower as no
> free space.
>
> Hm, thinking about what a good solution here could be.

I wonder if we should just expand the logic we have for
RBM_ZERO_AND_LOCK so it can be and use it in hio.c (we probably could
just use it without any changes, but the name seems a bit confusing) -
because that'd prevent the current weirdness that it's possible that the
buffer can be locked by somebody between the ReadBufferBI(P_NEW) and and
the LockBuffer(BUFFER_LOCK_EXCLUSIVE). I think that'd allow us to
alltogether drop the cleanup lock logic we currently have, and also
protect us against the FSM issue I'd outlined upthread?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-01-29 20:36:12 Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
Previous Message Jesper Pedersen 2019-01-29 20:19:28 Re: pg_upgrade: Pass -j down to vacuumdb