Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "hange-folder>?" <toggle-mailboxes(at)alap3(dot)anarazel(dot)de>
Subject: Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
Date: 2019-01-29 19:25:41
Message-ID: 20190129192541.pxs5f5w5mxojnny2@alap3.anarazel.de
Lists: pgsql-hackers

On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I did that now. I couldn't reproduce it locally, despite a lot of
> > runs. Looking at the buildfarm it looks like the failures were,
> > excluding handfish which failed without recognizable symptoms before and
> > after, on BSD-derived platforms (netbsd, freebsd, OS X), which certainly
> > is interesting.
>
> Isn't it now. Something about the BSD scheduler perhaps? But we've
> got four or five different BSD-ish platforms that reported failures,
> and it's hard to believe they've all got identical schedulers.
>
> That second handfish failure does match the symptoms elsewhere:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
>
> --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr 2018-10-30 20:11:45.551967381 -0300
> +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr 2019-01-28 22:38:20.614211568 -0200
> @@ -0,0 +1,20 @@
> +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
>
> so it's not quite 100% BSD, but certainly the failure rate on BSD is
> way higher than elsewhere. Puzzling.

Interesting.

While chatting with Robert about this issue I came across the following
section of code:

    /*
     * If the FSM knows nothing of the rel, try the last page before we
     * give up and extend.  This avoids one-tuple-per-page syndrome during
     * bootstrapping or in a recently-started system.
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }

I think that explains the issue (albeit not why it is much more frequent
on BSDs). Because we're not going through the FSM, it's perfectly
possible to find a page that is uninitialized *and* not yet in the
FSM. The only reason this wasn't previously actively broken, I think, is
that while we previously *also* looked at that page (before the extending
backend had acquired a lock!), PageGetHeapFreeSpace(), via
PageGetFreeSpace(), decides there's no free space on it, because it just
interprets the zeroes in pd_upper - pd_lower as no free space.
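To make the mechanism concrete, here is a minimal standalone C sketch. It is not PostgreSQL code: it copies the quoted last-page fallback and models the free-space arithmetic of PageGetFreeSpace() on a two-field page header. The `Sketch*`/`sketch_*` names, the simplified header struct, the 8192-byte block size and the 24-byte header size are assumptions for illustration; BlockNumber and InvalidBlockNumber are local stand-ins mirroring PostgreSQL's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring PostgreSQL's definitions. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Hypothetical, simplified model of the two PageHeaderData fields that
 * matter here; the real header has more fields.
 */
typedef struct SketchPageHeader
{
    uint16_t pd_lower;          /* offset to start of free space */
    uint16_t pd_upper;          /* offset to end of free space */
} SketchPageHeader;

#define SKETCH_BLCKSZ           8192  /* assumed default block size */
#define SKETCH_PAGE_HEADER_SIZE 24    /* assumed SizeOfPageHeaderData */
#define SKETCH_ITEMID_SIZE      4     /* assumed sizeof(ItemIdData) */

/* Mirrors the quoted fallback: with no FSM answer, try the last block. */
BlockNumber
sketch_choose_target(BlockNumber fsm_answer, BlockNumber nblocks)
{
    BlockNumber targetBlock = fsm_answer;

    if (targetBlock == InvalidBlockNumber && nblocks > 0)
        targetBlock = nblocks - 1;
    return targetBlock;
}

/* Modeled on PageGetFreeSpace(): pd_upper - pd_lower, minus one line pointer. */
size_t
sketch_page_free_space(const SketchPageHeader *hdr)
{
    int space = (int) hdr->pd_upper - (int) hdr->pd_lower;

    /* An all-zero page gives 0 - 0 = 0, which is reported as "full". */
    if (space < SKETCH_ITEMID_SIZE)
        return 0;
    return (size_t) (space - SKETCH_ITEMID_SIZE);
}

/* Modeled on PageIsNew(): a never-initialized page has pd_upper == 0. */
int
sketch_page_is_new(const SketchPageHeader *hdr)
{
    return hdr->pd_upper == 0;
}

/* Roughly what PageInit() sets up for a page without special space. */
void
sketch_page_init(SketchPageHeader *hdr)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->pd_lower = SKETCH_PAGE_HEADER_SIZE;
    hdr->pd_upper = SKETCH_BLCKSZ;
}
```

With a relation that another backend has just extended to one zeroed block, sketch_choose_target(InvalidBlockNumber, 1) picks block 0, and sketch_page_free_space() reports 0 bytes for it, so the caller simply moves on. Only a PageIsNew()-style check tells "full" apart from "never initialized"; an initialized 8192-byte page in this model reports 8164 free bytes.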

Hm, thinking about what a good solution here could be.

Greetings,

Andres Freund
