Re: dsa_allocate() faliure

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Fabio Isabettini <fisabettini(at)voipfuture(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>, Rick Otten <rottenwindfish(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: dsa_allocate() faliure
Date: 2019-02-10 22:45:07
Message-ID: CAEepm=1C3t0B9yXDFtNgPDS0c--RZjDQuaCpFCaCaFUbPb6AFQ@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Sun, Feb 10, 2019 at 5:41 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > But at first glance it shouldn't be allocating pages, because it just
> > does consolidation to try to convert to singleton format, and then it
> > does recycle list cleanup using soft=true so that no allocation of
> > btree pages should occur.
>
> I think I see what's happening. At the moment the problem occurs,
> there is no btree - there is only a singleton range. So
> FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment /* Not contiguous; we
> need to initialize the btree. */. And that section, sadly, does not
> respect the 'soft' flag, so kaboom. Something like the attached might
> fix it.

Ouch. Yeah, that'd do it, and it matches the evidence. With this change,
I couldn't reproduce the problem in 90 minutes with a test case
that otherwise hits it within a couple of minutes.

Here's a patch with a commit message explaining the change.

It also removes an obsolete comment, which is in fact related. The
comment refers to an output parameter internal_pages_used, which must
have been used to report this exact phenomenon in an earlier
development version. But there is no such parameter in the committed
version, and instead there is the soft flag to prevent internal
allocation. I have no view on which approach is better, but yeah, if
we're using a soft flag, it has to work reliably.
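
To make the failure mode concrete, here is a minimal sketch of the idea, not
the actual freepage.c code. ToyFpm, toy_put_internal() and all field names
are hypothetical, and the btree itself is not modelled at all. It only shows
the rule discussed above: when there is no btree yet and the returned range
is not contiguous with the singleton range, a soft put must give up rather
than take a page for a btree root.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the real freepage.c structures. */
typedef struct ToyFpm
{
    unsigned btree_depth;      /* 0 means no btree, at most one singleton range */
    size_t   singleton_first;  /* first page of the singleton range */
    size_t   singleton_npages; /* length of the singleton range, 0 if empty */
    size_t   recycled_pages;   /* spare pages already set aside for btree use */
    size_t   pages_allocated;  /* pages taken from the free range for internal use */
} ToyFpm;

/*
 * Give the range [first, first + npages) back to the manager.
 * Returns npages on success, or 0 if soft is true and the request could
 * only be satisfied by allocating a page for internal use.
 */
static size_t
toy_put_internal(ToyFpm *fpm, size_t first, size_t npages, bool soft)
{
    if (fpm->btree_depth == 0)
    {
        if (fpm->singleton_npages == 0)
        {
            /* Nothing free yet: the new range becomes the singleton. */
            fpm->singleton_first = first;
            fpm->singleton_npages = npages;
            return npages;
        }
        if (fpm->singleton_first + fpm->singleton_npages == first)
        {
            /* New range directly follows the singleton: extend it. */
            fpm->singleton_npages += npages;
            return npages;
        }
        if (first + npages == fpm->singleton_first)
        {
            /* New range directly precedes the singleton: extend it. */
            fpm->singleton_first = first;
            fpm->singleton_npages += npages;
            return npages;
        }

        /* Not contiguous: a btree root page is needed. */
        if (fpm->recycled_pages > 0)
            fpm->recycled_pages--;      /* reuse a spare page, no allocation */
        else if (soft)
            return 0;                   /* the guard: a soft put must not allocate */
        else
        {
            /* Hard put: carve one page off the existing singleton range. */
            fpm->singleton_first++;
            fpm->singleton_npages--;
            fpm->pages_allocated++;
        }
        fpm->btree_depth = 1;           /* toy model: btree contents not tracked */
        return npages;
    }

    /* Paths with an existing btree are outside the scope of this sketch. */
    return npages;
}
```

Without the "else if (soft)" branch, a soft call in this toy model would fall
through and consume a free page, which is the kind of surprise a caller
passing soft=true is relying on never happening.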

This brings us to a difficult choice: we're about to cut a new
release, and this could in theory be included. Even though the fix is
quite convincing, it doesn't seem wise to change such complicated code
at the last minute, and I know from an off-list chat that that is also
Robert's view. So I'll wait until after the release, and we'll have
to live with the bug for another 3 months.

Note that this patch addresses the error "dsa_allocate could not find
%zu free pages". (The error "dsa_area could not attach to segment" is
something else and apparently rarer.)

> Boy, I love FreePageManagerDump!

Yeah. And I love reproducible bugs.

--
Thomas Munro
http://www.enterprisedb.com

Attachment: 0001-Fix-freepage.c-bug-that-causes-rare-dsa_allocate-fai.patch (application/octet-stream, 2.3 KB)
