Re: dsa_allocate() faliure

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Fabio Isabettini <fisabettini(at)voipfuture(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>, Rick Otten <rottenwindfish(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: dsa_allocate() faliure
Date: 2019-02-09 10:21:12
Message-ID: CA+TgmoY=VEAMFoeRtP4j-ZOKM-B=4j671j5GKb3gYQ94-PgjhA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Sometimes FreePageManagerPutInternal() returns a
> number-of-contiguous-pages-created-by-this-insertion that is too large
> by one. If this happens to be a new max-number-of-contiguous-pages,
> it causes trouble some arbitrary time later because the max is wrong
> and this FPM cannot satisfy a request that large, and it may not be
> recomputed for some time because the incorrect value prevents
> recomputation. Not sure yet if this is due to the lazy computation
> logic or a plain old fence-post error in the btree consolidation code
> or something else.

I spent a long time thinking about this and staring at code this
afternoon, but I didn't really come up with much of anything useful.
It seems like a strange failure mode, because
FreePageManagerPutInternal() normally just returns its third argument
unmodified. The only cases where anything else happens are the ones
where we're able to consolidate the returned span with a preceding or
following span, and I'm scratching my head as to how that logic could
be wrong, especially since it also has some Assert() statements that
seem like they would detect the kinds of inconsistencies that would
lead to trouble. For example, if we somehow ended up with two spans
that (improperly) overlapped, we'd trip an Assert(). And if that
didn't happen -- because we're not in an Assert-enabled build -- the
code is written so that it only relies on the npages value of the last
of the consolidated spans, so an error in the npages value of one of
the earlier spans would just get fixed up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
