From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: _bt_split(), and the risk of OOM before its critical section |
Date: | 2019-05-06 23:11:55 |
Message-ID: | CAH2-Wz=r2ae=mLvD-o5hjzcbARBXJSXDZcL_v1J=FCzCb0Vqaw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, May 6, 2019 at 3:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, as _bt_split is currently coded, _bt_truncate has to be a "no
> errors" function, which it isn't. The pfree for its result is being
> done in an ill-chosen place, too.
I am tempted to move the call to _bt_truncate() out of _bt_split()
entirely on HEAD, possibly relocating it to
nbtsplitloc.c/_bt_findsplitloc(). That way, there is a clearer
separation between how split points are chosen, suffix truncation, and
the mechanical process of executing a legal page split.
> Another problem now that I look at it is that the _bt_getbuf for the right
> sibling is probably not too safe. And the _bt_vacuum_cycleid() call seems
> a bit dangerous from this standpoint as well.
Yeah, we can tighten those up without much difficulty.
> I'm not really concerned about that one because at that point the
> right page is still in a freshly-pageinit'd state. It's perhaps
> not quite as nice as having it be zeroes, but it won't look like
> it has any interesting data.
The important question is how VACUUM will recognize it. It's clearly
not as bad as something that causes "failed to re-find parent key"
errors, but I think that VACUUM might not be reclaiming it for the FSM
(haven't checked). Note that _bt_unlink_halfdead_page() is perfectly
happy to ignore the fact that the left sibling of a half-dead page has
a rightlink that doesn't point back to the target. Because, uh, there
might have been a concurrent page deletion, somehow.
We have heard a lot about "failed to re-find parent key" errors from
VACUUM before now because that is about the only strong cross-check
that it does. (Not that I'm arguing that we need more of that.)
> In any case, once we've started to fill the ropaque area, it would really
> be better if we don't call anything that could throw errors.
>
> Maybe we should bite the bullet and use two temp pages, so that none
> of the data ends up in the shared buffer arena until we reach the
> critical section? The extra copying is slightly annoying, but
> it certainly seems like enforcing this invariant over such a
> long stretch of code is not very maintainable.
While I think that the smarts we have around deciding a split point
will probably improve in future releases, and that we'll probably make
_bt_truncate() itself do more, the actual business of performing a
split has no reason to change that I can think of. I would like to
keep _bt_split() as simple as possible anyway -- it should only be
copying tuples using simple primitives like the bufpage.c routines.
Living with what we have now (not using a temp buffer for the right
page) seems fine.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Paul Jungwirth | 2019-05-06 23:21:45 | Re: range_agg |
Previous Message | Tom Lane | 2019-05-06 22:29:11 | Re: _bt_split(), and the risk of OOM before its critical section |