Re: BTP_DELETED leaf still in tree

From: Daniel Wood <hexexpert(at)comcast(dot)net>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BTP_DELETED leaf still in tree
Date: 2019-10-10 23:44:46
Message-ID: 1430933023.970983.1570751086396@connect.xfinity.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On October 10, 2019 at 1:18 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
>
> On Thu, Oct 10, 2019 at 12:48 PM Daniel Wood <hexexpert(at)comcast(dot)net> wrote:
> > Update query stuck in a loop. Looping in _bt_moveright().
>
> You didn't say which PostgreSQL versions were involved, and if the
> database was ever upgraded using pg_upgrade. Those details could
> matter.

PG_VERSION says 10. I suspect we are running 10.9. I have no idea if pg_upgrade was ever done.

> > ExecInsertIndexTuples->btinsert->_bt_doinsert->_bt_search->_bt_moveright
> >
> > Mid Tree Node downlink path taken by _bt_search points to a BTP_DELETED Leaf.
>
> This should hardly ever happen -- it is barely possible for an index
> scan to land on a BTP_DELETED leaf page (or a half-dead page) when
> following a downlink in its parent. Recall that nbtree uses Lehman &
> Yao's design, so _bt_search() does not "couple" buffer locks on the
> way down. It would probably be impossible to observe this happening
> without carefully setting breakpoints in multiple sessions.
>
> If this happens reliably for you, which it sounds like, then you can
> already assume that the index is corrupt.
>
> > btpo_next is also DELETED but not in the tree.
> >
> > btpo_next->btpo_next is NOT deleted but in the mid tree as a lesser key value.
> >
> > Thus creating an endless loop in moveright.
>
> Offhand, these other details sound normal. The side links are still
> needed in fully deleted (BTP_DELETED) pages. And, moving right and
> finding lesser key values (not greater key values) is normal with
> deleted pages, since page deletion makes the keyspace move right, not
> left (moving the keyspace left is how the source Lanin & Shasha paper
> does it, though).
>
> Actually, I take it back -- the looping part is not normal. The
> btpo_next->btpo_next page has no business linking back to the
> original/first deleted page you mentioned. That's just odd.

btpo_next->btpo_next does NOT link directly back to the 1st deleted page. It simply links to some in-use page which is 50 or so leaf pages back in the tree. Eventually we do reach the two deleted pages again. Only the first one is in the 'tree'.

> Can you provide me with a dump of the page images? The easiest way of
> getting a page dump is described here:

Customer data. Looks like meaningless customer data (5 digit key values). But too much paperwork. :-)

The hard part for me to understand isn't just why the DELETED leaf node is still referenced in the mid tree node.
It is that the step which sets BTP_DELETED should have also linked its leaf and right siblings together. But this hasn't been done.

Could the page have already have been dirty, but because of "target != leafblkno", we didn't stamp a new LSN on it. Could this allow us to write the DELETED dirty page without the XLOG_BTREE_MARK_PAGE_HALFDEAD and XLOG_BTREE_UNLINK_PAGE being flushed? Of course, I don't understand the "target != leafblkno".

In any case, thanks.

> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump
>
> If I had to guess, I'd guess that this was due to a generic storage problem.
>
> --
> Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-10-11 00:22:07 Re: BRIN index which is much faster never chosen by planner
Previous Message Tomas Vondra 2019-10-10 23:13:09 Re: BRIN index which is much faster never chosen by planner