Re: hung backends stuck in spinlock heavy endless loop

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hung backends stuck in spinlock heavy endless loop
Date: 2015-01-15 00:36:39
Message-ID: CAHyXU0yA4K82XGW5FZ6N5T8f-RfCixRQaUVCGMbUVKX6zf8ORg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>> (gdb) print BufferGetBlockNumber(buf)
>>> $15 = 9
>>>
>>> ...and it stays 9 across several continues after setting a breakpoint.
>>
>>
>> And the index involved? I'm pretty sure that this is an internal page, no?
>
> The index is the oid index on pg_class. Some more info:
>
> *) temp table churn is fairly high. Several dozen get spawned and
> destroyed at the start of a replication run, all at once, due to some
> dodgy coding via dblink. During the replication run, the temp table
> churn rate drops.
>
> *) running btreecheck, I see:
> cds2=# select bt_index_verify('pg_class_oid_index');
> NOTICE: page 7 of index "pg_class_oid_index" is deleted
> NOTICE: page 10 of index "pg_class_oid_index" is deleted
> NOTICE: page 12 of index "pg_class_oid_index" is deleted
> bt_index_verify
> ─────────────────
>
>
> cds2=# select bt_leftright_verify('pg_class_oid_index');
> WARNING: left link/right link pair don't comport at level 0, block 9, last: 2, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
> WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
> [repeats indefinitely until cancelled]
>
> which looks like the index is corrupted? ISTM _bt_moveright is
> hanging because it's trying to move from block 9 to block 9 and so
> loops forever.
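
To illustrate the suspected failure mode, here is a minimal Python sketch. It is not PostgreSQL's _bt_moveright code; the walk_right helper and the right_link map are hypothetical names introduced for illustration. The links 2 -> 9 and 9 -> 9 are an assumption inferred from the "last: 2" and "last: 9" warnings reported at block 9. A walk that follows right links without remembering visited blocks would never terminate on such a chain; the visited set here only serves to make the loop observable.

```python
def walk_right(right_link, start, max_steps=100):
    """Follow right links from `start`.

    Returns (path, loop_block): `loop_block` is the first block that is
    reached a second time, or None if the chain ends normally.
    """
    seen = set()
    path = []
    blk = start
    # A right link of 0 means "no right sibling" (P_NONE in nbtree).
    while blk != 0 and len(path) < max_steps:
        if blk in seen:
            return path, blk  # revisiting a block: the chain cycles
        seen.add(blk)
        path.append(blk)
        blk = right_link[blk]
    return path, None


# Assumed links, inferred from the bt_leftright_verify warnings above.
right_link = {2: 9, 9: 9}
path, loop_block = walk_right(right_link, 2)
print(path, loop_block)
```

With these assumed links the walk visits blocks 2 and 9 and then reports block 9 as the point where it starts repeating.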

Per Peter, the following might be useful:

cds2=# select * from bt_metap('pg_class_oid_index');
 magic  │ version │ root │ level │ fastroot │ fastlevel
────────┼─────────┼──────┼───────┼──────────┼───────────
 340322 │       2 │    3 │     1 │        3 │         1

cds2=# select (bt_page_stats('pg_class_oid_index', s)).* from generate_series(1,12) s;
 blkno │ type │ live_items │ dead_items │ avg_item_size │ page_size │ free_size │ btpo_prev │ btpo_next │ btpo  │ btpo_flags
───────┼──────┼────────────┼────────────┼───────────────┼───────────┼───────────┼───────────┼───────────┼───────┼────────────
     1 │ l    │        119 │          0 │            16 │      8192 │      5768 │         0 │         4 │     0 │          1
     2 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
     3 │ r    │          8 │          0 │            15 │      8192 │      7996 │         0 │         0 │     1 │          2
     4 │ l    │        178 │          0 │            16 │      8192 │      4588 │         1 │         2 │     0 │          1
     5 │ l    │          7 │          0 │            16 │      8192 │      8008 │         9 │        11 │     0 │          1
     6 │ l    │          5 │          0 │            16 │      8192 │      8048 │        11 │         8 │     0 │          1
     7 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
     8 │ l    │        187 │          0 │            16 │      8192 │      4408 │         6 │         0 │     0 │          1
     9 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
    10 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
    11 │ l    │          6 │          0 │            16 │      8192 │      8028 │         5 │         6 │     0 │          1
    12 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 10731 │          0
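
For anyone who wants to cross-check the sibling links in the bt_page_stats output by hand, a rough Python sketch follows. It is not part of btreecheck or pageinspect; the check_sibling_links function and the PAGES list are hypothetical, and the tuples are copied from the blkno, type, btpo_prev and btpo_next columns above. The sketch assumes that a link of 0 means "no sibling" and checks that each leaf's right sibling points back to it and that each leaf's left sibling points forward to it.

```python
# (blkno, type, btpo_prev, btpo_next), copied from the output above.
PAGES = [
    (1, "l", 0, 4), (2, "l", 4, 9), (3, "r", 0, 0), (4, "l", 1, 2),
    (5, "l", 9, 11), (6, "l", 11, 8), (7, "d", -1, -1), (8, "l", 6, 0),
    (9, "l", 4, 9), (10, "d", -1, -1), (11, "l", 5, 6), (12, "d", -1, -1),
]


def check_sibling_links(pages):
    """Return a list of (blkno, kind, other_blkno) link inconsistencies."""
    # Only live leaf pages take part in the level-0 sibling chain.
    leaves = {blk: (prev, nxt) for blk, typ, prev, nxt in pages if typ == "l"}
    problems = []
    for blk, (prev, nxt) in sorted(leaves.items()):
        if nxt == blk:
            problems.append((blk, "self-right", nxt))
        elif nxt != 0 and leaves.get(nxt, (None, None))[0] != blk:
            # The right sibling's left link does not point back here.
            problems.append((blk, "right", nxt))
        if prev == blk:
            problems.append((blk, "self-left", prev))
        elif prev != 0 and leaves.get(prev, (None, None))[1] != blk:
            # The left sibling's right link does not point here.
            problems.append((blk, "left", prev))
    return problems


problems = check_sibling_links(PAGES)
for blk, kind, other in problems:
    print(f"block {blk}: {kind} link to block {other} is inconsistent")
```

On the data above this flags blocks 2, 5 and 9, all through links that involve block 9, which agrees with the bt_leftright_verify warnings.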

merlin

Attachment Content-Type Size
page_items.log text/x-log 50.3 KB
