Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-08 11:03:29
Message-ID: 20210608110329.GB16435@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > I'll leave the instance running for a little bit before restarting (or kill-9)
> > in case someone requests more info.
>
> How about dumping the page image out, and sharing it with the list?
> This procedure should work fine from gdb:
>
> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB

> I suggest that you dump the "page" pointer inside lazy_scan_prune(). I
> imagine that you have the instance already stuck in an infinite loop,
> so what we'll probably see from the page image is the page after the
> first prune and another no-progress prune.

The cluster was again rejecting with "too many clients already".

I was able to open a shell this time, but it immediately froze when I tried to
tab complete "pg_stat_acti"...

I was able to dump the page image, though - attached. I can send you its
"data" privately, if desirable. I'll also try to step through this.

postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(pg_read_binary_file('/tmp/dump_block.page'));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+-----------+-----------+----------+--------+-------------+------------+--------+----------------------------------+-------
1 | 1320 | 1 | 259 | 926025112 | 0 | 0 | (1,1) | 32799 | 10499 | 32 | 11111111111111111111111000100000 |
2 | 8088 | 1 | 104 | 926018702 | 0 | 0 | (1,2) | 32799 | 10497 | 32 | 11111111111111111111100000000000 |
3 | 0 | 0 | 0 | | | | | | | | |
4 | 7904 | 1 | 179 | 926018702 | 0 | 0 | (1,4) | 32799 | 10499 | 32 | 11111111111111111111111000100000 |
5 | 7728 | 1 | 176 | 926018702 | 0 | 0 | (1,5) | 32799 | 10499 | 32 | 11111111111111111111111000100000 |
6 | 7464 | 1 | 259 | 926014884 | 926025112 | 0 | (1,1) | 49183 | 9475 | 32 | 11111111111111111111111000100000 |
7 | 2 | 2 | 0 | | | | | | | | |
8 | 4 | 2 | 0 | | | | | | | | |
9 | 19 | 2 | 0 | | | | | | | | |
10 | 0 | 0 | 0 | | | | | | | | |
11 | 20 | 2 | 0 | | | | | | | | |
12 | 5792 | 1 | 1666 | 926014887 | 0 | 0 | (1,12) | 31 | 10499 | 32 | 11111111111111111111101000100000 |
13 | 5 | 2 | 0 | | | | | | | | |
14 | 3912 | 1 | 1880 | 925994211 | 0 | 0 | (1,14) | 31 | 10499 | 32 | 11111111111111111111110100110000 |
15 | 0 | 3 | 0 | | | | | | | | |
16 | 18 | 2 | 0 | | | | | | | | |
17 | 0 | 3 | 0 | | | | | | | | |
18 | 1936 | 1 | 1976 | 926013259 | 0 | 0 | (1,18) | 32799 | 10499 | 32 | 11111111111111111111110100110000 |
19 | 1760 | 1 | 176 | 926014887 | 0 | 0 | (1,19) | 32799 | 10499 | 32 | 11111111111111111111111000100000 |
20 | 1584 | 1 | 176 | 926014889 | 0 | 0 | (1,20) | 32799 | 10499 | 32 | 11111111111111111111111000100000 |
21 | 6 | 2 | 0 | | | | | | | | |
22 | 0 | 3 | 0 | | | | | | | | |
23 | 0 | 3 | 0 | | | | | | | | |
24 | 0 | 3 | 0 | | | | | | | | |
25 | 0 | 3 | 0 | | | | | | | | |
26 | 0 | 3 | 0 | | | | | | | | |
27 | 0 | 3 | 0 | | | | | | | | |
28 | 0 | 3 | 0 | | | | | | | | |
(28 rows)

No great surprise that it's again in pg_statistic.

#0 GetPrivateRefCountEntry (buffer=buffer(at)entry=411, do_move=do_move(at)entry=false) at bufmgr.c:313
#1 0x00000000007ecb4f in GetPrivateRefCount (buffer=411) at bufmgr.c:398
#2 BufferGetBlockNumber (buffer=buffer(at)entry=411) at bufmgr.c:2762
#3 0x00000000004fa0f3 in heap_prune_chain (prstate=0x7fff7e4a9180, rootoffnum=7, buffer=411) at pruneheap.c:625
#4 heap_page_prune (relation=relation(at)entry=0x7fe636faed28, buffer=buffer(at)entry=411, vistest=vistest(at)entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin(at)entry=0, old_snap_ts=old_snap_ts(at)entry=0,
report_stats=report_stats(at)entry=false, off_loc=<optimized out>, off_loc(at)entry=0x12b433c) at pruneheap.c:278
#5 0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel(at)entry=0x12b42d0, buf=buf(at)entry=411, blkno=blkno(at)entry=1, page=page(at)entry=0x2aaaab54be00 "J\f", vistest=vistest(at)entry=0xe7bcc0 <GlobalVisCatalogRels>,
prunestate=prunestate(at)entry=0x7fff7e4aae40) at vacuumlazy.c:1712
#6 0x0000000000500263 in lazy_scan_heap (aggressive=<optimized out>, params=0x12ce89c, vacrel=<optimized out>) at vacuumlazy.c:1347
#7 heap_vacuum_rel (rel=0x7fe636faed28, params=0x12ce89c, bstrategy=<optimized out>) at vacuumlazy.c:612
#8 0x000000000067418a in table_relation_vacuum (bstrategy=<optimized out>, params=0x12ce89c, rel=0x7fe636faed28) at ../../../src/include/access/tableam.h:1678
#9 vacuum_rel (relid=2619, relation=<optimized out>, params=params(at)entry=0x12ce89c) at vacuum.c:2001

--
Justin

Attachment Content-Type Size
dump_block.page.gz application/gzip 4.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2021-06-08 11:07:32 Re: GiST operator class for bool
Previous Message Emre Hasegeli 2021-06-08 10:48:10 GiST operator class for bool