Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-08 17:34:04
Message-ID: 20210608173404.GF16435@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> On 2021-Jun-06, Justin Pryzby wrote:
>
> > However, I also found an autovacuum chewing 100% CPU, and it appears the
> > problem is actually because autovacuum has locked a page of pg-statistic, and
> > every other process then gets stuck waiting in the planner. I checked a few
> > and found these:
>
> Hmm ... I wonder if this could be related to commits d9d076222f5b,
> c98763bf51bf, etc. I don't have any connecting thoughts other than the
> tuple visibility code being involved. Do you see any procs with the
> PROC_IN_SAFE_IC flag set?

Can you give me a hint how to do that from a corefile ?

I got this far:
(gdb) p MyProc->links
$13 = {prev = 0x0, next = 0x0}

I do have a reindex script which runs CIC, and it does looks suspicious.

<<Mon Jun 7 22:00:54 CDT 2021: starting db=ts
...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription_rel: pg_subscription_rel_srrelid_srsubid_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_subscription_oid_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_subscription_subname_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_toast.pg_toast_6100_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext_data: pg_statistic_ext_data_stxoid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext_data: pg_toast.pg_toast_3429_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_name_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_oid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_relid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_toast.pg_toast_3381_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic: pg_statistic_relid_att_inh_index(reindex system)...
Mon Jun 7 22:02:56 CDT 2021: ts: pg_statistic: pg_toast.pg_toast_2619_index(reindex system)...
Mon Jun 7 22:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_statio_all_tables_snap_t_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: pg_statio_all_tables_snap_t_idx_ccnew
Mon Jun 7 23:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_toast.pg_toast_33011_index(reindex system)...
Mon Jun 7 23:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_statio_all_indexes_snap_t_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: pg_statio_all_indexes_snap_t_idx_ccnew
Tue Jun 8 00:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_toast.pg_toast_33020_index(reindex system)...
Tue Jun 8 00:02:57 CDT 2021: ts: pg_shseclabel: pg_shseclabel_object_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shseclabel: pg_toast.pg_toast_3592_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shdescription: pg_shdescription_o_c_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shdescription: pg_toast.pg_toast_2396_index(reindex system)...
...
Tue Jun 8 00:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_toast.pg_toast_33020_index(reindex system)...
Tue Jun 8 00:02:57 CDT 2021: ts: pg_shseclabel: pg_shseclabel_object_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shseclabel: pg_toast.pg_toast_3592_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shdescription: pg_shdescription_o_c_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shdescription: pg_toast.pg_toast_2396_index(reindex system)...
...
Tue Jun 8 01:21:20 CDT 2021: ts: pg_aggregate: pg_aggregate_fnoid_index(reindex system)...
Tue Jun 8 01:21:20 CDT 2021: ts: pg_aggregate: pg_toast.pg_toast_2600_index(reindex system)...
Tue Jun 8 01:21:20 CDT 2021: ts: permissions: perm_group_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: perm_group_idx_ccnew
Tue Jun 8 02:21:20 CDT 2021: ts: permissions: perm_user_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: perm_user_idx_ccnew
Tue Jun 8 03:21:20 CDT 2021: ts: permissions: pg_toast.pg_toast_33577_index(reindex system)...
Tue Jun 8 03:21:20 CDT 2021: ts: patchfiles: patchfiles_filename_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: patchfiles_filename_idx_ccnew
Tue Jun 8 04:21:21 CDT 2021: ts: patchfiles: patchfiles_pkey(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: patchfiles_pkey_ccnew

=> It's strange that these timed out, since the statio tables are less than
15MB, and permissions and patchfiles are both under 100kB.

And it seems to say that it time out after less than 1sec.

They're running this:
| PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"

I've killed it a little bit ago, but I should've saved the start time of the
autovacuum. I found this:

#5 heap_vacuum_rel (rel=0x7fe636faed28, params=0x12ce89c, bstrategy=<optimized out>) at vacuumlazy.c:612
starttime = 676436464463888
ru0 = {tv = {tv_sec = 1623121264, tv_usec = 463887}, ru = {ru_utime = {tv_sec = 0, tv_usec = 77418}, ru_stime = {tv_sec = 0, tv_usec = 13866}, {ru_maxrss = 7440, __ru_maxrss_word = 7440}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {
ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 1826, __ru_minflt_word = 1826}, {ru_majflt = 1, __ru_majflt_word = 1}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 2008,
__ru_inblock_word = 2008}, {ru_oublock = 192, __ru_oublock_word = 192}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 29,
__ru_nvcsw_word = 29}, {ru_nivcsw = 9, __ru_nivcsw_word = 9}}}

$ date -d @1623121264
Mon Jun 7 22:01:04 CDT 2021

$ date -d '2000-01-01 UTC + 676436464seconds'
Mon Jun 7 22:01:04 CDT 2021

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-06-08 17:45:58 Re: automatically generating node support functions
Previous Message Andres Freund 2021-06-08 17:17:11 Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic