Re: Why does [auto-]vacuum delay not report a wait event?

From: Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why does [auto-]vacuum delay not report a wait event?
Date: 2020-03-21 23:54:29
Message-ID: CAKYtNAqwZSaJGdXCL+1x_CFWx7-f=qJ5CMQiaVhRbpc_O00oLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 21 Mar 2020 at 09:38, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote:
> > But uh, unfortunately the vacuum delay code just sleeps without setting
> > a wait event:
> ...
> > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> > class?
> >
> > Given how frequently we run into trouble with [auto]vacuum throttling
> > being a problem, and there not being any way to monitor that currently,
> > that seems like it'd be a significant improvement, given the effort?
>
> I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't, I
guess
> because the overhead is significant for a small number of usecs, and
because it
> doesn't work for pg_usleep to set a generic event if callers want to be
able to
> set a more specific wait event.
>
> Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use
pg_usleep
> "which seems too short.". Surely it should use pg_sleep, added at
782eefc58 a
> few years later ?
>
> Also, there was a suggestion recently that this should have a separate
> vacuum_progress phase:
> |vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL 50 /* ms
*/
> |vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
>
> I was planning to look at that eventually ; should it have a wait event
instead
> or in addition ?
>
> > It'd probably also be helpful to report the total time [auto]vacuum
> > spent being delayed for vacuum verbose/autovacuum logging, but imo
> > that'd be a parallel feature to a wait event, not a replacement.
>
> This requires wider changes than I anticipated.
>
> 2020-03-20 22:35:51.308 CDT [16534] LOG: automatic aggressive vacuum of
table "template1.pg_catalog.pg_class": index scans: 1
> pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped
frozen
> tuples: 6 removed, 405 remain, 0 are dead but not yet removable,
oldest xmin: 1574
> buffer usage: 76 hits, 7 misses, 8 dirtied
> avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s
> Cost-based delay: 2 msec
> system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>
> VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that
field will
> typically be zero, so I made it conditional, which is supposedly why it's
> written like that, even though that's not otherwise being used since
17eaae98.
>
> Added at https://commitfest.postgresql.org/28/2515/
>
> --
> Justin

Thanks Justin for quick patch.

I haven't reviewed your full patch but I can see that "make installcheck"
is failing with segment fault.

*Stack trace;*
Core was generated by `postgres: autovacuum worker regression
'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980,
full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at
ginfast.c:895
895 stats->delay_msec += vacuum_delay_point();
(gdb) bt
#0 0x0000560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980,
full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at
ginfast.c:895
#1 0x0000560080cdd0c3 in ginvacuumcleanup (info=0x7ffe0b64b0c0, stats=0x0)
at ginvacuum.c:706
#2 0x0000560080d791d4 in index_vacuum_cleanup (info=0x7ffe0b64b0c0,
stats=0x0) at indexam.c:711
#3 0x0000560080fa790e in do_analyze_rel (onerel=0x56008259e6e0,
params=0x560082206de4, va_cols=0x0, acquirefunc=0x560080fa8a75
<acquire_sample_rows>, relpages=25, inh=false,
in_outer_xact=false, elevel=13) at analyze.c:683
#4 0x0000560080fa5f3e in analyze_rel (relid=37789,
relation=0x5600822ba1a0, params=0x560082206de4, va_cols=0x0,
in_outer_xact=false, bstrategy=0x5600822064e0) at analyze.c:263
#5 0x00005600810d9eb7 in vacuum (relations=0x56008227e5b8,
params=0x560082206de4, bstrategy=0x5600822064e0, isTopLevel=true) at
vacuum.c:468
#6 0x0000560081357608 in autovacuum_do_vac_analyze (tab=0x560082206de0,
bstrategy=0x5600822064e0) at autovacuum.c:3115
#7 0x00005600813557dd in do_autovacuum () at autovacuum.c:2466
#8 0x000056008135373d in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1693
#9 0x0000560081352f75 in StartAutoVacWorker () at autovacuum.c:1487
#10 0x000056008137ed5f in StartAutovacuumWorker () at postmaster.c:5580
#11 0x000056008137e199 in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:5297
#12 <signal handler called>
#13 0x00007f18b778bff7 in __GI___select (nfds=9, readfds=0x7ffe0b64c050,
writefds=0x0, exceptfds=0x0, timeout=0x7ffe0b64bfc0) at
../sysdeps/unix/sysv/linux/select.c:41
#14 0x000056008137499a in ServerLoop () at postmaster.c:1691
#15 0x0000560081373e63 in PostmasterMain (argc=3, argv=0x560082189020) at
postmaster.c:1400
#16 0x00005600811d37ea in main (argc=3, argv=0x560082189020) at main.c:210

Here, stats is null so it is crashing.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2020-03-22 00:08:19 Re: ssl passphrase callback
Previous Message Phillip Black 2020-03-21 23:48:03 Database recovery from tablespace only