Re: error context for vacuum to include block number

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: error context for vacuum to include block number
Date: 2020-02-17 03:57:31
Message-ID: 20200217035731.GL31889@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 17, 2020 at 10:47:47AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch!
>
> 1.
> The above lines need a new line.

Done, thanks.

> 2.
> In lazy_vacuum_heap, we set the error context and then call
> pg_rusage_init whereas lazy_vacuum_index and lazy_cleanup_index does
> the opposite. And lazy_scan_heap also call pg_rusage first. I think
> lazy_vacuum_heap should follow them for consistency. That is, we can
> set error context after pages = 0.

Right. Maybe I did it the other way because the two uses of
PROGRESS_VACUUM_PHASE_VACUUM_HEAP were right next to each other.

> 3.
> We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
> PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
> error context in lazy_truncate_heap as well. What do you think?
>
> I'm not sure it's worth to set the error context for FINAL_CLENAUP but
> we should add the case of FINAL_CLENAUP to vacuum_error_callback as
> no-op or explain it as a comment even if we don't.

I don't have strong feelings either way.

I looked a bit at the truncation phase. It also truncates the FSM and VM
forks, which could be misleading if the error was in one of those files and not
the main filenode.

I'd have to find a way to test it...
...and was pleasantly surprised to see that earlier phases don't choke if I
(re)create a fake 4GB table like:

postgres=# CREATE TABLE trunc(i int);
CREATE TABLE
postgres=# \x\t
Expanded display is on.
Tuples only is on.
postgres=# SELECT relfilenode FROM pg_class WHERE oid='trunc'::regclass;
relfilenode | 59068

postgres=# \! bash -xc 'truncate -s 1G ./pgsql13.dat111/base/12689/59068{,.{1..3}}'
+ truncate -s 1G ./pgsql13.dat111/base/12689/59074 ./pgsql13.dat111/base/12689/59074.1 ./pgsql13.dat111/base/12689/59074.2 ./pgsql13.dat111/base/12689/59074.3

postgres=# \timing
Timing is on.
postgres=# SET client_min_messages=debug; SET statement_timeout='13s'; VACUUM VERBOSE trunc;
INFO: vacuuming "public.trunc"
INFO: "trunc": found 0 removable, 0 nonremovable row versions in 524288 out of 524288 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2098
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
524288 pages are entirely empty.
CPU: user: 5.00 s, system: 1.50 s, elapsed: 6.52 s.
ERROR: canceling statement due to statement timeout
CONTEXT: while truncating relation "public.trunc" to 0 blocks

The callback surrounding RelationTruncate() seems hard to hit unless you add
CHECK_FOR_INTERRUPTS(); the same was true for index cleanup.

The truncation uses a prefetch, which is more likely to hit any lowlevel error,
so I added callback there, too.

BTW, for the index cases, I didn't like repeating the namespace here, but WDYT ?
|CONTEXT: while vacuuming index "public.t_i_idx3" of relation "t"

Thanks for rerere-reviewing.

--
Justin

Attachment Content-Type Size
v19-0001-vacuum-errcontext-to-show-block-being-processed.patch text/x-diff 8.8 KB
v19-0002-add-callback-for-truncation.patch text/x-diff 3.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-02-17 04:07:00 Re: small improvement of the elapsed time for truncating heap in vacuum
Previous Message Kasahara Tatsuhito 2020-02-17 03:43:41 Re: small improvement of the elapsed time for truncating heap in vacuum