Re: error context for vacuum to include block number

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: error context for vacuum to include block number
Date: 2020-02-17 05:18:11
Message-ID: CA+fd4k7mabf_yD=YqCNPc5RKJPsgVP6e2zVx7gBp62Be7UGFQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 17 Feb 2020 at 12:57, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Mon, Feb 17, 2020 at 10:47:47AM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch!
> >

Thank you for updating the patch.

> > 1.
> > The above lines need a new line.
>
> Done, thanks.
>
> > 2.
> > In lazy_vacuum_heap, we set the error context and then call
> > pg_rusage_init whereas lazy_vacuum_index and lazy_cleanup_index does
> > the opposite. And lazy_scan_heap also call pg_rusage first. I think
> > lazy_vacuum_heap should follow them for consistency. That is, we can
> > set error context after pages = 0.
>
> Right. Maybe I did it the other way because the two uses of
> PROGRESS_VACUUM_PHASE_VACUUM_HEAP were right next to each other.
>
> > 3.
> > We have 2 other phases: PROGRESS_VACUUM_PHASE_TRUNCATE and
> > PROGRESS_VACUUM_PHASE_FINAL_CLEANUP. I think it's better to set the
> > error context in lazy_truncate_heap as well. What do you think?
> >
> > I'm not sure it's worth to set the error context for FINAL_CLENAUP but
> > we should add the case of FINAL_CLENAUP to vacuum_error_callback as
> > no-op or explain it as a comment even if we don't.
>
> I don't have strong feelings either way.
>
> I looked a bit at the truncation phase. It also truncates the FSM and VM
> forks, which could be misleading if the error was in one of those files and not
> the main filenode.
>
> I'd have to find a way to test it...
> ...and was pleasantly surprised to see that earlier phases don't choke if I
> (re)create a fake 4GB table like:
>
> postgres=# CREATE TABLE trunc(i int);
> CREATE TABLE
> postgres=# \x\t
> Expanded display is on.
> Tuples only is on.
> postgres=# SELECT relfilenode FROM pg_class WHERE oid='trunc'::regclass;
> relfilenode | 59068
>
> postgres=# \! bash -xc 'truncate -s 1G ./pgsql13.dat111/base/12689/59068{,.{1..3}}'
> + truncate -s 1G ./pgsql13.dat111/base/12689/59074 ./pgsql13.dat111/base/12689/59074.1 ./pgsql13.dat111/base/12689/59074.2 ./pgsql13.dat111/base/12689/59074.3
>
> postgres=# \timing
> Timing is on.
> postgres=# SET client_min_messages=debug; SET statement_timeout='13s'; VACUUM VERBOSE trunc;
> INFO: vacuuming "public.trunc"
> INFO: "trunc": found 0 removable, 0 nonremovable row versions in 524288 out of 524288 pages
> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2098
> There were 0 unused item identifiers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 524288 pages are entirely empty.
> CPU: user: 5.00 s, system: 1.50 s, elapsed: 6.52 s.
> ERROR: canceling statement due to statement timeout
> CONTEXT: while truncating relation "public.trunc" to 0 blocks
>

Yeah lazy_scan_heap deals with all dummy files as new empty pages.

> The callback surrounding RelationTruncate() seems hard to hit unless you add
> CHECK_FOR_INTERRUPTS(); the same was true for index cleanup.
>
> The truncation uses a prefetch, which is more likely to hit any lowlevel error,
> so I added callback there, too.
>
> BTW, for the index cases, I didn't like repeating the namespace here, but WDYT ?
> |CONTEXT: while vacuuming index "public.t_i_idx3" of relation "t"

The current message looks good to me because we cannot have a table
and its index in the different schema.

1.
pg_rusage_init(&ru0);
npages = 0;

/*
* Setup error traceback support for ereport()
*/
init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
error_context_stack = &errcallback;

tupindex = 0;

Oops it seems to me that it's better to set error context after
tupindex = 0. Sorry for my bad.

And the above comment can be written in a single line as other same comments.

2.
@@ -2568,6 +2643,12 @@ count_nondeletable_pages(Relation onerel,
LVRelStats *vacrelstats)
BlockNumber blkno;
BlockNumber prefetchedUntil;
instr_time starttime;
+ ErrorContextCallback errcallback;
+ vacuum_error_callback_arg errcbarg;
+
+ /* Setup error traceback support for ereport() */
+ init_vacuum_error_callback(&errcallback, &errcbarg, onerel,
+ PROGRESS_VACUUM_PHASE_TRUNCATE);

Hmm I don't think it's a good idea to have count_nondeletable_pages
set the error context of PHASE_TRUNCATE. Because the patch sets the
error context during RelationTruncate that actually truncates the heap
but count_nondeletable_pages doesn't truncate it. I think setting the
error context only during RelationTruncate would be a good start. We
can hear other opinions from other hackers. Some hackers may want to
set the error context for whole lazy_truncate_heap.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kasahara Tatsuhito 2020-02-17 05:28:34 Re: small improvement of the elapsed time for truncating heap in vacuum
Previous Message Thomas Munro 2020-02-17 04:57:43 Re: Collation versioning