Re: error context for vacuum to include block number

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: error context for vacuum to include block number
Date: 2019-12-11 16:54:25
Message-ID: 20191211165425.4ewww2s5k5cafi4l@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for working on this!

On 2019-12-11 08:36:48 -0600, Justin Pryzby wrote:
> On Wed, Dec 11, 2019 at 09:15:07PM +0900, Michael Paquier wrote:
> > On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:
> > > Find attached updated patch:
> > > . Use structure to include relation name.
> > > . Split into a separate patch rename of "StringInfoData buf".
> > >
> > > 2019-11-27 20:04:53.640 CST [14244] ERROR: canceling statement due to statement timeout
> > > 2019-11-27 20:04:53.640 CST [14244] CONTEXT: block 2314 of relation t
> > > 2019-11-27 20:04:53.640 CST [14244] STATEMENT: vacuum t;
> > >
> > > I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
> > > the buffer is not pinned.

The tag will not add all that informative details, because the
relfilenode isn't easily mappable to the table name or such.

> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 043ebb4..9376989 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -138,6 +138,12 @@ typedef struct LVRelStats
> bool lock_waiter_detected;
> } LVRelStats;
>
> +typedef struct
> +{
> + char *relname;
> + char *relnamespace;
> + BlockNumber blkno;
> +} vacuum_error_callback_arg;

Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.

> @@ -524,6 +531,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
> PROGRESS_VACUUM_MAX_DEAD_TUPLES
> };
> int64 initprog_val[3];
> + ErrorContextCallback errcallback;
> + vacuum_error_callback_arg cbarg;

Not a fan of "cbarg", too generic.

> pg_rusage_init(&ru0);
>
> @@ -635,6 +644,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
> else
> skipping_blocks = false;
>
> + /* Setup error traceback support for ereport() */
> + errcallback.callback = vacuum_error_callback;
> + cbarg.relname = relname;
> + cbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
> + cbarg.blkno = 0; /* Not known yet */
> + errcallback.arg = (void *) &cbarg;
> + errcallback.previous = error_context_stack;
> + error_context_stack = &errcallback;
> +
> for (blkno = 0; blkno < nblocks; blkno++)
> {
> Buffer buf;
> @@ -658,6 +676,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>
> pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
>
> + cbarg.blkno = blkno;
> +
> if (blkno == next_unskippable_block)
> {
> /* Time to advance next_unskippable_block */
> @@ -817,7 +837,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>
> buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
> RBM_NORMAL, vac_strategy);
> -
> /* We need buffer cleanup lock so that we can prune HOT chains. */
> if (!ConditionalLockBufferForCleanup(buf))
> {
> @@ -1388,6 +1407,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
> RecordPageWithFreeSpace(onerel, blkno, freespace);
> }
>
> + /* Pop the error context stack */
> + error_context_stack = errcallback.previous;
> +
> /* report that everything is scanned and vacuumed */
> pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
>
> @@ -2354,3 +2376,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
>
> return all_visible;
> }

I think this will misattribute errors that happen when in the:
/*
* If we are close to overrunning the available space for dead-tuple
* TIDs, pause and do a cycle of vacuuming before we tackle this page.
*/
section of lazy_scan_heap(). That will

a) scan the index, during which we presumably don't want the same error
context, as it'd be quite misleading: The block that was just scanned
in the loop isn't actually likely to be the culprit for an index
problem. And we'd not mention the fact that the problem is occurring
in the index.
b) will report the wrong block, when in lazy_vacuum_heap().

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-12-11 17:11:03 Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Previous Message Tom Lane 2019-12-11 16:43:23 Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes