Re: display offset along with block number in vacuum errors

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: display offset along with block number in vacuum errors
Date: 2020-08-20 05:01:22
Message-ID: CAA4eK1Kq-VMB6u5RWtK95EGV3uXgTrAx7O8j4vsK-VbKinZ6Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > I don't think we need to expose LVRelStats. We can just pass the
> > address of vacrelstats->offset_num to achieve what we want. I have
> > tried that and it works, see the
> > v6-0002-additinal-error-context-information-in-heap_page_ patch
> > attached. Do you see any problem with it?
>
> Yes, you're right. I'm concerned a bit the number of arguments passing
> to heap_page_prune() might get higher when we need other values to
> update for errcontext, but I'm okay with the current patch.
>

Yeah, we might need to think if we want to increase the number of
parameters but not sure we need to worry at this stage. If required, I
think we can either expose LVRelStats or extract a few parameters from
it and form a separate structure that could be passed to
heap_page_prune.

> Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it
> be VACUUM_HEAP instead?
>

I think it is currently similar to what we do in progress reporting.
We set to VACUUM_HEAP phase where the progress reporting is also set
to *HEAP_BLKS_VACUUMED. Currently, heap_page_prune() is covered under
*HEAP_BLKS_SCANNED, so I don't see a pressing need to change the error
context phase for heap_page_prune(). And also, we need to add some
more smarts in heap_page_prune() for this which I want to avoid.

> Also, I've tested the patch with log_min_messages = 'info' and get the
> following sever logs:
>
..
>
> This is not directly related to the patch but it looks like we can
> improve the current errcontext settings. For instance, the message
> from lazy_vacuum_index(): there are two messages reporting the phases.
> I've attached the patch that improves the current errcontext setting,
> which can be applied before the patch adding offset number.
>

After your patch, I see output like below with log_min_messages=info,

2020-08-20 10:11:46.769 IST [2640] INFO: scanned index "idx_test_c1"
to remove 10000 row versions
2020-08-20 10:11:46.769 IST [2640] DETAIL: CPU: user: 0.06 s, system:
0.01 s, elapsed: 0.06 s
2020-08-20 10:11:46.769 IST [2640] CONTEXT: while vacuuming index
"idx_test_c1" of relation "public.test_vac"

2020-08-20 10:11:46.901 IST [2640] INFO: scanned index "idx_test_c2"
to remove 10000 row versions
2020-08-20 10:11:46.901 IST [2640] DETAIL: CPU: user: 0.10 s, system:
0.01 s, elapsed: 0.13 s
2020-08-20 10:11:46.901 IST [2640] CONTEXT: while vacuuming index
"idx_test_c2" of relation "public.test_vac"

2020-08-20 10:11:46.917 IST [2640] INFO: "test_vac": removed 10000
row versions in 667 pages
2020-08-20 10:11:46.917 IST [2640] DETAIL: CPU: user: 0.01 s, system:
0.00 s, elapsed: 0.01 s

2020-08-20 10:11:46.928 IST [2640] INFO: index "idx_test_c1" now
contains 50000 row versions in 276 pages
2020-08-20 10:11:46.928 IST [2640] DETAIL: 10000 index row versions
were removed.
136 index pages have been deleted, 109 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
2020-08-20 10:11:46.928 IST [2640] CONTEXT: while cleaning up index
"idx_test_c1" of relation "public.test_vac"

Here, we can notice that for the index, we are getting context
information but not for the heap. The reason is that in
vacuum_error_callback, we are not printing additional information for
phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
when block number is invalid. If we want to cover the 'info' messages
then won't it be better if we print a message in those phases even
block number is invalid (something like 'while scanning relation
\"%s.%s\"")

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-08-20 05:17:13 Re: "ccold" left by reindex concurrently are droppable?
Previous Message Justin Pryzby 2020-08-20 04:31:47 Re: Include access method in listTables output