Re: display offset along with block number in vacuum errors

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: display offset along with block number in vacuum errors
Date: 2020-08-19 07:23:51
Message-ID: CA+fd4k4HcbhPnCs7paRTw1K-AHin8y4xKomB9Ru0ATw0UeTy2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Aug 17, 2020 at 11:38 AM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Sat, 15 Aug 2020 at 12:19, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > The reason why I have not included heap_page_prune related change in
> > > the patch is that I don't want to sprinkle this in every possible
> > > function (code path) called via vacuum especially if the probability
> > > of an error in that code path is low. But, I am fine if you and or
> > > others think that it is a good idea to update offset in
> > > heap_page_prune as well.
> >
> > I agree to not try sprinkling it many places than necessity.
> >
> > Regarding heap_page_prune(), I'm concerned a bit that
> > heap_page_prune() is typically the first function to check the tuple
> > visibility within the vacuum code. I've sometimes observed an error
> > with the message like "DETAIL: could not open file “pg_xact/00AB”: No
> > such file or directory" due to a tuple header corruption. I suspect
> > this message was emitted while checking tuple visibility in
> > heap_page_prune(). So I guess the likelihood of an error in that code
> > is not so low.
> >
>
> Fair point.
>
> > On the other hand, if we want to update the offset number in
> > heap_page_prune() we will need to expose some vacuum structs defined
> > as a static including LVRelStats.
> >
>
> I don't think we need to expose LVRelStats. We can just pass the
> address of vacrelstats->offset_num to achieve what we want. I have
> tried that and it works, see the
> v6-0002-additinal-error-context-information-in-heap_page_ patch
> attached. Do you see any problem with it?

Yes, you're right. I'm concerned a bit the number of arguments passing
to heap_page_prune() might get higher when we need other values to
update for errcontext, but I'm okay with the current patch.

Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it
be VACUUM_HEAP instead?

Also, I've tested the patch with log_min_messages = 'info' and get the
following sever logs:

2020-08-19 14:28:09.917 JST [72912] INFO: launched 1 parallel vacuum
worker for index vacuuming (planned: 1)
2020-08-19 14:28:09.917 JST [72912] CONTEXT: while scanning block 973
of relation "public.tbl"
2020-08-19 14:28:09.959 JST [72912] INFO: scanned index "i1" to
remove 109872 row versions
2020-08-19 14:28:09.959 JST [72912] DETAIL: CPU: user: 0.04 s,
system: 0.00 s, elapsed: 0.04 s
2020-08-19 14:28:09.959 JST [72912] CONTEXT: while vacuuming index
"i1" of relation "public.tbl"
2020-08-19 14:28:09.967 JST [72936] INFO: scanned index "i2" to
remove 109872 row versions by parallel vacuum worker
2020-08-19 14:28:09.967 JST [72936] DETAIL: CPU: user: 0.03 s,
system: 0.00 s, elapsed: 0.04 s
2020-08-19 14:28:09.967 JST [72936] CONTEXT: while vacuuming index
"i2" of relation "public.tbl"
2020-08-19 14:28:09.967 JST [72912] INFO: scanned index "i2" to
remove 109872 row versions by parallel vacuum worker
2020-08-19 14:28:09.967 JST [72912] DETAIL: CPU: user: 0.03 s,
system: 0.00 s, elapsed: 0.04 s
2020-08-19 14:28:09.967 JST [72912] CONTEXT: while vacuuming index
"i2" of relation "public.tbl"
parallel worker
while scanning block 973 of relation "public.tbl"
2020-08-19 14:28:09.968 JST [72912] INFO: "tbl": removed 109872 row
versions in 487 pages
2020-08-19 14:28:09.968 JST [72912] DETAIL: CPU: user: 0.00 s,
system: 0.00 s, elapsed: 0.00 s
2020-08-19 14:28:09.968 JST [72912] CONTEXT: while vacuuming block
973 of relation "public.tbl"
2020-08-19 14:28:09.968 JST [72912] INFO: index "i1" now contains
110000 row versions in 578 pages
2020-08-19 14:28:09.968 JST [72912] DETAIL: 109872 index row versions
were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
2020-08-19 14:28:09.968 JST [72912] CONTEXT: while scanning block 973
of relation "public.tbl"
2020-08-19 14:28:09.968 JST [72912] INFO: index "i2" now contains
110000 row versions in 578 pages
2020-08-19 14:28:09.968 JST [72912] DETAIL: 109872 index row versions
were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
2020-08-19 14:28:09.968 JST [72912] CONTEXT: while scanning block 973
of relation "public.tbl"
2020-08-19 14:28:09.969 JST [72912] INFO: "tbl": found 110000
removable, 110000 nonremovable row versions in 974 out of 974 pages
2020-08-19 14:28:09.969 JST [72912] DETAIL: 0 dead row versions
cannot be removed yet, oldest xmin: 519
There were 372 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.06 s.
2020-08-19 14:28:09.969 JST [72912] CONTEXT: while scanning block 973
of relation "public.tbl"

This is not directly related to the patch but it looks like we can
improve the current errcontext settings. For instance, the message
from lazy_vacuum_index(): there are two messages reporting the phases.
I've attached the patch that improves the current errcontext setting,
which can be applied before the patch adding offset number.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
vacuum_errcontext.patch application/x-patch 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-08-19 08:04:31 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Julien Rouhaud 2020-08-19 07:21:33 Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)