Re: pg_amcheck contrib application

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-03-04 15:29:12
Message-ID: CA+TgmoZgdsVo9ji_Bc7GVvEqGMapC1Hqyuy6CLV-3oyE7HapRA@mail.gmail.com
Lists: pgsql-hackers

Most of these changes sound good. I'll go through the whole patch
again today, or as much of it as I can. But before I do that, I want
to comment on this point specifically.

On Thu, Mar 4, 2021 at 1:25 AM Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> I think this is fixed up now. There is an interaction with amcheck's verify_heapam(), where that function raises an error if the startblock or endblock arguments are out of bounds for the relation in question. Rather than aborting the entire pg_amcheck run, it avoids passing inappropriate block ranges to verify_heapam() and outputs a warning, so:
>
> % pg_amcheck mark.dilger -t foo -t pg_class --progress -v --startblock=35 --endblock=77
> pg_amcheck: in database "mark.dilger": using amcheck version "1.3" in schema "public"
> 0/6 relations (0%) 0/55 pages (0%)
> pg_amcheck: checking table "mark.dilger"."public"."foo" (oid 16385) (10/45 pages)
> pg_amcheck: warning: ignoring endblock option 77 beyond end of table "mark.dilger"."public"."foo"
> pg_amcheck: checking btree index "mark.dilger"."public"."foo_idx" (oid 16388) (30/30 pages)
> pg_amcheck: checking table "mark.dilger"."pg_catalog"."pg_class" (oid 1259) (0/13 pages)
> pg_amcheck: warning: ignoring startblock option 35 beyond end of table "mark.dilger"."pg_catalog"."pg_class"
> pg_amcheck: warning: ignoring endblock option 77 beyond end of table "mark.dilger"."pg_catalog"."pg_class"
> pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_relname_nsp_index" (oid 2663) (6/6 pages)
> pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_tblspc_relfilenode_index" (oid 3455) (5/5 pages)
> pg_amcheck: checking btree index "mark.dilger"."pg_catalog"."pg_class_oid_index" (oid 2662) (4/4 pages)
> 6/6 relations (100%) 55/55 pages (100%)
>
> The way the (x/y pages) is printed takes into account that the [startblock..endblock] range may reduce the number of pages to check (x) to something less than the number of pages in the relation (y), but the reporting is a bit of a lie when the startblock is beyond the end of the table, as it doesn't get passed to verify_heapam and so the number of blocks checked may be more than the zero blocks reported. I think I might need to fix this up tomorrow, but I want to get what I have in this patch set posted tonight, so it's not fixed here. Also, there are multiple ways of addressing this, and I'm having trouble deciding which way is best. I can exclude the relation from being checked at all, or realize earlier that I'm not going to honor the startblock argument and compute the blocks to check correctly. Thoughts?

I think this whole approach is pretty suspect because the number of
blocks in the relation can increase (by relation extension) or
decrease (by VACUUM or TRUNCATE) between the time when we query for
the list of target relations and the time we get around to executing
any queries against them. I think it's OK to use the number of
relation pages for progress reporting because progress reporting is
only approximate anyway, but I wouldn't print them out in the progress
messages, and I wouldn't try to fix up the startblock and endblock
arguments on the basis of how long you think that relation is going to
be. You seem to view the fact that the server reported the error as
the reason for the problem, but I don't agree. I think having the
server report the error here is right, and the problem is that the
error reporting sucked because it was long-winded and didn't
necessarily tell you which table had the problem.
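
Just to make that concrete, here's a rough libpq sketch of what I
mean (hypothetical names, not the patch's actual code): hand the
user's --startblock/--endblock to verify_heapam() as-is, and let the
server decide whether they make sense:

#include <stdio.h>
#include <libpq-fe.h>

/* hypothetical helper, not pg_amcheck's actual code */
static void
check_relation(PGconn *conn, const char *relname,
               const char *startblock, const char *endblock)
{
    const char *params[3] = {relname, startblock, endblock};
    PGresult   *res;

    res = PQexecParams(conn,
                       "SELECT * FROM verify_heapam($1::regclass, "
                       "startblock := $2::bigint, "
                       "endblock := $3::bigint)",
                       3, NULL, params, NULL, NULL, 0);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        /* report the server's complaint, tagged with the relation */
        fprintf(stderr, "pg_amcheck: could not check \"%s\": %s",
                relname, PQerrorMessage(conn));
    PQclear(res);
}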

There are a LOT of things that can go wrong when we go try to run
verify_heapam on a table. The table might have been dropped; in fact,
on a busy production system, such cases are likely to occur routinely
if DDL is common, which for many users it is. The system catalog
entries might be screwed up, so that the relation can't be opened.
There might be an unreadable page in the relation, either because the
OS reports an I/O error or something like that, or because checksum
verification fails. There are various other possibilities. We
shouldn't view such errors as low-level things that occur only in
fringe cases; this is a corruption-checking tool, and we should expect
that running it against messed-up databases will be common. We
shouldn't try to interpret the errors we get or make any big decisions
about them, but we should have a clear way of reporting them so that
the user can decide what to do.
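
As a sketch of the kind of reporting I have in mind (again made-up
code, not what's in the patch), the client can pull just the primary
message out of the error, tag it with the relation, and move on:

#include <stdio.h>
#include <stdbool.h>
#include <libpq-fe.h>

/* hypothetical: run one check query, report failure on one line, keep going */
static bool
run_one_check(PGconn *conn, const char *schema, const char *table,
              const char *query)
{
    PGresult   *res = PQexec(conn, query);
    bool        ok = (PQresultStatus(res) == PGRES_TUPLES_OK);

    if (!ok)
    {
        /* primary message only; detail/context could be verbose-only */
        char       *msg = PQresultErrorField(res, PG_DIAG_MESSAGE_PRIMARY);

        fprintf(stderr, "pg_amcheck: could not check \"%s\".\"%s\": %s\n",
                schema, table, msg ? msg : "unknown error");
    }
    PQclear(res);
    return ok;          /* either way, the caller tries the next relation */
}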

Just as an experiment, I suggest creating a database with 100 tables
in it, each with 1 index, and then deleting a single pg_attribute
entry for 10 of the tables, and then running pg_amcheck. I think you
will get 20 errors - one for each messed-up table and one for the
corresponding index. Maybe you'll get errors for the TOAST table
checks too, if the tables have TOAST tables, although that seems like
it should be avoidable. Now, no matter what you do, the tool is going
to produce a lot of output here, because you have a lot of problems,
and that's OK. But how understandable is that output, and how concise
is it? If it says something like:

pg_amcheck: could not check "SCHEMA_NAME"."TABLE_NAME": ERROR: some
attributes are missing or something

...and that line is repeated 20 times, maybe with a context or detail
line for each one or something like that, then you have got a good UI.
If it's not clear which tables have the problem, you have got a bad
UI. If it dumps out 300 lines of output instead of 20 or 40, you have
a UI that is so verbose that usability is going to be somewhat
impaired, which is why I suggested only showing the query in verbose
mode.
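
In case it saves you time, here's roughly how I'd script that setup
(table names are made up, and the DELETE has to run as superuser,
since it writes a system catalog):

#include <stdio.h>
#include <libpq-fe.h>

/* throwaway setup program for the experiment described above */
int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* conninfo as appropriate */
    char        sql[256];

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }
    for (int i = 0; i < 100; i++)
    {
        snprintf(sql, sizeof(sql),
                 "CREATE TABLE t%d (a int, b text); "
                 "CREATE INDEX ON t%d (a);", i, i);
        PQclear(PQexec(conn, sql));
    }
    /* break the catalog entries for ten of the tables */
    for (int i = 0; i < 10; i++)
    {
        snprintf(sql, sizeof(sql),
                 "DELETE FROM pg_attribute "
                 "WHERE attrelid = 't%d'::regclass AND attnum = 2;", i);
        PQclear(PQexec(conn, sql));
    }
    PQfinish(conn);
    return 0;
}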

BTW, another thing that might be interesting is to call
PQsetErrorVerbosity(conn, PQERRORS_VERBOSE) in verbose mode. It's
probably possible to contrive a case where the server error message is
something generic like "cache lookup failed for relation %u", which
occurs in a whole bunch of places in the source code, and being able
to get the file and line number information can be really useful when
trying to track such things down.
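
Something like this (untested) in the connection-setup path ought to
do it:

#include <stdbool.h>
#include <libpq-fe.h>

/*
 * Hypothetical: when the user asked for --verbose, request maximally
 * detailed server errors, which include the source file and line
 * number the error was raised from.
 */
static void
maybe_raise_verbosity(PGconn *conn, bool verbose)
{
    if (verbose)
        (void) PQsetErrorVerbosity(conn, PQERRORS_VERBOSE);
}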

--
Robert Haas
EDB: http://www.enterprisedb.com
