Re: new heapcheck contrib module

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2021-02-04 21:04:10
Message-ID: CA+Tgmobp_aurqbXAUM_Da1Srwm6bc0KoYGhqc=0VrCVw9zfZPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 4, 2021 at 11:10 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> I also made changes to clean up 0003 (formerly numbered 0004)

"deduplice" is a typo.

I'm not sure that I agree with check_each_database()'s commentary
about why it doesn't make sense to optimize the resolve-the-databases
step. Like, suppose I type 'pg_amcheck sasquatch'. I think the way you
have it coded it's going to tell me that there are no databases to
check, which might make me think I used the wrong syntax or something.
I want it to tell me that sasquatch does not exist. If I happen to be
a cryptid believer, I may reject that explanation as inaccurate, but
at least there's no question about what pg_amcheck thinks the problem
is.

Why does check_each_database() go out of its way to run the main query
without the always-secure search path? If there's a good reason, I
think it deserves a comment saying what the reason is. If there's not
a good reason, then I think it should use the always-secure search
path for 100% of everything. Same question applies to
check_one_database().

ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler, sql.data)
could stand to be split over two lines, like you do for the nearly
run_command() call, so that it doesn't go past 80 columns.

I suggest having two variables instead of one for amcheck_schema.
Using the same variable to store the unescaped value and then later
the escaped value is, IMHO, confusing. Whatever you call the escaped
version, I'd rename the function parameters elsewhere to match.

"status = PQsendQuery(conn, sql) == 1" seems a bit uptight to me. Why
not just make status an int and then just "status = PQsendQuery(conn,
sql)" and then test for status != 0? I don't really care if you don't
change this, it's not actually important. But personally I'd rather
code it as if any non-zero value meant success.

I think the pg_log_error() in run_command() could be worded a bit
better. I don't think it's a good idea to try to include the type of
object in there like this, because of the translatability guidelines
around assembling messages from fragments. And I don't think it's good
to say that the check failed because the reality is that we weren't
able to ask for the check to be run in the first place. I would rather
log this as something like "unable to send query: %s". I would also
assume we need to bail out entirely if that happens. I'm not totally
sure what sorts of things can make PQsendQuery() fail but I bet it
boils down to having lost the server connection. Should that occur,
trying to send queries for all of the remaining objects is going to
result in repeating the same error many times, which isn't going to be
what anybody wants. It's unclear to me whether we should give up on
the whole operation but I think we have to at least give up on that
connection... unless I'm confused about what the failure mode is
likely to be here.

It looks to me like the user won't be able to tell by the exit code
what happened. What I did with pg_verifybackup, and what I suggest we
do here, is exit(1) if anything went wrong, either in terms of failing
to execute queries or in terms of those queries returning problem
reports. With pg_verifybackup, I thought about trying to make it like
0 => backup OK, 2 => backup not OK, 2 => trouble, but I found it too
hard to distinguish what should be exit(1) and what should be exit(2)
and the coding wasn't trivial either, so I went with the simpler
scheme.

The opening line of appendDatabaseSelect() could be adjusted to put
the regexps parameter on the next line, avoiding awkward wrapping.

If they are being run with a safe search path, the queries in
appendDatabaseSelect(), appendSchemaSelect(), etc. could be run
without all the paranoia. If not, maybe they should be. The casts to
text don't include the paranoia: with an unsafe search path, we need
pg_catalog.text here. Or no cast at all, which seems like it ought to
be fine too. Not quite sure why you are doing all that casting to
text; the datatype is presumably 'name' and ought to collate like
collate "C" which is probably fine.

It would probably be a better idea for appendSchemaSelect to declare a
PQExpBuffer and call initPQExpBuffer just once, and then
resetPQExpBuffer after each use, and finally termPQExpBuffer just
once. The way you have it is not expensive enough to really matter,
but avoiding repeated allocate/free cycles is probably best.

I wonder if a pattern like .foo.bar ends up meaning the same thing as
a pattern like foo.bar, with the empty database name being treated the
same as if nothing were specified.

From the way appendTableCTE() is coded, it seems to me that if I ask
for tables named j* excluding tables named jam* I still might get
toast tables for my jam, which seems wrong.

There does not seem to be any clear benefit to defining CT_TABLE = 0
in this case, so I would let the compiler deal with it. We should not
be depending on that to have any particular numeric value.

Why does pg_amcheck.c have a header file pg_amcheck.h if there's only
one source file? If you had multiple source files then the header
would be a reasonable place to put stuff they all need, but you don't.

Copying the definitions of HEAP_TABLE_AM_OID and BTREE_AM_OID into
pg_amcheck.h or anywhere else seems bad. I think you just be doing
#include "catalog/pg_am_d.h".

I think I'm out of steam for today but I'll try to look at this more
soon. In general I think this patch and the whole series are pretty
close to being ready to commit, even though there are still things I
think need fixing here and there.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2021-02-04 21:48:26 Re: fdatasync(2) on macOS
Previous Message Andres Freund 2021-02-04 19:54:59 Re: logical replication worker accesses catalogs in error context callback