Re: BUG #17212: pg_amcheck fails on checking temporary relations

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #17212: pg_amcheck fails on checking temporary relations
Date: 2021-10-06 22:47:27
Message-ID: 55A2BFE7-AAAF-4A64-BD3A-5C98EA982267@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

> On Oct 6, 2021, at 3:20 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
>> I think the disagreements are about something else.
>
> Informally speaking, you could say that pg_amcheck and amcheck verify
> relations. More formally speaking, both amcheck (whether called by
> pg_amcheck or some other thing) can only prove the presence of
> corruption. They cannot prove its absence. (The amcheck docs have
> always said almost these exact words.)

I totally agree that the amcheck functions cannot prove the absence of corruption.

I prefer not to even use language about proving the presence of corruption when discussing pg_amcheck. I have let that slide upthread as a convenient short-hand, but I think it doesn't help. For pg_amcheck to take any view whatsoever on whether a btree index is corrupt, it would have to introspect the error message that it gets back from bt_index_check(). It doesn't do that, nor do I think that it should. It just prints the contents of the error for the user and records that fact and eventually exits with a non-zero exit code. The error might have been something about the command exiting due to the crash of another backend, or to do with a deadlock against some other process, or whatever, and pg_amcheck has no opinion about whether any of that is to do with corruption or not.

> This seems to come up a lot because at various points you seem to be
> concerned about introducing specific imperfections. But it's not like
> your starting point was ever perfection, or ever could be.

From the point of view of detecting corruptions, I agree that it never could be. But I'm not talking about that. I'm talking about whether pg_amcheck issues all the commands that it is supposed to issue. If I work for Daddy Warbucks and he gives me 30 classic cars to take to 10 different mechanics, I can do that job perfectly even if the mechanics do less than perfect work. If I leave three cars in the driveway, that's on me. Likewise, it's not on pg_amcheck if the checking functions can't do perfect work, but it is on pg_amcheck if it doesn't issue all the expected commands. But later on in this email, it appears we don't have any remaining disagreements about that. Read on....

> I can
> always describe a scenario in which amcheck misses real corruption --
> a scenario which may be very contrived. So the mere fact that some new
> theoretical possibility of corruption is introduced by some action
> does not in itself mean much. We're dealing with that constantly, and
> always will be.

I wish we could stop discussing this. I really don't think this ticket has anything to do with how well or how poorly or how completely the amcheck functions work.

> Let's suppose I was to "directly fix amcheck + !indisvalid indexes". I
> don't even know what that means -- I honestly don't have a clue.
> You're focussing on one small piece of code in verify_nbtree.c, that
> seems to punt responsibility, but the fact is that there are deeply
> baked-in reasons why it does so. That's a reflection of how many
> things about the system work, in general. Attributing blame to any one
> small snippet of code (code in verify_nbtree.c, or wherever) just
> isn't helpful.

I think we have agreed that pg_amcheck can filter out invalid indexes. I don't have a problem with that. I admit that I did have a problem with that upthread, but its been a while since I conceded that point so I'd rather not have to argue it again.

>> In truth, all the pg_amcheck frontend client can take a view on is whether it was able to issue all the commands to the backend that it was asked to issue, and whether any of those commands responded with an error.
>
> AFAICT that pg_amcheck has to do is follow the amcheck user docs, by
> generalizing from the example SQL query for the B-Tree stuff. And, it
> should separately filter non-temp relations for the heap stuff, for
> the same reasons (exactly the same situation there).

I think we have agreed on that one, too, without me having ever argued it. I posted a patch to filter out the temporary tables already.

>> pg_amcheck -d db1 -d db2 -d db3 --table=mytable
>>
>> In this case, mytable is a regular table on db1, a temporary table on db2, and an unlogged table on db3, and db3 is in recovery.
>
> I don't think that pg_amcheck needs to care about being in recovery,
> at all. I agreed with you about using pg_is_in_recovery() from at one
> point. That was a mistake on my part.

Ok, excellent, that was probably the only thing that had me really hung up. I thought you were still asking for pg_amcheck to filter out the --parent-check option when in recovery, but if you're not asking for that, then I might have enough to go on now.

>> I thought that we were headed toward a decision where (despite my discomfort) pg_amcheck would downgrade options as necessary, but now it sounds like that's not so. So what should it do
>
> Downgrade is how you refer to it. I just think of it as making sure
> that pg_amcheck only asks amcheck to verify relations that are
> basically capable of being verified (e.g., not a temp table).

I was using "downgrading" to mean downgrading from bt_index_parent_check() to bt_index_check() when pg_is_in_recovery() is true, but you've clarified that you're not requesting that downgrade, so I think we've now gotten past the last sticking point about that whole issue.

There are other sticking points that don't seem to be things you have taken a view on. Specifically, pg_amcheck complains if a relation pattern doesn't match anything, so that

pg_amcheck --table="*acountng*"

will complain if no tables match, giving the user the opportunity to notice that they spelled "accounting" wrong. If there happens to be a table named "xyzacountngo", and that matches, too bad. There isn't any way pg_amcheck can be responsible for that. But if there is a temporary table named "xyzacountngo" and that gets skipped because it's a temp table, I don't know what feedback the user should get. That's a thorny user interfaces question, not a corruption checking question, and I don't think you need to weigh in unless you want to. I'll most likely go with whatever is the simplest to code and/or most similar to what is currently in the tree, because I don't see any knock-down arguments one way or the other.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Geoghegan 2021-10-06 23:12:15 Re: BUG #17212: pg_amcheck fails on checking temporary relations
Previous Message Peter Geoghegan 2021-10-06 22:20:14 Re: BUG #17212: pg_amcheck fails on checking temporary relations

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-10-06 22:55:49 Re: ALTER INDEX .. RENAME allows to rename tables/views as well
Previous Message Tom Lane 2021-10-06 22:43:25 Re: ALTER INDEX .. RENAME allows to rename tables/views as well