From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_amcheck contrib application |
Date: | 2021-03-04 22:04:37 |
Message-ID: | CAH2-WznsRybLrkJY2E++oXmt531p45jpMiigwDSc4y2A6f9C4g@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Mar 4, 2021 at 7:29 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think this whole approach is pretty suspect because the number of
> blocks in the relation can increase (by relation extension) or
> decrease (by VACUUM or TRUNCATE) between the time when we query for
> the list of target relations and the time we get around to executing
> any queries against them. I think it's OK to use the number of
> relation pages for progress reporting because progress reporting is
> only approximate anyway, but I wouldn't print them out in the progress
> messages, and I wouldn't try to fix up the startblock and endblock
> arguments on the basis of how long you think that relation is going to
> be.
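Here is a minimal C sketch of treating the page count purely as a progress hint. It assumes a hypothetical helper, progress_percent(), which is not part of pg_amcheck. The page total sampled when the target list was built is only an estimate, so the percentage is clamped rather than trusted, and nothing here feeds back into startblock or endblock.

```c
#include <stdint.h>

/*
 * Hypothetical helper: percentage of work done, given the number of pages
 * checked so far and the page total sampled when the target list was built.
 * Relations may have been extended, truncated, or dropped since then, so the
 * estimate is used only for display and the result is clamped to 100.
 */
static int
progress_percent(uint64_t pages_done, uint64_t estimated_pages)
{
    /* Covers an empty estimate as well as relations that grew past it. */
    if (pages_done >= estimated_pages)
        return 100;

    /* Here estimated_pages > pages_done >= 0, so the division is safe. */
    return (int) (pages_done * 100 / estimated_pages);
}
```

Suppose a table was extended mid-run and 300 pages were checked against an estimate of 200. The display then reads 100% instead of 150%, and the actual check still covers whatever the relation contains when it runs.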
I don't think that the struct AmcheckOptions block fields (e.g.,
startblock) should be of type 'long' -- that doesn't work well on
Windows, where 'long' is only 32 bits, too narrow to hold every valid
BlockNumber (an unsigned 32-bit value). To be fair, we already do the
same thing elsewhere, but there is no reason to repeat those mistakes.
(I'm rather suspicious of 'long' in general.)
I think that you could use BlockNumber + strtoul() without breaking Windows.
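Here is a minimal sketch of that suggestion in C. BlockNumber and MaxBlockNumber below are local stand-ins that mirror PostgreSQL's storage/block.h: an unsigned 32-bit type whose largest valid value is 0xFFFFFFFE. parse_block_number() is a hypothetical helper, not existing pg_amcheck code. The explicit range check matters because unsigned long is 32 bits on Windows but usually 64 bits elsewhere. It also matters because strtoul() quietly accepts a leading minus sign and negates the result.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins mirroring PostgreSQL's definitions in storage/block.h. */
typedef uint32_t BlockNumber;
#define MaxBlockNumber ((BlockNumber) 0xFFFFFFFE)

/*
 * Hypothetical helper: parse a decimal block number from a command-line
 * argument.  Returns true and sets *result only if the whole string is a
 * valid block number.
 */
static bool
parse_block_number(const char *str, BlockNumber *result)
{
    char       *end;
    unsigned long val;

    /* Reject empty input, whitespace, '+', and '-' (strtoul negates "-1"). */
    if (*str < '0' || *str > '9')
        return false;

    errno = 0;
    val = strtoul(str, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;

    /* unsigned long may be 32 or 64 bits; check the range either way. */
    if (val > MaxBlockNumber)
        return false;

    *result = (BlockNumber) val;
    return true;
}
```

With this check, "4294967295" (InvalidBlockNumber) and "-1" are rejected on every platform, rather than wrapping around or being truncated.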
> There are a LOT of things that can go wrong when we go try to run
> verify_heapam on a table. The table might have been dropped; in fact,
> on a busy production system, such cases are likely to occur routinely
> if DDL is common, which for many users it is. The system catalog
> entries might be screwed up, so that the relation can't be opened.
> There might be an unreadable page in the relation, either because the
> OS reports an I/O error or something like that, or because checksum
> verification fails. There are various other possibilities. We
> shouldn't view such errors as low-level things that occur only in
> fringe cases; this is a corruption-checking tool, and we should expect
> that running it against messed-up databases will be common. We
> shouldn't try to interpret the errors we get or make any big decisions
> about them, but we should have a clear way of reporting them so that
> the user can decide what to do.
I agree.
Your database is not supposed to be corrupt. Once your database has
become corrupt, all bets are off -- something happened that was
supposed to be impossible -- which seems like a good reason to be
modest about what we think we know.
The user should always see the unvarnished truth. pg_amcheck should
not presume to suppress errors from lower level code, except perhaps
in well-scoped special cases.
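Here is a minimal C sketch of that reporting policy. It assumes a hypothetical CheckResult struct and report_results() function, which are not pg_amcheck's actual design. Every failure is printed with its text exactly as received (in a real client that text might come from libpq's PQresultErrorMessage()), and one bad relation never stops the rest of the run.

```c
#include <stdio.h>

/* Hypothetical outcome of checking one relation. */
typedef struct CheckResult
{
    const char *relname;
    const char *error;          /* text as received, or NULL if clean */
} CheckResult;

/*
 * Print each failure unmodified and keep going.  The return value (number
 * of failures) can drive the exit status without filtering anything out.
 */
static int
report_results(FILE *out, const CheckResult *results, int nresults)
{
    int         failures = 0;

    for (int i = 0; i < nresults; i++)
    {
        if (results[i].error == NULL)
            continue;

        /* No reclassification or suppression: the user decides what it means. */
        fprintf(out, "%s: %s\n", results[i].relname, results[i].error);
        failures++;
    }
    return failures;
}
```

A dropped table and an unreadable page both show up here as ordinary entries, and deciding whether either matters is left to the user.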
--
Peter Geoghegan