Re: pg_amcheck contrib application

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-04-30 18:31:04
Message-ID: 91ED0048-C788-4540-B21F-6EBE622E9390@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Apr 30, 2021, at 9:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> The attached patch changes amcheck corruption reports as discussed upthread. This patch is submitted for the v14 development cycle as a bug fix, per your complaint that the committed code generates reports sufficiently confusing to a user as to constitute a bug.
>>
>> All other code refactoring and additional checks discussed upthread are reserved for the v15 development cycle and are not included here.
>>
>> The minimal patch (not attached) that does not rename any variables is 135 lines. Your patch was 159 lines. The patch (attached) which includes your variable renaming is 174 lines.
>
> Hi,
>
> I have compared this against my version. I found the following differences:

Just to be clear, I did not use your patch v1 as the starting point. I took the code as committed to master as the starting point, used your corruption report verbiage changes and at least some of your variable naming choices, but did not use the rest, in large part because it didn't work. It caused corruption messages to be reported against tables that have no corruption. For that matter, your v2 patch doesn't work either, and in the same way. To wit:

heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, attribute 7:
toast value 13461 chunk 0 has size 1995, but expected size 1996

I think there is something wrong with the way you are trying to calculate and use extsize, because I'm not corrupting pg_catalog.pg_rewrite. You can get these same results by applying your patch to master, building, and running 'make check' from src/bin/pg_amcheck/

> 1. This version passes last_chunk_seq rather than extsize to
> check_toast_tuple(). But this results in having to call
> VARATT_EXTERNAL_GET_EXTSIZE() inside that function. I thought it was
> nicer to do that in the caller, so that we don't do it twice.

I don't see that VARATT_EXTERNAL_GET_EXTSIZE() is worth too much concern, given that is just a struct access and a bit mask. You are avoiding calculating that twice, but at the expense of calculating last_chunk_seq twice, which involves division. I don't think the division can be optimized as a mere bit shift, since TOAST_MAX_CHUNK_SIZE is not in general a power of two. (For example, on my laptop it is 1996.)

I don't say this to nitpick at the performance one way vs. the other. I doubt it makes any real difference. I'm just confused why you want to change this particular thing right now, given that it is not a bug.

> 2. You fixed some out-of-date comments.

Yes, because they were wrong. That's on me. I failed to update them in a prior patch.

> 3. You move the test for an unexpected chunk sequence further down in
> the function. I don't see the point;

Relative to your patch, perhaps. Relative to master, no tests have been moved.

> I had put it by the related null
> check, and still think that's better. You also deleted my comment /*
> Either the TOAST index is corrupt, or we don't have all chunks. */
> which I would have preferred to keep.

That's fine. I didn't mean to remove it. I was just taking a minimalist approach to constructing the patch.

> 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
> because we cannot compute a sensible expected size in that case. I
> think your code will subtract a larger value from a smaller one and,
> this being unsigned arithmetic, say that the expected chunk size is
> something gigantic.

Your conclusion is probably right, but I think your analysis is based on a misreading of what "last_chunk_seq" means. It's not the last one seen, but the last one expected. (Should we rename the variable to avoid confusion?) It won't compute a gigantic size. Rather, it will expect *every* chunk with chunk_seq >= last_chunk_seq to have whatever size is appropriate for the last chunk.

> Returning and not issuing that complaint at all
> seems better.

That might be best. I had been resisting that because I don't want the extraneous chunks to be reported without chunk size information. When debugging corrupted toast, it may be interesting to know the size of the extraneous chunks. If there are 1000 extra chunks, somebody might want to see the sizes of them.

> 5. You fixed the incorrect formula I had introduced for the expected
> size of the last chunk.

Not really. I just didn't introduce any change in that area.

> 6. You changed the variable name in check_toasted_attribute() from
> expected_chunkno to chunkno, and initialized it later in the function
> instead of at declaration time. I don't find this to be an
> improvement;

I think I just left the variable name and its initialization unchanged.

> including the word "expected" seems to me to be
> substantially clearer. But I think I should have gone with
> expected_chunk_seq for better consistency.

I agree that is a better name.

> 7. You restored the message "toast value %u was expected to end at
> chunk %d, but ended at chunk %d" which my version deleted. I deleted
> that message because I thought it was redundant, but I guess it's not:
> there's nothing else to complain if the sequence of chunks ends early.
> I think we should change the test from != to < though, because if it's
>> , then we must have already complained about unexpected chunks.

We can do it that way if you like. I considered that and had trouble deciding if that made things less clear to users who might be less familiar with the structure of toasted attributes. If some of the attributes have that message and others don't, they might conclude that only some of the attributes ended at the wrong chunk and fail to make the inference that to you or me is obvious.

>> Also,
> I think the message is actually wrong, because even though you renamed
> the variable, it still ends up being the expected next chunkno rather
> than the last chunkno we actually saw.

If we have seen any chunks, the variable is holding the expected next chunk seq, which is one greater than the last chunk seq we saw.

If we expect chunks 0..3 and see chunk 0 but not chunk 1, it will complain ..."expected to end at chunk 4, but ended at chunk 1". This is clearly by design and not merely a bug, though I tend to agree with you that this is a strange wording choice. I can't remember exactly when and how we decided to word the message this way, but it has annoyed me for a while, and I assumed it was something you suggested a while back, because I don't recall doing it. Either way, since you seem to also be bothered by this, I agree we should change it.

> PFA my counter-proposal based on the above analysis.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-04-30 18:33:48 Re: pg_upgrade test for binary compatibility of core data types
Previous Message Jeff Davis 2021-04-30 18:28:31 Re: MaxOffsetNumber for Table AMs