Re: pg_amcheck contrib application

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-04-13 03:06:09
Message-ID: 7D3B9BF6-50D0-4C30-8506-1C1851C7F96F@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Apr 9, 2021, at 1:51 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> I think #4, above, requires some clarification. If there are missing chunks, the very definition of how large we expect subsequent chunks to be is ill-defined. I took a fairly conservative approach to avoid lots of bogus complaints about chunks that are of unexpected size. Not all such complaints are removed, but enough are removed that I needed to add a final complaint at the end about the total size seen not matching the total size expected.
>
> My instinct is to suppose that the size that we expect for future
> chunks is independent of anything being wrong with previous chunks. So
> if each chunk is supposed to be 2004 bytes (which probably isn't the
> real number) and the value is 7000 bytes long, we expect chunks 0-2 to
> be 2004 bytes each, chunk 3 to be 988 bytes, and chunk 4 and higher to
> not exist. If chunk 1 happens to be missing or the wrong length or
> whatever, our expectations for chunks 2 and 3 are utterly unchanged.

Fair enough.

>> Corruption #1:
>>
>> UPDATE $toastname SET chunk_seq = chunk_seq + 1000
>>
>> Before:
>>
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 0 has sequence number 1000, but expected sequence number 0
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 1 has sequence number 1001, but expected sequence number 1
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 2 has sequence number 1002, but expected sequence number 2
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 3 has sequence number 1003, but expected sequence number 3
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 4 has sequence number 1004, but expected sequence number 4
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 5 has sequence number 1005, but expected sequence number 5
>>
>> After:
>>
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 missing chunks 0 through 999
>
> Applying the above principle would lead to complaints that chunks 0-5
> are missing, and 1000-1005 are extra.

That sounds right. It now reports:

# heap table "postgres"."public"."test", block 0, offset 16, attribute 2:
# toast value 16459 missing chunks 0 through 4 with expected size 1996 and chunk 5 with expected size 20
# heap table "postgres"."public"."test", block 0, offset 16, attribute 2:
# toast value 16459 unexpected chunks 1000 through 1004 each with size 1996 followed by chunk 1005 with size 20

>> Corruption #2:
>>
>> UPDATE $toastname SET chunk_seq = chunk_seq * 1000
>
> Similarly here, except the extra chunk numbers are different.

It now reports:

# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 missing chunks 1 through 4 with expected size 1996 and chunk 5 with expected size 20
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 1000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 2000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 3000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 4000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 5000 with size 20

I don't see any good way in this case to report the extra chunks in one row, as in the general case there could be arbitrarily many of them, with the message text getting arbitrarily large if it reported the chunks as "chunks 1000, 2000, 3000, 4000, 5000, ...". I don't expect this sort of corruption being particularly common, though, so I'm not too bothered about it.

>
>> Corruption #3:
>>
>> UPDATE $toastname SET chunk_id = (chunk_id::integer + 10000000)::oid WHERE chunk_seq = 3
>
> And here we'd just get a complaint that chunk 3 is missing.

It now reports:

# heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
# toast value 16461 missing chunk 3 with expected size 1996
# heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
# toast value 16461 was expected to end at chunk 6 with total size 10000, but ended at chunk 5 with total size 8004

It sounds like you weren't expecting the second of these reports. I think it is valuable, especially when there are multiple missing chunks and multiple extraneous chunks, as it makes it easier for the user to reconcile the missing chunks against the extraneous chunks.

Attachment Content-Type Size
v20-0001-amcheck-rewording-messages-and-fixing-alignment.patch application/octet-stream 8.1 KB
v20-0002-amcheck-adding-toast-pointer-corruption-checks.patch application/octet-stream 20.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-04-13 03:06:51 Re: [PATCH] Identify LWLocks in tracepoints
Previous Message Peter Smith 2021-04-13 02:56:23 Re: [HACKERS] logical decoding of two-phase transactions