Re: pg_amcheck contrib application

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-03-16 16:51:04
Message-ID: 16B79A18-48F3-404D-8DC5-53528E85A7AC@enterprisedb.com
Lists: pgsql-hackers

> On Mar 16, 2021, at 9:30 AM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
>
>> On Mar 16, 2021, at 9:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> writes:
>>> I think autovacuum simply triggers the bug, and is not the cause of the bug. If I turn autovacuum off and instead do an ANALYZE in each test database rather than performing the corruptions, I get reports about problems in pg_statistic. This is on my Mac laptop. This rules out the theory that autovacuum is propagating corruptions into pg_statistic, and also the theory that it is architecture dependent.
>>
>> I wonder whether amcheck is confused by the declaration of those columns
>> as "anyarray".
>
> It uses attlen and attalign for the attribute, so that idea does make sense. It gets that via TupleDescAttr(RelationGetDescr(rel), attnum).

Yeah, that looks related:

regression=# select attname, attlen, attnum, attalign from pg_attribute where attrelid = 2619 and attname like 'stavalue%';
  attname   | attlen | attnum | attalign
------------+--------+--------+----------
 stavalues1 |     -1 |     27 | d
 stavalues2 |     -1 |     28 | d
 stavalues3 |     -1 |     29 | d
 stavalues4 |     -1 |     30 | d
 stavalues5 |     -1 |     31 | d
(5 rows)

It shows them all as having attalign = 'd', but for some array types the alignment will be 'i', not 'd', so the catalog is lying a bit about the contents. But I'm now confused about why this caused problems on the two hosts where integer and double have the same alignment. It seems like that would be the one place where the bug would not happen, not the one place where it does.

I'm attaching a test that reliably reproduces this for me:

Attachment: v8-0005-Adding-pg_amcheck-special-test-for-pg_statistic.patch (application/octet-stream, 1.3 KB)
