Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Noah Misch <noah(at)leadboat(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Neha Sharma <neha(dot)sharma(at)enterprisedb(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Date: 2021-03-24 05:40:08
Message-ID: CAAJ_b94tVyuZ=HvGbznv8WNZGzT_fmOJUbFfnKskP_mF17AN_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 23, 2021 at 8:59 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > Michael Paquier <michael(at)paquier(dot)xyz> writes:
> >> One bisect later, the winner is:
> >> commit: 3d351d916b20534f973eda760cde17d96545d4c4
> >> author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> >> date: Sun, 30 Aug 2020 12:21:51 -0400
> >> Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE.
>
> > I think that's an artifact. That commit didn't touch anything related to
> > relation opening or closing. What it could have done, though, is change
> > CLUSTER's behavior on this empty table from use-an-index to use-a-seqscan,
> > thus causing us to follow the buggy code path where before we didn't.
>
> On closer inspection, I believe the true culprit is c6b92041d,
> which did this:
>
> */
> if (RelationNeedsWAL(state->rs_new_rel))
> - heap_sync(state->rs_new_rel);
> + smgrimmedsync(state->rs_new_rel->rd_smgr, MAIN_FORKNUM);
>
> logical_end_heap_rewrite(state);
>
> heap_sync was careful about opening rd_smgr, the new code not so much.
>
> I read the rest of that commit and didn't see any other equivalent
> bugs, but I might've missed something.
>

I too didn't find any other place replacing heap_sync() or equivalent place from
this commit where smgr* operation reaches without necessary precautions call.
heap_sync() was calling RelationOpenSmgr() through FlushRelationBuffers() before
it reached smgrimmedsync(). So we also need to make sure of the
RelationOpenSmgr() call before smgrimmedsync() as proposed previously.

Regards,
Amul

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-03-24 06:13:07 Re: pg_amcheck contrib application
Previous Message Kyotaro Horiguchi 2021-03-24 05:26:44 Re: Protect syscache from bloating with negative cache entries