Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

From: Neha Sharma <neha(dot)sharma(at)enterprisedb(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Date: 2021-03-23 05:22:09
Message-ID: CANiYTQtb1WJ+ZyHdJr_FJDSDZDh89VkGLcbRBh4P4RrnnBDejg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 23, 2021 at 10:08 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:

> On Mon, Mar 22, 2021 at 3:03 PM Amit Langote <amitlangote09(at)gmail(dot)com>
> wrote:
> >
> > On Mon, Mar 22, 2021 at 5:26 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > > In heapam_relation_copy_for_cluster(), begin_heap_rewrite() sets
> > > rwstate->rs_new_rel->rd_smgr correctly, but the next line calls
> > > tuplesort_begin_cluster(), which causes a system cache invalidation.
> > > With the CCA setting, that wipes out rwstate->rs_new_rel->rd_smgr,
> > > which is not restored for the subsequent operations, and this causes
> > > the segmentation fault.
> > >
> > > Calling RelationOpenSmgr() before smgrimmedsync() in
> > > end_heap_rewrite() fixes the failure. The attached patch does that.
> >
> > That makes sense. I see a few commits in the git history adding
> > RelationOpenSmgr() before a smgr* operation, whenever such a problem
> > would have been discovered: 4942ee656ac, afa8f1971ae, bf347c60bdd7,
> > for example.
> >
>
> Thanks for the confirmation.
>
> > I do wonder if there are still other smgr* operations in the source
> > code that are preceded by operations that would invalidate the
> > SMgrRelation that those smgr* operations would be called with. For
> > example, the smgrnblocks() in gistBuildCallback() may be called too
> > long after the corresponding RelationOpenSmgr() on the index relation.
> >
>
> I checked gistBuildCallback() by adding Assert(index->rd_smgr) before
> smgrnblocks() with the CCA setting and didn't see any problem there.
>
> I think the easiest way to find such cases is to run the regression suite
> with a CCA build. There is no guarantee that the regression tests will hit
> all smgr* operations, but they might hit most of them.

Sure, I will do a regression run with CCA enabled.

>
> Regards,
> Amul
>

Regards,
Neha Sharma
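
For readers following the thread, the failure mode and the proposed fix can be
sketched with a toy model. This is not PostgreSQL source: ToyRelation, ToySmgr
and every toy_* function below are hypothetical stand-ins, assumed only for
illustration. The model shows a cached storage handle that an invalidation
resets to NULL, and a sync routine that reopens the handle before using it,
which is the pattern of calling RelationOpenSmgr() before smgrimmedsync().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real definitions. */
typedef struct ToySmgr
{
    int         nsyncs;         /* number of completed sync calls */
} ToySmgr;

typedef struct ToyRelation
{
    ToySmgr    *smgr;           /* cached handle, plays the role of rd_smgr */
} ToyRelation;

static ToySmgr toy_storage;     /* the single backing storage object */

/* Lazily (re)open the handle; a no-op if it is already open. */
static void
toy_open_smgr(ToyRelation *rel)
{
    if (rel->smgr == NULL)
        rel->smgr = &toy_storage;
}

/* Models a cache invalidation that drops the cached handle. */
static void
toy_invalidate(ToyRelation *rel)
{
    rel->smgr = NULL;
}

/*
 * Sync without reopening. Returns false when the handle is gone; in the
 * scenario discussed in the thread this is where a NULL pointer would be
 * dereferenced instead.
 */
static bool
toy_sync_unchecked(ToyRelation *rel)
{
    if (rel->smgr == NULL)
        return false;
    rel->smgr->nsyncs++;
    return true;
}

/* Sync with the proposed pattern: reopen the handle first, then use it. */
static bool
toy_sync_reopened(ToyRelation *rel)
{
    toy_open_smgr(rel);
    return toy_sync_unchecked(rel);
}
```

In this model, opening the handle, calling toy_invalidate(), and then calling
toy_sync_unchecked() fails, while toy_sync_reopened() succeeds because it does
not assume the handle survived whatever ran in between.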
