Re: error context for vacuum to include block number

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: error context for vacuum to include block number
Date: 2020-03-27 06:20:30
Message-ID: CAA4eK1LKYnDV8bcx59TdR5-s2Cfk75g_wfAjRyy-cL8S+WvHXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 27, 2020 at 11:46 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Thu, Mar 26, 2020 at 11:44:24PM -0500, Justin Pryzby wrote:
> > On Fri, Mar 27, 2020 at 09:49:29AM +0530, Amit Kapila wrote:
> > > On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > >
> > > > > Hm, I was just wondering what happens if an error happens *during*
> > > > > update_vacuum_error_cbarg(). It seems like if we set
> > > > > errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, then an
> > > > > error would cause a crash.
> > > > >
> > >
> > > Can't that be avoided if you check if cbarg->indname is non-null in
> > > vacuum_error_callback as we are already doing for
> > > VACUUM_ERRCB_PHASE_TRUNCATE?
> > >
> > > > > And if we pfree and set indname before phase, it'd
> > > > > be a problem when going from an index phase to non-index phase.
> > >
> > > How is it possible that we move to the non-index phase without
> > > clearing indname as we always revert back the old phase information?
> >
> > The crash scenario I'm trying to avoid would be like statement_timeout or other
> > asynchronous event occurring between two non-atomic operations.
> >
> > I said that there's an issue no matter what order we set indname/phase;
> > If we wrote:
> > |cbarg->indname = indname;
> > |cbarg->phase = phase;
> > ..and hit a timeout (or similar) between setting indname=NULL but before
> > setting phase=VACUUM_INDEX, then we can crash due to null pointer.
> >
> > But if we write:
> > |cbarg->phase = phase;
> > |if (cbarg->indname) {pfree(cbarg->indname);}
> > |cbarg->indname = indname ? pstrdup(indname) : NULL;
> > ..then we can still crash if we timeout between freeing cbarg->indname and
> > setting it to null, due to acccessing a pfreed allocation.
>
> If "phase" is updated before "indname", I'm able to induce a synthetic crash
> like this:
>
> +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname==NULL)
> +{
> +kill(getpid(), SIGINT);
> +pg_sleep(1); // that's needed since signals are delivered asynchronously
> +}
>
> And another crash if we do this after pfree but before setting indname.
>
> +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname!=NULL)
> +{
> +kill(getpid(), SIGINT);
> +pg_sleep(1);
> +}
>
> I'm not sure if those are possible outside of "induced" errors. Maybe the
> function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar?
>

Yes, this is exactly the point. I think unless you have
CHECK_FOR_INTERRUPTS in that function, the problems you are trying to
think won't happen.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-03-27 06:29:27 Re: backup manifests
Previous Message Justin Pryzby 2020-03-27 06:16:01 Re: error context for vacuum to include block number