Re: why do hash index builds use smgrextend() for new splitpoint pages

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: why do hash index builds use smgrextend() for new splitpoint pages
Date: 2022-02-28 05:59:32
Message-ID: CAA4eK1KbPo8+XtJf1Cc6rtLfwHYribSvCW=WRwggmeG8SP3c3w@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 26, 2022 at 9:17 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Fri, Feb 25, 2022 at 11:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman
> > <melanieplageman(at)gmail(dot)com> wrote:
> > >
> > > Since _hash_alloc_buckets() WAL-logs the last page of the
> > > splitpoint, is it safe to skip the smgrimmedsync()? What if the last
> > > page of the splitpoint doesn't end up having any tuples added to it
> > > during the index build and the redo pointer is moved past the WAL for
> > > this page and then later there is a crash sometime before this page
> > > makes it to permanent storage. Does it matter that this page is lost? If
> > > not, then why bother WAL-logging it?
> > >
> >
> > I think we don't care if the page is lost before we update the
> > meta-page in the caller, because in that case we will simply try to
> > reallocate. But we do care after the meta-page update (which records
> > the extension), in which case we won't lose this last page because
> > the sync request for it would have been registered via smgrextend()
> > before the meta-page update.
>
> and could it happen that during smgrextend() for the last page, a
> checkpoint starts and finishes between FileWrite() and
> register_dirty_segment(), then index build finishes, and then a crash
> occurs before another checkpoint completes the pending fsync for that
> last page?
>

Yeah, this seems possible. The problem then would be that the index's
idea of the EOF and smgr's idea of the EOF could differ, which could
cause trouble when we try to get a new page via _hash_getnewbuf(). If
this theory turns out to be true, we could probably get an error either
because the disk is full or because the index requests a block beyond
the EOF as determined by RelationGetNumberOfBlocksInFork() in
_hash_getnewbuf().

Can we try to reproduce this scenario with the help of a debugger to
see if we are missing something?
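Something along these lines might work as a starting point (a
hypothetical recipe; the exact breakpoint location depends on reading
md.c for your build, and the function names here are from my memory of
it):

```
# Session 1: attach to the backend that will build the index, and stop
# it inside smgrextend()/mdextend() after the write but before the
# sync request is registered.
gdb -p <backend_pid>
(gdb) break register_dirty_segment
(gdb) continue
-- in that backend: CREATE INDEX ... USING hash (...);

# Session 2: while the backend is stopped at the breakpoint, force a
# checkpoint so the redo pointer moves past the page's WAL record.
psql -c CHECKPOINT

# Session 1: let the index build finish, then crash the server before
# any further checkpoint (e.g. pg_ctl stop -m immediate) and compare
# the relation's on-disk EOF with what the metapage claims.
(gdb) continue
```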

--
With Regards,
Amit Kapila.
